Bioinformatics, Data Mining, Machine Learning Laboratory (BDM)

Overview:

The research in my lab focuses on developing machine learning and data mining methods to analyze big biomedical data and address fundamental problems in biomedical sciences. Currently, we are developing bioinformatics algorithms and tools for protein structure and function prediction, 3D genomics, biological network modeling, and omics data analysis. We have active projects in protein structure and function prediction, 3D genome structure modeling, inference and simulation of biological networks and systems (gene regulatory networks, metabolic networks, signal transduction networks, protein-protein interaction networks, and gene-gene interaction networks), protein interaction and docking, biological sequence alignment, transcriptomics (RNA-seq and microarray gene expression data analysis), genomics, epigenomics, and proteomics. These projects are funded by the National Institutes of Health (NIH), the National Science Foundation (NSF), and the Department of Energy (DOE).

The main techniques that we are developing include deep learning, computational optimization methods, neural networks, support vector machines, random forests, hidden Markov models, Markov chain Monte Carlo methods, graphical models, Bayesian networks, kernel methods, clustering methods, graph algorithms, dynamic programming, differential equations, information theory, data mining methods, (Bayesian) statistical methods, artificial intelligence, and high-performance computing (cloud computing and GPU). The bioinformatics tools, web services, and datasets produced by our research are freely available. Our automated tools - the MULTICOM suite - for the prediction of protein structure and structural features were ranked among the best methods in the last five community-wide biennial Critical Assessment of Techniques for Protein Structure Prediction experiments (CASP7, 8, 9, 10, and 11), in 2006, 2008, 2010, 2012, and 2014, respectively (e.g., the official CASP11 Results). Our protein function prediction method (MULTICOM-PDCN) was ranked among the best methods during the 2010-2011 Critical Assessment of Protein Function Annotation (CAFA).

A brief presentation of the research in the Bioinformatics, Data Mining and Machine Learning (BDM) Lab

The citations to our research papers according to Google Scholar

Highlights:

In 2014 our MULTICOM protein structure prediction system was ranked among the best methods in protein tertiary structure modeling during the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11). (Official ranking).

In 2012 our MULTICOM protein structure prediction system was ranked among the best methods in protein tertiary structure modeling, protein model refinement, protein model quality assessment, and protein contact map prediction during the 10th Critical Assessment of Techniques for Protein Structure Prediction (CASP10).

From 2010 to 2011, our protein function prediction tool MULTICOM-PDCN was ranked among the top methods during the Critical Assessment of Protein Function Annotation (CAFA).

In 2010 our MULTICOM protein structure prediction system was ranked among the best methods in template-based structure modeling, template-free structure modeling, protein model quality assessment, protein disorder region prediction, and protein contact map prediction during the 9th Critical Assessment of Techniques for Protein Structure Prediction (CASP9).

In 2008 our MULTICOM protein structure prediction methods were ranked among the best methods in template-based structure modeling, template-free structure modeling, protein model quality assessment, protein disorder region prediction, protein domain boundary prediction, and protein contact map prediction during the 8th Critical Assessment of Techniques for Protein Structure Prediction (CASP8). Dr. Jianlin Cheng was invited to give four talks at the CASP8 meeting in Cagliari, Sardinia, Italy, Dec 3-7, 2008. [CASP8 template-free modeling talk]; [CASP8 template-based modeling talk]; [CASP8 model quality assessment talk]; [CASP8 disorder prediction talk].