Bioinformatics and Machine Learning Laboratory (BML)

Overview:

The research in my lab focuses on developing machine learning, deep learning, and artificial intelligence (AI) methods to analyze big biological data and address fundamental problems in biological and medical sciences. Currently, we are developing bioinformatics algorithms and tools, with active projects in protein structure, interaction, and function prediction; 3D genome structure modeling; inference and simulation of biological networks and systems; and omics (transcriptomics, genomics, epigenomics, and proteomics) data analysis (e.g., RNA-seq data analysis). These projects are funded by the National Institutes of Health (NIH), the National Science Foundation (NSF), and the Department of Energy (DOE).

The main techniques that we are developing include deep learning, artificial intelligence (AI), machine learning, data mining, optimization, and high performance computing (cloud computing and GPU). Our bioinformatics tools, web services, and datasets are freely available. Our MULTICOM suite for the prediction of protein structure and structural features was ranked among the best methods in the last several community-wide biennial Critical Assessments of Techniques for Protein Structure Prediction (CASP7, 8, 9, 10, 11, 12, 13, 14, and 15, held in 2006, 2008, 2010, 2012, 2014, 2016, 2018, 2020, and 2022, respectively).

Citations to our research papers are listed on Google Scholar.

Highlights:

In 2022, our MULTICOM methods were ranked No. 1 in estimating the fold accuracy of protein quaternary structures (see the SCORE ranking), No. 3 among server predictors in protein quaternary structure prediction, No. 3 among server predictors in protein tertiary structure prediction, and No. 7 among all human and server predictors in combined tertiary and quaternary structure prediction in the 15th Community-Wide Critical Assessment of Techniques for Protein Structure Prediction (CASP15).

In 2020, our MULTICOM protein structure prediction system was ranked among the top methods out of more than 130 predictors in the 14th Critical Assessment of Techniques for Protein Structure Prediction (CASP14) (see the official CASP14 inter-domain protein prediction results and the official CASP14 tertiary structure prediction results). It also performed best in selecting the best protein structural models in terms of GDT-TS score. (Note: AlphaFold in these results is Google DeepMind's AlphaFold2.)

In 2018, our MULTICOM protein structure prediction system was ranked among the top three out of 99 predictors in protein tertiary structure modeling in the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP13) (see the official ranking). (Note: A7D is Google DeepMind's AlphaFold1.)