NSF ABI Project: Deep Learning Methods for Protein Bioinformatics

[Research] [Software and Data] [Education] [Outreach] [People]

Research

Journal Publications

1. J. Hou, B. Adhikari, J. Cheng. DeepSF: deep convolutional neural network for mapping protein sequences to folds. Bioinformatics, 34(8):1295–1303, 2018. [at Bioinformatics]

2. J. Hou, T. Wu, R. Cao, J. Cheng. Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13. Proteins, accepted. [at Proteins]

3. A. Al-Azzawi, A. Ouadou, J. Cheng. Super Clustering Approach for Fully Automated Single Particle Picking in Cryo-EM. Genes, in press.

4. A. Al-Azzawi, A. Ouadou, J.J. Tanner, J. Cheng. AutoCryoPicker: an unsupervised learning approach for fully automated single particle picking in cryo-EM images. BMC Bioinformatics, accepted.

5. T. Wu, J. Hou, B. Adhikari and J. Cheng. Elucidating key determinants of deep learning-based inter-residue contact distance prediction. Bioinformatics, 36(4), 1091-1098, 2020.

6. Yan, J., Cheng, J., Kurgan, L., Uversky, V. N. (2019). Structural and functional analysis of “non-smelly” proteins. Cellular and Molecular Life Sciences, 1-18.

7. Lensink, M. F., Brysbaert, G., Nadzirin, N., Velankar, S., Chaleil, R. A., Gerguri, T., ..., Kong, R. (2019). Blind prediction of homo‐ and hetero‐protein complexes: The CASP13‐CAPRI experiment. Proteins: Structure, Function, and Bioinformatics, 87(12), 1200-1221.

8. Zhou, N., Jiang, Y., Bergquist, T. R., Lee, A. J., Kacsoh, B. Z., Crocker, A. W., ..., Davis, L. (2019). The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. Genome biology, 20(1), 1-23.

9. Hou, J., Adhikari, B., Tanner, J. J., Cheng, J. (2020). SAXSDom: Modeling multidomain protein structures using small‐angle X‐ray scattering data. Proteins: Structure, Function, and Bioinformatics, 88(6), 775-787.

10. S. Dong, S.A. Moritz, J. Pfab, J. Hou, R. Cao, L. Wang, T. Wu, J. Cheng. Deep learning to predict protein backbone structure from high-resolution cryo-EM density maps. Scientific Reports, 10(1):1-22, 2020.

11. Chen, C., Hou, J., Tanner, J.J. and Cheng, J., 2020. Bioinformatics methods for mass spectrometry-based proteomics data analysis. International Journal of Molecular Sciences, 21(8), p.2873.

12. J. Hou, Z. Guo, and J. Cheng. DNSS2: improved ab initio protein secondary structure prediction using advanced deep learning architectures. BioRxiv, 2019.

13. Al-Azzawi, A., Ouadou, A., Max, H., Duan, Y., Tanner, J. J., & Cheng, J. DeepCryoPicker: fully automated deep neural network for single protein particle picking in cryo-EM. BMC Bioinformatics, 21(1), 1-38, 2020.

14. Lawson, C. L., Kryshtafovych, A., Adams, P. D., Afonine, P., Baker, M. L., Barad, B. A., ... & Chojnowski, G. Outcomes of the 2019 EMDataResource model challenge: validation of cryo-EM models at near-atomic resolution. Nature Methods, accepted, 2020.

15. Adil Al-Azzawi, Anes Ouadou, Ye Duan, and Jianlin Cheng. Auto3DCryoMap: An Automated Particle Alignment Approach for 3D cryo-EM Density Map Reconstruction. BMC Bioinformatics, 21(21), 1-26, 2020.

16. T. Wu, Z. Guo, J. Hou, J. Cheng. DeepDist: real-value inter-residue distance prediction with deep residual convolutional network. BMC Bioinformatics, 22:30, 2021.

17. M. Necci et al. Critical Assessment of Protein Intrinsic Disorder Prediction. Nature Methods, accepted, 2021.

18. C. Chen, T. Wu, Z. Guo, J. Cheng. Combination of deep neural network with attention mechanism enhances the explainability of protein contact prediction. Proteins, accepted, 2021.

19. Guo, Z, Wu, T., Liu, J., Hou, J., & Cheng, J. Improving deep learning-based protein distance prediction in CASP14. Bioinformatics, 2021.

20. Chen, X., Liu, J., Guo, Z., Wu, T., Hou, J., & Cheng, J. Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14. Scientific Reports, 2021.

21. Roy, R. S., Quadir, F., Soltanikazemi, E., & Cheng, J. (2022). A deep dilated convolutional residual network for predicting interchain contacts of protein homodimers. Bioinformatics, 38(7), 1904-1910.

22. Dhakal, A., McKay, C., Tanner, J. J., & Cheng, J. (2022). Artificial intelligence in the prediction of protein–ligand interactions: recent advances and future directions. Briefings in Bioinformatics, 23(1), bbab476.

23. Lensink, M. F., Brysbaert, G., Mauri, T., Nadzirin, N., Velankar, S., Chaleil, R. A., ... & Wodak, S. J. (2021). Prediction of protein assemblies, the next frontier: The CASP14‐CAPRI experiment. Proteins: Structure, Function, and Bioinformatics, 89(12), 1800-1823.

24. Quadir, F., Roy, R. S., Soltanikazemi, E., & Cheng, J. (2021). Deepcomplex: A web server of predicting protein complex structures by deep learning inter-chain contact prediction and distance-based modelling. Frontiers in Molecular Biosciences, 827.

25. Liu, J., Wu, T., Guo, Z., Hou, J., & Cheng, J. (2022). Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14. Proteins: Structure, Function, and Bioinformatics, 90(1), 58-72.

26. Wu, T., Liu, J., Guo, Z., Hou, J., & Cheng, J. (2021). MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction. Scientific reports, 11(1), 1-9.

27. Quadir, F., Roy, R. S., Halfmann, R., & Cheng, J. (2021). DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning. Scientific reports, 11(1), 1-10.

28. Kryshtafovych, A., Moult, J., Billings, W. M., Della Corte, D., Fidelis, K., Kwon, S., ... & CASP‐COVID participants. (2021). Modeling SARS‐CoV‐2 proteins in the CASP‐commons experiment. Proteins: Structure, Function, and Bioinformatics, 89(12), 1987-1996.

29. Mahmud, S., Guo, Z., Quadir, F., Liu, J., & Cheng, J. (2022). Multi-head attention-based U-Nets for predicting protein domain boundaries using 1D sequence features and 2D distance maps. BMC Bioinformatics, in press.

30. Guo, Z., Liu, J., Skolnick, J., Cheng, J. Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks. Nature Communications, accepted.

31. Giri, N., Cheng, J. (2023) Improving protein-ligand interaction modeling with cryo-EM data, templates, and deep learning in 2021 ligand model challenge. Biomolecules, accepted.

32. Wu, T., Guo, Z., Cheng, J. (2023) Atomic protein structure refinement using all-atom graph representations and SE(3)-equivariant graph transformer. Bioinformatics, accepted.

33. Morehead, A., Chen, C., Sedova, A., Cheng, J. (2023) DIPS-Plus: The enhanced database of interacting protein structures for interface prediction. Scientific Data, accepted.

34. Liu, J., Guo, Z., Wu, T., Roy, R.S., Chen, C., Cheng, J. (2023) Improving AlphaFold2-based Protein Tertiary Structure Prediction with MULTICOM in CASP15. Communications Chemistry, accepted.

35. Guo, Z., Liu, J., Wang, Y., Chen, M., Wang, D., Xu, D., Cheng, J. (2023). Diffusion models in bioinformatics and computational biology. Nature Reviews Bioengineering, accepted.

36. Lensink et al. (2023). Impact of AlphaFold on Structure Prediction of Protein Complexes: The CASP15-CAPRI Experiment. Proteins, accepted.

37. Giri, N., R.S. Roy, and J. Cheng, Deep learning for reconstructing protein structures from cryo-EM density maps: Recent advances and future directions. Current Opinion in Structural Biology, 2023. 79: p. 102536.

Book Chapter

1. J. Hou, T. Wu, Z. Guo, F. Quadir, J. Cheng. The MULTICOM protein structure prediction server empowered by deep learning and contact distance prediction. Methods in Mol. Biol., 2020.

BioRxiv Preprint

1. Hong, Y., Deng, Y., Cui, H., Segert, J., Cheng, J. (2020). Classifying protein structures into folds by convolutional neural networks, distance maps, and persistent homology. BioRxiv.

2. Morehead, A., Chen, X., Wu, T., Liu, J., & Cheng, J. (2022). EGR: Equivariant Graph Refinement and Assessment of 3D Protein Complex Structures. arXiv preprint arXiv:2205.10390

3. Liu, J., Guo, Z., Wu, T., Roy, R. S., Quadir, F., Chen, C., & Cheng, J. (2023). Enhancing AlphaFold-Multimer-based Protein Complex Structure Prediction with MULTICOM in CASP15. bioRxiv, 2023-05.

Conference Proceedings

1. A. Al-Azzawi, A. Ouadou, J. Cheng. Super Clustering Approach for Fully Automated Single Particle Picking in Cryo-EM. International Conference on Intelligent Biology and Medicine (ICIBM), Columbus, OH, 2019.

2. Adil Al-Azzawi, Anes Ouadou, Ye Duan, and Jianlin Cheng. Auto3DCryoMap: An Automated Particle Alignment Approach for 3D cryo-EM Density Map Reconstruction. ICIBM 2020.

3. Morehead, A., Chen, C., & Cheng, J. (2022). Geometric Transformers for Protein Interface Contact Prediction. The International Conference on Learning Representations (ICLR), 2022.

4. Gao, M., Lund-Andersen, P., Morehead, A., Mahmud, S., Chen, C., Chen, X., ... Cheng, J., & Sedova, A. (2021, November). High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function. In 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC) (pp. 46-57). IEEE.

5. Morehead, A., Cheng, J. Geometry-Complete Diffusion for 3D Molecule Generation. Machine Learning for Drug Discovery workshop of ICLR, accepted as poster paper, 2023.

6. Elham Soltanikazemi, Raj Roy, Farhan Quadir, Nabin Giri, Alex Morehead and Jianlin Cheng. DRLComplex: Reconstruction of Protein Quaternary Structures Using Deep Reinforcement Learning. The International Conference on Intelligent Biology and Medicine (ICIBM), Tampa, Florida, 2023.

7. Morehead, A., Cheng, J. Geometry-Complete Perceptron Networks for 3D Molecular Graphs. 2023 AAAI Workshop on Deep Learning on Graphs: Methods and Applications (DLG-AAAI). Washington DC, 2023.

Conference Abstracts, Posters, and Presentations

1. B. Adhikari, J. Hou, J. Cheng. Improved protein contact prediction using two-level deep convolutional neural networks. 26th Conference on Intelligent Systems for Molecular Biology, Chicago, IL, USA, 2018. (Talk)

2. J. Hou, B. Adhikari, and J. Cheng. DeepSF: deep convolutional neural network for mapping protein sequences to folds. In Proceedings of the 9th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, Boston, MA, USA, 2018. (Highlight Talk)

3. T. Wu, J. Hou, B. Adhikari, J. Cheng. Deep convolutional neural networks for improving protein contact prediction. 13th Critical Assessment of Techniques of Protein Structure Prediction (CASP13), Cancun, Mexico, 2018. (abstract)

4. J. Hou, T. Wu, J. Cheng. Improving protein tertiary structure prediction by deep learning, contact prediction, and domain recognition. 13th Critical Assessment of Techniques of Protein Structure Prediction (CASP13), Cancun, Mexico, 2018. (abstract)

5. J. Hou, T. Wu, R. Cao, J. Cheng. CASP13 tertiary structure prediction by the MULTICOM human group. 13th Critical Assessment of Techniques of Protein Structure Prediction (CASP13), Cancun, Mexico, 2018. (abstract and poster)

6. J. Hou, R. Cao, J. Cheng. CASP13 tertiary structure prediction by wfAll-Cheng of the WeFold collaborative. 13th Critical Assessment of Techniques of Protein Structure Prediction (CASP13), Cancun, Mexico, 2018. (abstract)

7. J. Cheng. Protein Structure Modeling Driven by Deep Learning and Contact Prediction. The CASP13 Conference, Cancun, Mexico, Dec. 1-4, 2018. (Invited talk)

8. J. Cheng. Deep Learning for Protein Contact/Distance Prediction, Roundtable discussion, The CASP13 Conference, Cancun, Mexico, Dec. 1-4, 2018. (Round table discussion and presentation)

9. J. Cheng. Protein Structure Modeling Driven by Deep Learning and Contact Distance Prediction. Invited Talk. Workshop on Health Informatics, IEEE Conference on Big Data, 2018. (Invited talk)

10. J. Cheng. Distance-based ab initio protein structure prediction driven by deep learning. The workshop on deep learning algorithms and applications, Copenhagen, Denmark, 2019. (Invited talk)

11. J. Cheng. Protein Tertiary Structure Modeling Driven by Deep Learning and Contact Distance Prediction in CASP13. ACM Conference on Bioinformatics, Computational Biology and Health Informatics (ACM-BCB), Niagara Falls, NY, 2019. (abstract)

12. J. Cheng. Introduction to 2019 ACM-BCB Highlights Session. ACM Conference on Bioinformatics, Computational Biology and Health Informatics (ACM-BCB), Niagara Falls, NY, 2019. (abstract)

13. Deep Learning Prediction of Protein Contact and Distance, International Society for Computational Biology (ISCB), 2020. (talk)

14. Protein Tertiary Structure Modeling Driven by Deep Learning and Contact Distance Prediction. University of Alabama, 2020. (talk).

15. Protein Tertiary Structure Modeling Driven by Deep Learning and Contact Distance Prediction in CASP13. ACM-BCB Conference, Niagara Falls, New York, 2019. (talk).

16. Deep Learning for Protein Structure Prediction, Stowers Institute, Kansas City, 2019.

17. J. Liu, J. Hou, T. Wu, Z. Guo, J. Cheng. CASP14 Tertiary Structure Prediction by MULTICOM Human Group. The 14th Critical Assessment of Techniques of Protein Structure Prediction (CASP14), Virtual Meeting, 2020.

18. Tianqi Wu, Jian Liu, Zhiye Guo, Jie Hou, J. Cheng. CASP14 Protein Tertiary Structure Prediction by MULTICOM Server Predictors. The 14th Critical Assessment of Techniques of Protein Structure Prediction (CASP14), Virtual Meeting, 2020.

19. Zhiye Guo, Tianqi Wu, Jian Liu, Jie Hou, Jianlin Cheng. Prediction of protein inter-residue distance and contacts with deep learning. The 14th Critical Assessment of Techniques of Protein Structure Prediction (CASP14), Virtual Meeting, 2020.

20. Jian Liu, Zhiye Guo, Xiao Chen, Alex Morehead, Raj S. Roy, Tianqi Wu, Nabin Giri, Farhan Quadir, Chen Chen, Jianlin Cheng. Improving Multimer Structure Prediction by Sensitive Alignment Sampling, Template Identification, Model Ranking, Iterative Refinement in CASP15-CAPRI. CASP15-CAPRI conference, 2022.

21. Xiao Chen†, Alex Morehead†, Raj S. Roy†, Zhiye Guo, Jian Liu, Nabin Giri, Tianqi Wu, Chen Chen, Jianlin Cheng. Multimer Model Scoring Based on Gated-Graph Transformer and Steerable Equivariant Graph Neural Networks in CASP15-CAPRI. CASP15-CAPRI conference, 2022.

22. Xiao Chen†, Alex Morehead†, Raj S. Roy†, Zhiye Guo, Jian Liu, Nabin Giri, Tianqi Wu, Chen Chen, Jianlin Cheng. Multimer Model Quality Assessment Using Gated-Graph Transformer, Steerable Equivariant Graph Neural Networks, and Pairwise Model Similarity. CASP15 conference, 2022.

23. Jian Liu, Zhiye Guo, Tianqi Wu, Raj S. Roy, Farhan Quadir, Chen Chen, Jianlin Cheng. Improving Assembly Structure Prediction by Sensitive Alignment Sampling, Template Identification, Model Ranking, and Iterative Refinement. CASP15 conference, 2022.

24. Nabin Giri, Ashwin Dhakal, Jian Liu, Jianlin Cheng. Template-based Modeling for Accurate Prediction of Ligand-Protein Complex Structures in CASP15. CASP15 Conference, 2022.

25. Jian Liu, Zhiye Guo, Tianqi Wu, Raj S. Roy, Farhan Quadir, Chen Chen, Jianlin Cheng. Improving Tertiary Structure Prediction by Alignment Sampling, Template Identification, Model Ranking, Iterative Refinement, and Protein Interaction-Aware Modeling. CASP15 conference, 2022.

26. Jianlin Cheng. Combining Pairwise Similarity and Interface Contact Prediction to Evaluate Protein Assembly Models, The 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15), Antalya, Turkey. 2022. (Talk)

27. Jianlin Cheng. Deep Learning for Bioinformatics, DeepLearn Winter School, Sweden. 2022. (Talk)

28. Jianlin Cheng. Deep Learning Techniques for Protein Structure Refinement and Evaluation, Southern California AI Symposium, University of California, Irvine. 2022.

29. Jianlin Cheng. Deep Learning Techniques for Protein Structure Refinement and Evaluation, Stockholm University, Sweden. 2022.

30. Alex Morehead: Geometry-Complete Perceptron Networks for 3D Molecular Graphs. AAAI Workshop on Deep Learning on Graphs: Methods and Applications (DLG-AAAI). Washington DC. 2023.

31. Alex Morehead: Geometry-Complete Diffusion for 3D Molecule Generation. Machine Learning for Drug Discovery workshop of ICLR. 2023.

32. Jianlin Cheng. Deep learning protein structure prediction in CASP15, KAUST, Saudi Arabia. 2023.

Theses & Dissertations

1. Y. Hong. PRO3DCNN: Convolutional Neural Networks for Mapping Protein Structure into Folds. Master’s Thesis. University of Missouri – Columbia, 2019.

2. J. Hou. Improving Protein Structure Prediction by Deep Learning and Computational Optimization. PhD Dissertation. University of Missouri – Columbia, 2019.

3. A. Al-azzawi. Fully Automated Deep Supervised and Unsupervised Learning Approaches for 3D protein Cryo-EM Density Map Reconstruction. PhD Dissertation. University of Missouri – Columbia, 2019.

4. Xiangyu Li. CONFOLD New Version: Contact-Guided Ab Initio Protein Folding with New Features. Master’s Thesis. University of Missouri – Columbia, 2019.

5. Tianqi Wu. Protein tertiary structure prediction and refinement using deep learning. PhD Dissertation. University of Missouri - Columbia, 2022.

6. Zhiye Guo. PROTEIN CONTACT DISTANCE AND STRUCTURE PREDICTION DRIVEN BY DEEP LEARNING. PhD Dissertation. University of Missouri - Columbia, 2023.

Software and Data

1. DNSS2: Deep learning tools for protein secondary structure prediction at GitHub

2. MULTICOM: the open source comprehensive protein structure prediction system at GitHub

3. DeepSF: deep convolutional neural networks for mapping protein sequences to folds at GitHub

4. DNCON2: deep learning prediction of protein residue-residue contacts

5. PRO3DCNN: convolutional neural networks for mapping protein structures to folds

6. MULTICOM protein structure prediction server with model quality assessment service

7. SAXSDom: protein domain assembly using SAXS data

8. DNCON4: protein contact map prediction

9. DeepCryoPicker: deep learning picking of protein particles in cryo-EM images

10. DeepDist: deep learning protein distance prediction

11. DFold: distance-based protein structure modeling with simulated annealing

12. Auto3DCryoMap: automatically reconstructing 3D protein density maps from cryo-EM protein particle images

13. GFOLD: gradient descent-based modeling of protein structure

14. MULTICOM2: the second version of MULTICOM protein structure prediction system

15. Deep learning interpretation of protein contact prediction and folding

16. Deep geometric transformer for predicting protein interface contacts

17. Deep dilated convolutional residual network for predicting inter-protein contacts

18. Deep attention-based network for predicting inter-protein distance

19. MULTICOM3 protein structure prediction system

20. Protein domain prediction with deep learning and distance maps

21. Geometry-complete perceptron networks for 3D molecule generation

22. Geometry-Complete Diffusion Model for 3D molecule generation

23. Graph transformer for all-atom refinement of protein structures

Education

Courses and Seminars

1. Machine Learning for Bioinformatics

2. Computational Modeling of Molecular Structures

3. University of Missouri Deep Learning Seminar

The community-wide Critical Assessment of Techniques for Protein Structure Prediction (CASP)

1. Our MULTICOM method was ranked 3rd in the 13th CASP competition (CASP13) in 2018. [Official Ranking]

2. Our MULTICOM method was ranked 3rd in inter-domain protein structure prediction and 7th in protein tertiary structure prediction in the 14th CASP competition (CASP14) in 2020. [Official Ranking]

3. Our MULTICOM methods were ranked No. 1 in estimating the fold accuracy of protein quaternary structures, No. 3 among server predictors for protein quaternary structure prediction, No. 3 among server predictors for protein tertiary structure prediction, and No. 7 among both human and server predictors for predicting both tertiary and quaternary structures in the 15th Community-Wide Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022.

Outreach

1. MULTICOM was ranked among the top 3 in the worldwide protein tertiary structure prediction experiment (CASP13) (News)

2. Dr. Cheng gave a TED-like talk on the application of deep learning to protein folding for the general public at the Research Day Conference, College of Engineering, University of Missouri - Columbia (News)

3. MULTICOM was ranked among the top predictors in the worldwide protein tertiary structure prediction experiment (CASP14) (News)

4. MULTICOM was ranked among the top predictors in CASP15 (Mizzou News)

5. MULTICOM was ranked among the top predictors in CASP15 (Mizzou Engineering News)

People

Principal Investigator

Dr. Jianlin Cheng

Graduate Students

Jie Hou, Tianqi Wu, Zhiye Guo, Yechan Hong, Farhan Quadir, Carlos Martinez Villar, Chen Chen, Adil Al-azzawi, Xiangyu Li, Alex Morehead, Jian Liu, Raj Roy, Sajid Mahmud.

Undergraduate Students

Yongyu Deng, Haofan Cui, Royal Sanders, Nolan Park

Contact

Prof. Jianlin Jack Cheng

Director of Bioinformatics and Machine Learning Laboratory (BML)

Professor
Department of Electrical Engineering and Computer Science
Informatics Institute
College of Engineering
University of Missouri, Columbia, MO 65211-2060

Primary Office: EBW 109
Lab: E1425 Lafferre Hall and 250 Naka Hall
Phone: 573-882-7306
Fax: 573-882-8318
Email: chengji@missouri.edu