Title: Computational Modeling of Molecular Structures - one of the Big Bio Data Mining and Modeling courses (CMP SC 8170)

Location: Naka Hall 353

Time: Mon & Wed, 4:00 pm - 5:15 pm, 2019 Spring Semester

Instructor: Prof. Jianlin Cheng

Office hours: Wed 3:30 - 4:00, Naka Hall 109

Acknowledgement: The course development is supported by a National Science Foundation CAREER award. Some images and figures used in the lectures are provided by images.google.com and other sources.

Syllabus

Lectures

1. Introduction [PPT slides]

2. Principles, Algorithms and Data Structures for Template-Based Protein Structure Modeling [PDF slides][PPT slides]

Homework: read one of the two articles and write a half-page summary (12-point font): A. Sali, T.L. Blundell. Comparative Protein Modeling by Satisfaction of Spatial Restraints. JMB, 1993; or J. Li, J. Cheng. A Stochastic Point Cloud Sampling Method for Multi-Template Protein Comparative Modeling. Scientific Reports, 2016. Email your assignment to mumachinelearning@gmail.com.

3. Principles, Algorithms, and Data Structures for Template-Free Protein Structure Modeling [PDF slides][PPT slides]

Homework: read one of the five articles and write a half-page summary: (1) Adhikari B, Hou J, Cheng J. DNCON2: Improved protein contact prediction using two-level deep convolutional neural networks; (2) Greener et al. DMPfold: fast de novo protein model generation from covarying sequences using predicted distances and iterative model building; (3) Hou et al. Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13; (4) Wang et al. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model; (5) Xu J. Distance-based Protein Folding Powered by Deep Learning. (Due on March 2nd.)

4. Principles, Algorithms and Data Structures for Protein Docking [PDF slides] [PPT slides]

Homework: read one of the following two articles and write a half-page summary: M.F. Lensink et al. Prediction of homo- and hetero-protein complexes by ab-initio and template-based docking: a CASP-CAPRI experiment. Proteins, accepted, 2016; or D. Ritchie. Recent progress and future directions in protein-protein docking. Current Protein and Peptide Science, 2008. (Due on April 10.)

5. Principles, Algorithms, and Data Structures for Genome Structure Modeling [PDF slides][PPT slides]

Homework: read one of the following three articles and write a half-page summary: E. Lieberman-Aiden et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 2009; Z. Wang, R. Cao, K. Taylor, A. Briley, C. Caldwell, J. Cheng. The properties of human genome conformation and spatial gene interaction and regulation networks. PLOS ONE, 8(3):e58793, 2013; T. Trieu, J. Cheng. Large-scale reconstruction of 3D structures of human chromosomes from chromosomal contact data. Nucleic Acids Research, 2014. (Due on April 25, 2019; Wednesday).

Projects

This course focuses on developing and benchmarking 3D molecular structure modeling tools and algorithms so that students can study the common principles, representations, sampling algorithms and software engineering practices of molecular modeling. Through a series of group projects, students are required to plan, implement and assess the development and testing of the software tools listed below. Discussion and presentation of project plans, implementations and results take up about 2/3 to 3/4 of the course time, and the lectures take up the remaining 1/4 to 1/3.

Develop a prototype of a template-based protein modeling tool

Develop a prototype of a template-free protein modeling tool

Benchmark three protein docking tools

Apply a prototype of a genome structure modeling tool that uses a Markov chain Monte Carlo approach to two data sets: the Hi-C contact data of Chr. 7 in Wang et al. (the normalized contact matrix derived from the Hi-C contact data used in Trieu and Cheng, Nucleic Acids Research, 2014) and the 5C data in Rousseau et al. Ref: Rousseau et al. Three-dimensional modeling of chromatin structure from interaction frequency data using Markov chain Monte Carlo sampling. BMC Bioinformatics, 2011. Data sets: (1) Rousseau et al., 2011 (a gene locus); (2) Lieberman-Aiden et al., 2009 (a chromosome); (3) Wang et al., 2013 (a chromosome).