Title: Computational Modeling of Molecular Structures - one of Big Bio Data Mining and Modeling Courses (CMP SC 8170)

Location: Naka Hall 353

Time: Mon & Wed, 4:00 pm - 5:15 pm, 2019 Spring Semester

Instructors: Prof. Jianlin Cheng

Office hours: Wed 3:30 - 4:00, Naka Hall 109

Acknowledgement: The course development is supported by a National Science Foundation CAREER award. Some images and figures used in the lectures are provided by images.google.com and other sources.

Syllabus

Lectures

1. Introduction [PPT slides]

2. Principles, Algorithms and Data Structures for Template-Based Protein Structure Modeling [PDF slides][PPT slides]

Homework: read one of the two articles and write a half-page summary (12-point font): A. Sali, T.L. Blundell. Comparative Protein Modeling by Satisfaction of Spatial Restraints. JMB, 1993. Or J. Li, J. Cheng. A Stochastic Point Cloud Sampling Method for Multi-Template Protein Comparative Modeling. Scientific Reports, 2016. Email your assignment to mumachinelearning@gmail.com.

3. Principles, Algorithms, and Data Structures for Template-Free Protein Structure Modeling [PDF slides] and [PPT slides]

Homework: read one of the five articles and write a half-page summary: (1) Adhikari B, Hou J, Cheng J. DNCON2: Improved protein contact prediction using two-level deep convolutional neural networks, (2) Greener et al. DMPfold: fast de novo protein model generation from covarying sequences using predicted distances and iterative model building, (3) Hou et al. Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13, (4) Wang et al. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model, (5) Xu J. Distance-based Protein Folding Powered by Deep Learning. (Due on March 2nd.)

4. Principles, Algorithms and Data Structures for Protein Docking [PDF slides] [PPT slides]

Homework: read one of the following two articles and write a half-page summary: M.F. Lensink et al. Prediction of homo- and hetero-protein complexes by ab-initio and template-based docking: a CASP-CAPRI experiment. Proteins, accepted, 2016. Or D. Ritchie. Recent progress and future directions in protein-protein docking. Current Protein and Peptide Science, 2008. (Due on April 10.)

5. Principles, Algorithms, and Data Structures for Genome Structure Modeling [PDF slides][PPT slides]

Homework: read one of the following three articles and write a half-page summary: E. Lieberman-Aiden et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 2009; Z. Wang, R. Cao, K. Taylor, A. Briley, C. Caldwell, J. Cheng. The properties of human genome conformation and spatial gene interaction and regulation networks. PLOS ONE. 8(3):e58793; T. Trieu, J. Cheng. Large-scale reconstruction of 3D structures of human chromosomes from chromosomal contact data. Nucleic Acids Research, 2014. (Due on April 25, 2019; Wednesday).

Projects

This course focuses on developing and benchmarking 3D molecular structure modeling tools and algorithms so that students can study the common principles, representations, sampling algorithms, and software engineering practices of molecular modeling. Through a series of group projects, students are required to make, implement, and assess plans for developing and testing the following software tools. Discussion and presentation of project plans, implementations, and results take up about 2/3 to 3/4 of the course time; lectures take up the remaining 1/4 to 1/3.

  • Develop a prototype of a template-based protein modeling tool
  • Develop a prototype of a template-free protein modeling tool
  • Benchmark three protein docking tools
  • Apply a prototype of a genome structure modeling tool that uses a Markov chain Monte Carlo approach to two data sets: Hi-C contact data of Chr. 7 in Wang et al. (the normalized contact matrix derived from the Hi-C contact data used in Trieu and Cheng, Nucleic Acids Research, 2014) and 5C data in Rousseau et al. Ref: Rousseau et al. Three-dimensional modeling of chromatin structure from interaction frequency data using Markov chain Monte Carlo sampling. BMC Bioinformatics, 2011. Data sets: (1) Rousseau et al., 2011 (a gene locus); (2) Lieberman-Aiden et al., 2009 (a chromosome); (3) Wang et al., 2013 (a chromosome)

  • Dostie Lab's HoxA gene cluster 5C data for running the MCMC5C program: Set 1; Set 2. The data do not include count numbers, which are not necessary for running the MCMC5C program.
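To make the genome structure modeling project more concrete, here is a minimal sketch of Metropolis-style Markov chain Monte Carlo sampling of 3D coordinates from a contact matrix. It is not the MCMC5C program or any published method. The inverse-power conversion from contact frequency to target distance, the squared-error energy, the function names, and all parameter values are illustrative assumptions.

```python
import math
import random


def target_distances(contacts, alpha=1.0):
    """Map contact frequencies to target distances.

    Assumption for illustration: d_ij = 1 / f_ij**alpha; pairs with zero
    contacts are skipped (left unrestrained).
    """
    n = len(contacts)
    return {
        (i, j): 1.0 / contacts[i][j] ** alpha
        for i in range(n)
        for j in range(i + 1, n)
        if contacts[i][j] > 0
    }


def energy(coords, targets):
    """Sum of squared differences between model and target distances."""
    return sum(
        (math.dist(coords[i], coords[j]) - d) ** 2
        for (i, j), d in targets.items()
    )


def metropolis(contacts, steps=20000, step_size=0.1, temperature=0.01, seed=0):
    """Sample bead coordinates with single-bead Gaussian moves."""
    rng = random.Random(seed)
    n = len(contacts)
    targets = target_distances(contacts)
    coords = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n)]
    current = energy(coords, targets)
    initial = current
    best, best_coords = current, [p[:] for p in coords]
    for _ in range(steps):
        i = rng.randrange(n)              # pick one bead to perturb
        old = coords[i][:]
        coords[i] = [x + rng.gauss(0, step_size) for x in old]
        proposed = energy(coords, targets)
        delta = proposed - current
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposed
            if current < best:
                best, best_coords = current, [p[:] for p in coords]
        else:
            coords[i] = old               # reject: restore the bead
    return best_coords, initial, best


# Toy 4-bead contact matrix (made-up numbers, not real Hi-C or 5C data).
toy_contacts = [
    [0, 8, 4, 2],
    [8, 0, 8, 4],
    [4, 8, 0, 8],
    [2, 4, 8, 0],
]
model, e_start, e_best = metropolis(toy_contacts, steps=2000)
print(f"energy: start {e_start:.3f}, best {e_best:.3f}")
```

Recomputing the full energy at every step keeps the sketch short. A real project would likely update only the terms involving the moved bead and would need to choose the distance conversion and temperature schedule with care.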

    Project Movies and Galleries

  • Protein fragment assembly movie by group 1 (2013)
  • Protein fragment assembly movie by group 3 (2013)
  • The video of modeling a 3D human chromosome structure using a contact-driven gradient descent method (2013).
  • A nice resource of video lectures about various topics. (to post lectures there)

    Data

    CASP Benchmarks
    CAPRI Benchmark
    CAPRI Experimental Structures
    Protein Data Bank (PDB), Gene Ontology, UniProt, Pfam

    External Tools

    Analytical Loop Closure Program
    MULTICOM toolbox
    SCWRL for packing side chains given the backbone conformation of a protein
    Modeller (for details about how to use Modeller, check out Structural Bioinformatics course)
    Jackal
    I-TASSER
    Rosetta
    HHSearch
    IMP (for details about how to use IMP, check out Integrative Bioinformatics course)
    MESHI
    Flexible Docking
    FoldIt - a protein folding game
    PyMol - a powerful protein structure visualization tool
    JMol
    Crankite: conversion between dihedral angles and coordinates

    Journals

  • Bioinformatics
  • BMC Bioinformatics
  • Proteins
  • Nucleic Acids Research
  • RNA
  • BioDataMining

    Related Courses taught by Prof. Jianlin Cheng

  • Supervised Machine Learning
  • Computational Modeling of Molecular Structures
  • Data Mining and Knowledge Discovery
  • Machine Learning for Bioinformatics
  • Problem Solving in Bioinformatics
  • Computational Optimization Methods