CMP SC 8370: Data Mining and Knowledge Discovery

Instructor: Dr. Jianlin Cheng

Location: Engineering Building West 353, Time: MoWe 4:00 pm - 5:15 pm, Office Hours: MoWe 2:30 pm - 3:30 pm, Semester: Spring 2016

Syllabus

Lecture Slides

Acknowledgements: these slides are largely customized and adapted from the textbook's slides.

1. Data Mining Concepts and Process

2. Data Preprocessing

3. Frequent Pattern Mining

4. Classification and Prediction

5. Cluster Analysis

6. Network Mining

Text Book

Han and Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann.

Reading Materials and Other Resources

1. A portal web site of the data mining community (news, tools, data, jobs, trends)
2. Chapters of the text book covered in the class (self-reading, not graded)
3. R Statistics Computing Software
4. Weka open source data mining software
5. The vote data set used to demo both classification and clustering with Weka
6. RapidMiner open source data mining software
7. Cloud computing
8. Data Mining Theory
9. Data Science Portal
10. LinkedIn Business Data Mining Group

Assignments

All the assignments should be submitted to mudatamining@gmail.com.
Assignment 1 (15 points), due by the end of Feb. 8.
Assignment 2 (20 points), due by Feb. 18. (Here are a couple of examples of how to draw plots in R, which may be useful.)
Assignment 3 (35 points), due by March 1.
Assignment 4 (10 points), due by March 15. (Your group members' names are required; other information about the project you choose is optional. Each group has up to four students. Only one member of each group should send the information to mudatamining@gmail.com on behalf of all members.)

Projects

The description of project 1 - customer relation prediction
The description of project 2 - new customer recognition
The description of project 3 - internet query classification
Project 4 - image data mining, face vs. other object recognition
Project 5 - social network mining

The final report of your project is due on May 14. The report must be at least three pages long and should include a title, author names, an abstract, an introduction to the problem, methods, results, a conclusion, and optionally references. Please submit your report to mudatamining@gmail.com.

Related Courses taught by Prof. Jianlin Cheng

  • Supervised Machine Learning
  • Computational Modeling of Molecular Structures
  • Data Mining and Knowledge Discovery
  • Machine Learning for Bioinformatics
  • Problem Solving in Bioinformatics
  • Computational Optimization Methods