This course requires basic knowledge of statistics and artificial intelligence.

Course Description
The course introduces students to the fundamentals of data mining theory and algorithms. In addition to building a strong mathematical foundation, the course places heavy emphasis on the analysis and mining of real data sets using popular data mining tools such as Weka, KNIME and R. Topics covered include classification (k-nearest neighbors, classification trees, naïve Bayes, artificial neural networks), regression, clustering (k-means, fuzzy c-means, hierarchical clustering), association rules and text mining. Feature selection, data cleaning, data transformation, model evaluation and data visualization are also covered in detail. By the end of the course, students are expected to be able to model and interpret large, complicated data sets using predictive and descriptive data mining methods.

Course Outline
Overview of Data Mining: Definition, Origins of Data Mining, Applications of Data Mining, Data Mining vs. OLAP and SQL
Data Preparation/Wrangling: Feature Ranking, Feature Discretization, Normalization, Outlier Detection Techniques
Classification/Supervised Learning: Classification Tree, Naïve Bayes, Neural Networks, k-NN Classifier, Logistic Regression
Clustering/Unsupervised Learning: K-Means, Fuzzy c-Means, Self-Organizing Map
Model Evaluation: Confusion Matrix, Recall and Precision, ROC Curve
Patterns and Association Mining: A-Priori Algorithm

Reference Books
Introduction to Data Mining by Tan, Steinbach and Kumar (2006)
Data Mining: Concepts and Techniques by Han, Kamber and Pei (2011)
Data Mining: Practical Machine Learning Tools and Techniques by Witten, Frank and Hall (2011)

Marks Distribution
Two Midterms - 40%
Final - 40%
Projects - 20%