Prerequisite

This course requires basic knowledge of statistics and artificial intelligence.

Course Description

The course introduces students to the fundamentals of data mining theory and algorithms. In addition to building a strong mathematical foundation, the course puts heavy emphasis on analyzing and mining real data sets with popular data mining tools such as Weka, KNIME and R. Topics covered include classification (k-nearest neighbors, classification trees, naïve Bayes, artificial neural networks), regression, clustering (k-means, fuzzy c-means, hierarchical clustering), association rules and text mining. Feature selection, data cleaning, data transformation, model evaluation and data visualization are also covered in detail. By the end of this course, students are expected to be able to model and interpret large, complex data sets using predictive and descriptive data mining methods.

Course Outline

Overview of Data Mining: Definition, Origins of Data Mining, Applications of Data Mining, Data Mining vs. OLAP and SQL

Data Preparation/Wrangling: Feature Ranking, Feature Discretization, Normalization, Outlier Detection Techniques

Classification/Supervised Learning: Classification Tree, Naïve Bayes, Neural Networks, k-NN Classifier, Logistic Regression

Clustering/Unsupervised Learning: K-Means, Fuzzy c-Means, Self-Organizing Map

Model Evaluation: Confusion Matrix, Recall and Precision, ROC Curve

Patterns and Association Mining: Apriori Algorithm
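
As a small illustration of the Model Evaluation topic in the outline above, the following Python sketch builds the four confusion-matrix counts for a binary classifier and derives precision and recall from them. The label lists and the function names `confusion_counts` and `precision_recall` are invented for this example and are not taken from the course material.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["tp"] += 1  # predicted positive, actually positive
        elif p == positive:
            counts["fp"] += 1  # predicted positive, actually negative
        elif a == positive:
            counts["fn"] += 1  # predicted negative, actually positive
        else:
            counts["tn"] += 1  # predicted negative, actually negative
    return counts


def precision_recall(counts):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground-truth labels and classifier outputs.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
precision, recall = precision_recall(counts)
print(dict(counts))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these made-up labels there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so both precision and recall come out to 0.75.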

Reference Books

Introduction to Data Mining by Tan, Steinbach and Kumar (2006)

Data Mining: Concepts and Techniques by Han and Kamber (2011)

Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank (2011)

Marks Distribution

Two Midterms - 40%

Final - 40%

Projects - 20%