mhahsler/Introduction_to_Data_Mining_R_Examples

R Companion for Introduction to Data Mining

R and the tidyverse are very popular for data mining. This repository contains slides and documented R examples to accompany several chapters of the popular data mining textbook:

Pang-Ning Tan, Michael Steinbach, Anuj Karpatne and Vipin Kumar, Introduction to Data Mining, Addison Wesley, 1st or 2nd edition.

The slides and examples are used in my course CS 5/7331 Data Mining, taught at SMU, and will be regularly updated and improved. The code examples are now compiled into the free online book An R Companion for Introduction to Data Mining, which is published under the Creative Commons Attribution license, so you can share and adapt the examples freely. Please open an issue to report corrections or suggest improvements.

Covered Chapters

Textbook Chapter* | Slides | Companion R Code
1. Introduction | Slides | R Code
2. Data | Slides | R Code
-. Web Chapter: Exploring Data | Slides | R Code
3. Classification: Basic Concepts and Techniques | Slides | R Code
4. Classification: Alternative Techniques | Slides | R Code
5. Association Analysis: Basic Concepts and Algorithms | Slides | R Code
6. Association Analysis: Advanced Concepts | - | R Code
7. Cluster Analysis: Basic Concepts and Algorithms | Slides | R Code

* Textbook chapters refer to Introduction to Data Mining, 2nd edition. The most frequently used chapters are available as free sample chapters via the links.

Software Requirements

You need to install:

Each chapter of the book uses a set of packages that must be installed. Installation is done directly in R, and the installation code is provided at the beginning of each chapter.

Statement of Need

The textbook Introduction to Data Mining has been one of the most popular choices for learning and teaching data mining concepts. The authors have made some of the most important chapters available for free on the book's website. One of the authors also provides Python Jupyter notebooks with examples, but complete R code examples were still missing. Given the R community's interest in data analysis, data science, and machine learning, and the broad support for data mining in R packages, this was a noticeable gap, which this learning resource fills. The resource targets advanced undergraduate and graduate students and can be used as a component of a first introduction to data mining.

Instructor Resources

  • PowerPoint presentation files for a data mining course can be found in the repository's slides directory. Slides show an R symbol at the bottom whenever R code examples are available.
  • Datasets for projects can be found at https://www.kaggle.com/datasets.

License

All code and documents in this repository are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

For questions please contact Michael Hahsler.