CPSC445/CPSC545/MBB334/MBB545/CBB545

Introduction to Data Mining

Course description

The quantity and variety of online data are increasing very rapidly. The data mining process includes data selection and cleaning, machine learning techniques to "learn" knowledge that is "hidden" in data, and the reporting and visualization of the resulting knowledge. This course will cover each of these steps and will illustrate the whole process with examples of practical applications from the life sciences, computer science, and commerce. Several machine learning topics, including classification, prediction, and clustering, will be covered. Students will learn and use the open-source R statistical software (see http://www.r-project.org) and its machine learning packages.

Course objectives include:

  • To introduce students to basic applications, concepts, and techniques of data mining.
  • To develop skills for using recent data mining software (e.g., R) to solve practical problems in a variety of disciplines.
  • To gain experience doing independent study and research.

Online materials

See the URLs in the "Materials" folder of the course page at http://classes.yale.edu.

Grading elements

Homework 70%, term project 30%.

Course wiki

http://wiki.gersteinlab.org/pubinfo/index.php/Cs545-07