CPSC445/CPSC545/MBB334/MBB545/CBB545
Introduction to Data Mining
The quantity and variety of online data are increasing very rapidly. The data mining process includes data selection and cleaning, machine learning techniques to "learn" knowledge that is "hidden" in data, and the reporting and visualization of the resulting knowledge. This course will cover these issues and will illustrate the whole process with examples of practical applications from the life sciences, computer science, and commerce. Several machine learning topics, including classification, prediction, and clustering, will be covered. Students will learn and use the open-source R statistical software (see http://www.r-project.org) and machine learning packages.
Course Objectives include:
See the URLs in the "Materials" folder of the course page at http://classes.yale.edu.
Homework 70%, Term project 30%.