Data Mining: Spring 2013 (CMU) by Ryan Tibshirani.
Overview and Objectives [from syllabus]
Data mining is the science of discovering structure and making predictions in data sets (typically, large ones). Applications of data mining are happening all around you, and if they are done well, they may sometimes even go unnoticed. How does Google web search work? How does Shazam recognize a song playing in the background? How does Netflix recommend movies to each of its users? How could we predict whether or not a person will develop breast cancer based on genetic information? How could we search for possible subgroups among breast cancer patients, suggesting different variants of the disease? An expert’s answer to any one of these questions may very well contain enough material to fill its own course, but basic answers stem from the principles of data mining.
Data mining spans the fields of statistics and computer science. Since this is a course in statistics, we will adopt a statistical perspective for the majority of the course. Data mining also involves a good deal of both applied work (programming, problem solving, data analysis) and theoretical work (learning, understanding, and evaluating methodologies). We will try to maintain a balance between the two.
Upon completing this course, you should be able to tackle new data mining problems by: (1) selecting the appropriate methods and justifying your choices; (2) implementing these methods programmatically (using, say, the R programming language) and evaluating your results; and (3) explaining your results to a researcher outside of statistics or computer science.
Lecture notes, R files, what more could you want? 😉
Enjoy!