Advanced Data Analysis from an Elementary Point of View by Cosma Rohilla Shalizi. (UPDATE: 2014 draft)
From the Introduction:
These are the notes for 36-402, Advanced Data Analysis, at Carnegie Mellon. If you are not enrolled in the class, you should know that it’s the methodological capstone of the core statistics sequence taken by our undergraduate majors (usually in their third year), and by students from a range of other departments. By this point, they have taken classes in introductory statistics and data analysis, probability theory, mathematical statistics, and modern linear regression (“401”). This class does not presume that you have learned but forgotten the material from the pre-requisites; it presumes that you know that material and can go beyond it. The class also presumes a firm grasp on linear algebra and multivariable calculus, and that you can read and write simple functions in R. If you are lacking in any of these areas, now would be an excellent time to leave.
36-402 is a class in statistical methodology: its aim is to get students to understand something of the range of modern[1] methods of data analysis, and of the considerations which go into choosing the right method for the job at hand (rather than distorting the problem to fit the methods the student happens to know). Statistical theory is kept to a minimum, and largely introduced as needed.
[Footnote 1] Just as an undergraduate “modern physics” course aims to bring the student up to about 1930 (more specifically, to 1926), this class aims to bring the student up to about 1990.
A very recent introduction to data analysis. In the introduction, Shalizi lists the concepts that are best mastered before tackling this material.
According to footnote 1, once you have mastered this material, you will still have another twenty-two years to make up, both in general and on your problem in particular.
Still, knowing it cold will put you ahead of a lot of data analysis you are going to encounter.
I first saw this in a tweet by Gene Golovchinsky.