From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science by Norm Matloff.
From the Overview:
The materials here form a textbook for a course in mathematical probability and statistics for computer science students. (It would work fine for general students too.)
“Why is this text different from all other texts?”
- Computer science examples are used throughout, in areas such as: computer networks; data and text mining; computer security; remote sensing; computer performance evaluation; software engineering; data management; etc.
- The R statistical/data manipulation language is used throughout. Since this is a computer science audience, a greater sophistication in programming can be assumed. It is recommended that my R tutorials be used as a supplement:
- Chapter 1 of my book on R software development, The Art of R Programming, NSP, 2011
- Part of a VERY rough and partial draft of that book. It is only about 50% complete, has various errors, and presents a number of topics differently from the final version, but should be useful in R work for this class.
- Throughout the units, mathematical theory and applications are interwoven, with a strong emphasis on modeling: What do probabilistic models really mean, in real-life terms? How does one choose a model? How do we assess the practical usefulness of models?
For instance, the chapter on continuous random variables begins by explaining that such distributions do not actually exist in the real world, due to the discreteness of our measuring instruments. The continuous model is therefore just that–a model, and indeed a very useful model.
There is actually an entire chapter on modeling, discussing the tradeoff between accuracy and simplicity of models.
- There is considerable discussion of the intuition involving probabilistic concepts, and the concepts themselves are defined through intuition. However, all models and so on are described precisely in terms of random variables and distributions.
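The overview's remark that continuous distributions are a model of discretely measured data can be illustrated with a small simulation. This is only a sketch under stated assumptions, not an example from the book (which uses R): the `measure` function, the 0.1 instrument resolution, and the uniform "true" values are all hypothetical choices made for illustration.

```python
import random

def measure(true_value, resolution=0.1):
    """Return the instrument reading as a whole number of ticks.

    Hypothetical instrument: it can only report multiples of `resolution`,
    so every reading is one of a finite set of tick counts.
    """
    return round(true_value / resolution)

random.seed(0)  # fixed seed so the run is repeatable

# "True" quantities, modeled as continuous uniform values on [0, 1].
true_values = [random.uniform(0.0, 1.0) for _ in range(10_000)]

# What the instrument actually records: integer tick counts from 0 to 10.
readings = [measure(v) for v in true_values]

distinct_true = len(set(true_values))
distinct_readings = len(set(readings))

# The continuous model says a reading of 3 ticks covers true values in an
# interval of width 0.1 around 0.3, so it should occur about 10% of the time.
frac_tick_3 = sum(1 for r in readings if r == 3) / len(readings)

print("distinct true values:", distinct_true)
print("distinct readings:   ", distinct_readings)
print("fraction of readings at tick 3:", frac_tick_3)
```

The recorded data take only a handful of distinct values, yet the continuous uniform model still predicts how often each reading occurs. That is the sense in which the continuous distribution is "just a model," and a useful one.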
Another open-source textbook from Norm Matloff!
Algorithms to Z-Scores (the book).
Source files for the book available at: http://heather.cs.ucdavis.edu/~matloff/132/PLN .
Norm suggests his R tutorial, R for Programmers http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf as supplemental reading material.
To illustrate the importance of statistics, Norm gives the following examples in chapter 1:
- The statistical models used on Wall Street made the “quants” (quantitative analysts) rich, but also contributed to the worldwide financial crash of 2008.
- In a court trial, large sums of money or the freedom of an accused may hinge on whether the judge and jury understand some statistical evidence presented by one side or the other.
- Wittingly or unconsciously, you are using probability every time you gamble in a casino— and every time you buy insurance.
- Statistics is used to determine whether a new medical treatment is safe/effective for you.
- Statistics is used to flag possible terrorists, but sometimes unfairly singling out innocent people while other times missing ones who really are dangerous.
Mastering the material in this book will take you a long way toward becoming a “statistical skeptic.”
So you can debunk misleading or simply wrong claims by government, industry and special interest groups. Wait! Those are also known as advertisers. Never mind.