Foundations of Data Science by John Hopcroft and Ravindran Kannan.

From the introduction:

Computer science as an academic discipline began in the 60’s. Emphasis was on programming languages, compilers, operating systems, and the mathematical theory that supported these areas. Courses in theoretical computer science covered finite automata, regular expressions, context free languages, and computability. In the 70’s, algorithms was added as an important component of theory. The emphasis was on making computers useful. Today, a fundamental change is taking place and the focus is more on applications. There are many reasons for this change. The merging of computing and communications has played an important role. The enhanced ability to observe, collect and store data in the natural sciences, in commerce, and in other fields calls for a change in our understanding of data and how to handle it in the modern setting. The emergence of the web and social networks, which are by far the largest such structures, presents both opportunities and challenges for theory.

While traditional areas of computer science are still important and highly skilled individuals are needed in these areas, the majority of researchers will be involved with using computers to understand and make usable massive data arising in applications, not just how to make computers useful on specific well-defined problems. With this in mind we have written this book to cover the theory likely to be useful in the next 40 years, just as automata theory, algorithms and related topics gave students an advantage in the last 40 years. One of the major changes is the switch from discrete mathematics to more of an emphasis on probability, statistics, and numerical methods.

In draft form but impressive!

Current chapters:

- Introduction
- High-Dimensional Space
- Random Graphs
- Singular Value Decomposition (SVD)
- Random Walks and Markov Chains
- Learning and the VC-dimension
- Algorithms for Massive Data Problems
- Clustering
- Topic Models, Hidden Markov Process, Graphical Models, and Belief Propagation
- Other Topics [Rankings, Hare System for Voting, Compressed Sensing and Sparse Vectors]
- Appendix

I am certain the authors would appreciate comments and suggestions concerning the text.

I first saw this in a tweet by CompSciFact.