Algorithms for Massive Data Sets by Inge Li Gørtz and Philip Bille.
From the course description:
A student who has met the objectives of the course will be able to:
- Describe an algorithm in a comprehensible manner, i.e., accurately, concise, and unambiguous.
- Prove correctness of algorithms.
- Analyze, evaluate, and compare the performance of algorithms in models of computation relevant to massive data sets.
- Analyze, evaluate, and compare the quality and reliability of solutions.
- Apply and extend relevant algorithmic techniques for massive data sets.
- Design algorithms for problems related to massive data sets.
- Lookup and apply relevant research literature for problems related to massive data sets.
- Systematically identify and analyze problems and make informed choices for solving the problems based on the analysis.
- Argue clearly for the choices made when solving a problem.
Papers, slides and exercises provided for these topics:
Week 1: Introduction and Hashing: Chained, Universal, and Perfect.
Week 2: Predecessor Data Structures: x-fast tries and y-fast tries.
Week 3: Decremental Connectivity in Trees: Cluster decomposition, Word-Level Parallelism.
Week 4: Nearest Common Ancestors: Distributed data structures, Heavy-path decomposition, alphabetic codes.
Week 5: Amortized analysis and Union-Find.
Week 6: Range Reporting: Range Trees, Fractional Cascading, and kD Trees.
Week 7: Persistent data structures.
Week 8: String matching.
Week 9: String Indexing: Dictionaries, Tries, Suffix trees, and Suffix Sorting.
Week 10: Introduction to approximation algorithms. TSP, k-center, vertex cover.
Week 11: Approximation algorithms:
Set Cover, stable matching.Week 12: External Memory: I/O Algorithms, Cache-Oblivious Algorithms, and Dynamic Programming
Just reading the papers will improve your big data skills.