Non-Uniform Random Variate Generation by Luc Devroye.
From the introduction:
Random number generation has intrigued scientists for a few decades, and a lot of effort has been spent on the creation of randomness on a deterministic (non-random) machine, that is, on the design of computer algorithms that are able to produce “random” sequences of integers. This is a difficult task. Such algorithms are called generators, and all generators have flaws because all of them construct the n-th number in the sequence in function of the n-1 numbers preceding it, initialized with a nonrandom seed. Numerous quantities have been invented over the years that measure just how “random” a sequence is, and most well-known generators have been subjected to rigorous statistical testing. However, for every generator, it is always possible to find a statistical test of a (possibly odd) property to make the generator flunk. The mathematical tools that are needed to design and analyze these generators are largely number theoretic and combinatorial. These tools differ drastically from those needed when we want to generate sequences of integers with certain non-uniform distributions, given that a perfect uniform random number generator is available. The reader should be aware that we provide him with only half the story (the second half). The assumption that a perfect uniform random number generator is available is now quite unrealistic, but, with time, it should become less so. Having made the assumption, we can build quite a powerful theory of non-uniform random variate generation.
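To make that "second half" concrete, here is a minimal sketch of my own (not code from the book) that turns uniform random numbers into non-uniform ones. It uses inversion of the cumulative distribution function, a standard technique, applied to the exponential distribution. The function names `exponential_variate` and `sample_mean` are hypothetical, and Python's `random.random` stands in for the "perfect" uniform generator the introduction assumes.

```python
import math
import random


def exponential_variate(rate, uniform=random.random):
    """Draw one Exponential(rate) variate by inverting F(x) = 1 - exp(-rate * x).

    `uniform` is any zero-argument callable returning a float in [0.0, 1.0);
    it plays the role of the assumed uniform generator.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    u = uniform()
    # 1 - u lies in (0.0, 1.0], so the logarithm is always defined.
    return -math.log(1.0 - u) / rate


def sample_mean(n, rate, seed=12345):
    """Average n exponential variates drawn from a seeded generator."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return sum(exponential_variate(rate, rng.random) for _ in range(n)) / n
```

For a large sample, `sample_mean(100000, 2.0)` should land close to the theoretical mean of 1/rate = 0.5. The uniform source is passed in as a parameter precisely because the quality of the output depends entirely on it, which is the assumption the introduction flags.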
You will need random numbers for some purposes in information retrieval, but that isn’t why I mention this 800+ page tome.
The author has been good enough to put the entire work up on the Internet and you are free to use it for any purpose, even reselling it.
I mention it because in a recent podcast about Solr 5, the greatest emphasis was on building and managing Solr clusters, which is a very important use case if you are indexing and searching “big data.”
But in the rush to index and search “big data,” to what extent are we ignoring the need to index and search Small But Important Data (SBID)?
This book would qualify as SBID and even better, it already has an index against which to judge your Solr indexing.
And there are other smallish collections of texts: the Michael Brown grand jury transcripts, which run to fewer than 5,000 pages, the CIA Torture Report at 6,000 pages, and many others. These are texts that don’t qualify as “big data” but still require highly robust indexing capabilities.
Take Non-Uniform Random Variate Generation as an SBID and a practice target for Solr.
I first saw this in a tweet by Computer Science.