From the post:
This is a dataset of scans of 1000 public domain books that was released to the public at ICDAR 2007. At the time there was no public serving infrastructure, so few people actually got the 120GB dataset. It has since been hosted on Google Cloud Storage and made available for public download: (see the post for the links)
The dataset is intended for OCR and machine learning purposes, and you may wish to unite the results in topic maps with other resources.