I mentioned http://www.bkstr.com/ yesterday in my post: Courses -> Texts: A Hidden Relationship, where I lamented the inability to find courses by their titles.
Searching by title would let you easily discover the required/suggested texts for any given course, much like browsing a physical campus bookstore.
Obscurity is an “information smell” (to build upon Felienne’s expansion of code smell to spreadsheets).
In this particular case, the “information smell” is skunk class.
I revisited http://www.bkstr.com/ today to extract its list of more than 1,200 bookstores, which I want to use for crawling a sample of those sites.
For ugly HTML, view the source of: http://www.bkstr.com/.
Parsing that is going to take time, and surely there is an easier way to get a sample of the sites for mining.
The idea didn’t occur to me immediately but I noticed yesterday that the general form of web addresses was:
bookstore-prefix.bkstr.com
So, after some flailing about with the HTML from bkstr.com, I searched for “bkstr.com” and requested all the results.
I’m picking ten bookstores with law books at random for further searching.
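As a rough sketch of those two steps (pulling bookstore-prefix.bkstr.com hosts out of a pile of search result URLs, then picking ten at random), something like the following Python might do. The URLs in `results` are invented placeholders, not real bookstores, and the function names are my own. It does not check whether a store carries law books; that filtering would still have to be done by hand or by a later crawl.

```python
import random
import re
from urllib.parse import urlparse

# Matches hosts of the general form bookstore-prefix.bkstr.com
HOST_RE = re.compile(r"^([a-z0-9-]+)\.bkstr\.com$")


def bookstore_prefixes(urls):
    """Return the sorted, de-duplicated bookstore prefixes found in urls."""
    prefixes = set()
    for url in urls:
        # hostname is None for strings without a scheme, so fall back to ""
        host = (urlparse(url).hostname or "").lower()
        match = HOST_RE.match(host)
        # Skip the main www.bkstr.com site; it is not an individual bookstore
        if match and match.group(1) != "www":
            prefixes.add(match.group(1))
    return sorted(prefixes)


def sample_bookstores(prefixes, k=10, seed=None):
    """Pick up to k prefixes at random (all of them if fewer than k exist)."""
    rng = random.Random(seed)
    return rng.sample(prefixes, min(k, len(prefixes)))


# Hypothetical search results, for illustration only
results = [
    "http://www.bkstr.com/",
    "http://example-law.bkstr.com/textbooks",
    "https://Example-Law.bkstr.com/",
    "http://another-store.bkstr.com/home",
    "http://unrelated.example.org/bkstr.com",
]

prefixes = bookstore_prefixes(results)
print(prefixes)
print(sample_bookstores(prefixes, k=10))
```

Passing a `seed` makes the random pick repeatable, which helps if you want to revisit the same ten sites later.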
Not a high priority, but I am curious what lies behind the smoke, mirrors, complex HTML and poor interfaces.
Maybe something, maybe nothing. Won’t know unless we look.
PS: Perhaps a better query string:
www.bkstr.com textbooks-and-course-materials
Suggested refinements?