Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 29, 2016

Serious Non-Transparency (+ work around)

Filed under: Books,Data Mining,Law,Searching — Patrick Durusau @ 4:00 pm

I mentioned http://www.bkstr.com/ yesterday in my post: Courses -> Texts: A Hidden Relationship, where I lamented the inability to find courses by their titles.

Finding courses by title would let you easily discover the required or suggested texts for any given course, much like browsing a physical campus bookstore.

Obscurity is an “information smell” (to build upon Felienne’s expansion of code smell to spreadsheets).

In this particular case, the “information smell” is skunk class.

I revisited http://www.bkstr.com/ today to extract its more than 1,200 bookstores for use in crawling a sample of those sites.

For ugly HTML, view the source of: http://www.bkstr.com/.

Parsing that is going to take time, and surely there is an easier way to get a sample of the sites for mining.

The idea didn’t occur to me immediately but I noticed yesterday that the general form of web addresses was:

bookstore-prefix.bkstr.com

So, after some flailing about with the HTML from bkstr.com, I searched for “bkstr.com” and requested all the results.
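To make that step concrete, here is a minimal sketch of pulling the bookstore prefixes out of a batch of search results. It assumes the results have already been saved as a plain list of URLs; the function name `bookstore_prefixes` and the sample URLs are hypothetical and not taken from the site.

```python
from urllib.parse import urlparse

SUFFIX = ".bkstr.com"

def bookstore_prefixes(urls):
    """Return the sorted, de-duplicated prefixes of hosts shaped like prefix.bkstr.com."""
    prefixes = set()
    for url in urls:
        # urlparse only finds a hostname when the URL carries a scheme (http:// or https://);
        # anything else yields None and is skipped here.
        host = (urlparse(url).hostname or "").lower()
        if host.endswith(SUFFIX):
            prefix = host[: -len(SUFFIX)]
            # Skip the main site itself, which is not an individual bookstore.
            if prefix and prefix != "www":
                prefixes.add(prefix)
    return sorted(prefixes)

# Hypothetical search results, for illustration only.
results = [
    "http://exampleu.bkstr.com/home",
    "http://samplecollege.bkstr.com/textbooks",
    "http://www.bkstr.com/",
    "http://unrelated.example.org/",
]
print(bookstore_prefixes(results))
```

Collecting the prefixes into a set drops the duplicates that search results tend to contain, since several result pages usually point at the same bookstore.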

I’m picking a random ten bookstores with law books for further searching.
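The random pick can be sketched as follows, assuming the bookstores that carry law books have already been identified by some other means and collected as a list of prefixes. The function name `pick_sample` and the placeholder prefixes are hypothetical.

```python
import random

def pick_sample(prefixes, k=10, seed=None):
    """Pick up to k distinct prefixes at random; passing a seed makes the pick repeatable."""
    rng = random.Random(seed)
    # Sort the de-duplicated pool so the same seed always sees the same ordering.
    pool = sorted(set(prefixes))
    return rng.sample(pool, min(k, len(pool)))

# Placeholder prefixes standing in for bookstores already known to carry law books.
law_stores = [f"store{i}" for i in range(50)]
for prefix in pick_sample(law_stores, k=10, seed=2016):
    print(f"http://{prefix}.bkstr.com/")
```

Fixing the seed is a design choice rather than a requirement: it lets the same ten sites be revisited later when comparing results.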

Not a high priority but I am curious what lies behind the smoke, mirrors, complex HTML and poor interfaces.

Maybe something, maybe nothing. Won’t know unless we look.

PS: Perhaps a better query string:

www.bkstr.com textbooks-and-course-materials

Suggested refinements?
