Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 27, 2012

ArcSpread for analyzing web archives

Filed under: Archives — Patrick Durusau @ 6:10 pm

Pete Warden writes:

Stanford runs a fantastic project for capturing important web pages as they change over time, and then presenting the results in a form that future historians will be able to use. This paper talks about some of the techniques they use for removing boilerplate navigation and ad content, so that researchers can work with the meat of the page.

I was relieved to read:

We did not excise any advertising images from the presented pages, but asked participants to disregard advertising related images.

Poorly done digital newspaper archives remove advertising content on a “meat of the page” theory.

Researchers then cannot see what was advertised, how it was advertised, or at what prices. Ads may not interest us, but they may interest others.

At one time thousands, if not hundreds of thousands, of people knew how the Egyptian pyramids were built.

So commonly known it was not written down.

Perhaps there is a lesson there for us.
