ArcSpread for analyzing web archives
Pete Warden writes:
Stanford runs a fantastic project for capturing important web pages as they change over time, and then presenting the results in a form that future historians will be able to use. This paper talks about some of the techniques they use for removing boilerplate navigation and ad content, so that researchers can work with the meat of the page.
I was relieved to read:
We did not excise any advertising images from the presented pages, but asked participants to disregard advertising related images.
Poorly done digital newspaper archives remove advertising content on a “meat of the page” theory.
Researchers then cannot see what was advertised, how, and at what prices. Ads may not interest us, but they may interest others.
At one time thousands, if not hundreds of thousands, of people knew how the Egyptian pyramids were built.
It was so commonly known that it was not written down.
Perhaps there is a lesson there for us.