Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 13, 2015

An R Client for the Internet Archive API

Filed under: Data Mining,R — Patrick Durusau @ 8:19 pm

An R Client for the Internet Archive API by Lincoln Mullen.

From the webpage:

In support of some of my research projects, I created a simple R package to access the Internet Archive’s API. The package is intended to search for items, to retrieve their metadata in a usable form, and to download the files associated with the items. The package, called internetarchive, is available on GitHub. The README and the vignette have a full explanation, but here is a brief overview.
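For a sense of what "search, retrieve metadata, download" means underneath, here is a rough Python sketch that only builds the request URLs. This is not code from the internetarchive R package. The function names are made up for illustration, and it assumes the Archive's public `advancedsearch.php`, `/metadata/` and `/download/` endpoints. The identifier and file name in the usage lines are hypothetical placeholders.

```python
from urllib.parse import quote, urlencode

BASE = "https://archive.org"  # public Internet Archive host


def search_url(query, fields=("identifier", "title"), rows=10):
    """Build a search URL (hypothetical helper, JSON output assumed)."""
    params = [("q", query)]
    # Each requested field is passed as a repeated fl[] parameter.
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("output", "json")]
    return f"{BASE}/advancedsearch.php?{urlencode(params)}"


def metadata_url(identifier):
    """Build the metadata URL for a single item (hypothetical helper)."""
    return f"{BASE}/metadata/{quote(identifier)}"


def download_url(identifier, filename):
    """Build the download URL for one file of an item (hypothetical helper)."""
    return f"{BASE}/download/{quote(identifier)}/{quote(filename)}"


if __name__ == "__main__":
    # Step 1: search; step 2: metadata; step 3: download one file.
    print(search_url("publisher:(american tract society)", rows=3))
    print(metadata_url("example_item"))               # placeholder identifier
    print(download_url("example_item", "scan 1.pdf"))  # placeholder file name
```

No request is made here; passing any of these URLs to `urllib.request.urlopen` would be the next step, and the R package wraps that plumbing for you.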

This is cool!

And a great way to contrast NSA data collection with useful data collection.

If you were the NSA, you would suck down all the new Internet Archive content every day. Then you would “explore” that plus lots of other content for “relationships,” which abound in any data set that large.

If you are Lincoln Mullen or someone empowered by his work, you search for items and incrementally build a set of items with context and additional information you add to that set.

If you were paying the bill, which of those approaches seems more likely to produce useful results?

Information/data/text mining doesn’t change in nature due to size or content or the purpose of the searching or who’s paying the bill. The goal is (or should be) useful results for some purpose X.
