Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 13, 2012

Common Crawl To Add New Data In Amazon Web Services Bucket

Filed under: Common Crawl,Dataset — Patrick Durusau @ 8:15 pm

From the post:

The Common Crawl Foundation is on the verge of adding to its Amazon Web Services (AWS) Public Data Set of openly and freely accessible web crawl data. It was back in January that Common Crawl announced the debut of its corpus on AWS (see our story here). Now, a billion new web sites are in the bucket, according to Common Crawl director Lisa Green, adding to the 5 billion web pages already there.

That’s good news!

At least I think so.

I am sure that, like everyone else, I will be trying to find the cycles (or at least thinking about it) to play with (sorry, explore) the Common Crawl data set.

I hesitate to say without reservation that this is a good thing, because my data needs are more modest than searching the entire WWW.

That wasn’t so hard to say. Hurt a little but not that much. 😉

I am exploring how to get better focus on information resources of interest to me. I rather doubt that focus is going to start with the entire WWW as an information space. Will keep you posted.
