Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 11, 2011

A Data Parallel toolkit for Information Retrieval

Filed under: Data Mining,Information Retrieval,Search Algorithms,Searching — Patrick Durusau @ 5:53 am


From the website:

Many modern information retrieval data analyses need to operate on web-scale data collections. These collections are sufficiently large as to make single-computer implementations impractical, apparently necessitating custom distributed implementations.

Instead, we have implemented a collection of Information Retrieval analyses atop DryadLINQ, a research LINQ provider layer over Dryad, a reliable and scalable computational middleware. Our implementations are relatively simple data parallel adaptations of traditional algorithms, and, due entirely to the scalability of Dryad and DryadLINQ, scale up to very large data sets. The current version of the toolkit, available for download below, has been successfully tested against the ClueWeb corpus.
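To make "data parallel adaptation of a traditional algorithm" concrete, here is a minimal sketch in plain Python. It is not DryadLINQ and not the toolkit's code. It computes document frequency by counting terms independently in each partition and then merging the partial counts. The function names, the tiny sample collection, and the whitespace tokenizer are all assumptions made for illustration. On a real cluster each partition would sit on a different machine, whereas here the partitions are simply processed one after another.

```python
from collections import Counter
from functools import reduce


def partition(docs, n_parts):
    """Split the collection into n_parts roughly equal slices."""
    return [docs[i::n_parts] for i in range(n_parts)]


def local_document_frequency(docs):
    """Per-partition step: count how many documents contain each term."""
    counts = Counter()
    for text in docs:
        # set() ensures a term is counted once per document.
        counts.update(set(text.lower().split()))
    return counts


def document_frequency(docs, n_parts=2):
    """Merge the per-partition counts into collection-wide frequencies."""
    partials = map(local_document_frequency, partition(docs, n_parts))
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical sample collection, three tiny "documents".
docs = [
    "topic maps merge subjects",
    "search engines index documents",
    "topic maps index subjects",
]
df = document_frequency(docs)
```

Because the per-partition step never looks outside its own slice and the merge is a plain sum, the result should not depend on how many partitions are used. That independence is what lets a framework spread the work over many machines.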

Are you using large data sets in the construction of your topic maps?

Here "large" means data sets on the order of one billion documents, such as the ClueWeb09 collection (http://boston.lti.cs.cmu.edu/Data/clueweb09/).

The authors are trying to put large data sets within reach of a wider audience.

Did they succeed?

Is their work useful for smaller data sets?

What tools would you add to assist more specifically with topic map construction?
