Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 22, 2011

How to clone Wikipedia and index it with Solr

Filed under: Indexing,Solr — Patrick Durusau @ 3:17 pm

Looks like it is going to be a Wikipedia sorta day! 😉 Seriously, Wikipedia is increasing in importance with every new page or edit. Not to mention that the cited posts will give you experience with a variety of approaches to, and tools for, dealing with Wikipedia as a data set.

From the post:

A major milestone for ZimZaz: I have (finally) successfully cloned Wikipedia and indexed it with Solr. It took about six weeks in calendar time and felt like a lot more. If I had made no mistakes and had to learn nothing, I could have done it in less than a business week. In the spirit of documenting my work and helping others, here are the key steps along the way.

Kudos to the author for documenting what went wrong and what went right!
