How to clone Wikipedia and index it with Solr
Looks like it is going to be a Wikipedia sorta day! 😉 Seriously, Wikipedia is increasing in importance with every new page or edit. On top of that, the cited posts will give you experience with a variety of approaches to, and tools for, dealing with Wikipedia as a data set.
From the post:
A major milestone for ZimZaz: I have (finally) successfully cloned Wikipedia and indexed it with Solr. It took about six weeks in calendar time and felt like a lot more. If I had made no mistakes and had to learn nothing, I could have done it in less than a business week. In the spirit of documenting my work and helping others, here are the key steps along the way.
Kudos to the author for documenting what went wrong and what went right!