How to Query the StackExchange Databases by Brent Ozar.
From the post:
During next week’s Watch Brent Tune Queries webcast, I’m using my favorite demo database: Stack Overflow. The Stack Exchange folks are kind enough to make all of their data available via BitTorrent for Creative Commons usage as long as you properly attribute the source.
There’s two ways you can get started writing queries against Stack’s databases – the easy way and the hard way.
….
I’m sure you have never found duplicate questions or answers on StackExchange.
But just in case such a thing existed, detecting and merging the duplicates from StackExchange would be a good exercise in data analysis, subject identification, etc.
😉
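If you want a starting point for that exercise, here is a minimal sketch of one possible approach: flag pairs of question titles whose word sets overlap heavily (Jaccard similarity). This is my own illustration, not anything from Brent's post. The function names, the 0.6 threshold, and the sample rows are all assumptions made up for the example, and it compares titles only.

```python
import re
from itertools import combinations


def tokens(title):
    """Lowercase a title and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_duplicates(questions, threshold=0.6):
    """Return (id1, id2, score) for title pairs at or above the threshold.

    `questions` is an iterable of (id, title) tuples. The threshold is an
    arbitrary illustrative value, not a tuned one.
    """
    tokenized = [(qid, tokens(title)) for qid, title in questions]
    pairs = []
    for (id1, t1), (id2, t2) in combinations(tokenized, 2):
        score = jaccard(t1, t2)
        if score >= threshold:
            pairs.append((id1, id2, score))
    return pairs


# Made-up sample rows for illustration, not real Stack Overflow data.
sample = [
    (1, "How do I find duplicate rows in SQL Server?"),
    (2, "How do I find duplicate rows in SQL Server"),
    (3, "What is a clustered index?"),
]
print(candidate_duplicates(sample))
```

Comparing every pair is quadratic, so this is only sensible on a small slice of the data. Anything close to the full dump would need blocking or hashing first, and the merging step is a separate problem entirely.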
BTW, Brent’s webcast is on 21 January 2014, which is next Tuesday as of this post.
Enjoy!