Pete Warden does the data community a solid and wraps up a collection of open-source tools in the Data Science Toolkit to parse, geocode, and process data.
Mostly geographic material, but some other interesting tools, such as extracting the “main” story from a document. (It has never encountered one of my longer email exchanges with Newcomb.)
It is interesting to me that so many tools and data sets related to geography appear so regularly.
GIS (geographic information systems) can be very hard, but perhaps it is easier than the semantic challenges of, say, medical or legal literature.
That is, it is easier to say “here you are” with regard to a geographic system than to locate a subject in a conceptual space that has been partially captured by a document.
Suspect the difference in hardness could only be illustrated by example and not by some test. Will have to give that some thought.