Introducing Luwak, a library for high-performance stored queries by Charlie Hull.
From the post:
A few weeks ago we spoke in Dublin at Lucene Revolution 2013 on our work in the media monitoring sector for various clients including Gorkana and Australian Associated Press. These organisations handle a huge number (sometimes hundreds of thousands) of news articles every day and need to apply tens of thousands of stored expressions to each one, which would be extremely inefficient if done with standard search engine libraries. We’ve developed a much more efficient way to achieve the same result, by pre-filtering the expressions before they’re even applied: effectively we index the expressions and use the news article itself as a query, which led to the presentation title ‘Turning Search Upside Down’.
We’re pleased to announce the core of this process, a Java library we’ve called Luwak, is now available as open source software for your own projects. Here’s how you might use it:
Using the article as the query may sound odd, but Charlie reports “speeds of up to 70,000 stored queries applied to an article in around a second on modest hardware.”
Perhaps not “big data speed” but certainly enough speed to get your attention.
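To make the "index the expressions, use the article as the query" idea concrete, here is a minimal sketch in plain Java. It is not Luwak's API and does not use Lucene: the class name StoredQueryPrefilter, its register and match methods, and the query ids are all hypothetical, and stored queries are assumed to be simple "all of these terms" expressions. It only illustrates the pre-filtering step: a term index over the stored queries picks out candidates, and only those candidates are then checked against the article.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; this is not the Luwak API.
final class StoredQueryPrefilter {
    // Inverted index over the stored queries: term -> ids of queries using it.
    private final Map<String, Set<String>> termToQueryIds = new HashMap<>();
    // Query id -> every term the query requires (a simple AND query).
    private final Map<String, Set<String>> queryTerms = new HashMap<>();

    void register(String queryId, String... requiredTerms) {
        Set<String> terms = new HashSet<>();
        for (String term : requiredTerms) {
            terms.add(term.toLowerCase(Locale.ROOT));
        }
        queryTerms.put(queryId, terms);
        for (String term : terms) {
            termToQueryIds.computeIfAbsent(term, k -> new HashSet<>()).add(queryId);
        }
    }

    List<String> match(String article) {
        Set<String> articleTerms = tokenize(article);
        // Pre-filter: the article's terms act as the query against the index
        // of stored queries, so unrelated queries are never evaluated.
        Set<String> candidates = new TreeSet<>();
        for (String term : articleTerms) {
            Set<String> ids = termToQueryIds.get(term);
            if (ids != null) {
                candidates.addAll(ids);
            }
        }
        // Verify: run only the surviving candidates against the article.
        List<String> matches = new ArrayList<>();
        for (String id : candidates) {
            if (articleTerms.containsAll(queryTerms.get(id))) {
                matches.add(id);
            }
        }
        return matches;
    }

    // Lowercase the text and split on anything that is not a letter or digit.
    static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        StoredQueryPrefilter prefilter = new StoredQueryPrefilter();
        prefilter.register("lucene-search", "lucene", "search");
        prefilter.register("dublin-weather", "dublin", "weather");
        prefilter.register("election", "election", "results");
        // "election" is never a candidate; "dublin-weather" is a candidate
        // but fails verification because "weather" is absent.
        System.out.println(prefilter.match(
                "Lucene Revolution in Dublin covered open source search"));
        // prints [lucene-search]
    }
}
```

With tens of thousands of stored queries, most of them share no terms with any given article, so the candidate set is usually a small fraction of the whole. Real stored expressions (phrases, negation, proximity) need a more careful choice of which terms to index, which this sketch does not attempt.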
Charlie mentions in his Dublin slides that Luwak could be used to “Add metadata to items based on their content.”
That is one use case, but creating topics/associations out of content would be another.