MontySolr: A Search Solution for Python Lovers With the Speed of Native Java
From the post:
The folks at CERN wanted a better way to search High Energy Physics fulltext paper repositories and bibliographical databases that produce result set numbers in the multi-millions. INSPIRE, the digital library that merges the sources’ query results, though, is written in Python. In order to move back and forth as quickly as possible between the two systems, CERN decided among a number of options to embed INSPIRE in Solr.
The result, MontySolr, utilizes the power of Java and works with any Python application, as well as any C/C++ app that Python understands. For more information on MontySolr, check this video of Roman Chyla (CERN).
Let’s run all counters back to zero and start again. This time with the abstract from the original presentation in San Francisco, May, 2011:
SPIRES is the biggest bibliographic database for High Energy Physics, ArXiv is the biggest fulltext repository for the fulltext papers in High Energy Physics, and INSPIRE is the biggest digital library that merges the two. We must work with result sets bigger than 1 million for citation related queries and our partners from Astrophysics with 6 million sets, however INSPIRE is written in Python. So how do we move several million result sets between the two systems fast? How do we take advantage of our special NLP processing pipeline written in Python? How do we join them? We do not use Jython. We do not use pipes. We do not embed Solr inside INSPIRE. We embed INSPIRE into Solr! The talk shows benefits and challenges of this surprisingly elegant solution.
With the original title:
CPython Embedded in Solr – Search Solution for Python Lovers With the Speed of Native Java
You will need the slides to really appreciate the video.
And MontySolr on Github.
Impressive results!
But the real kicker is that C and C++ apps are made available inside Solr. Such as for NLP!