Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 26, 2011

MontySolr: A Search Solution for Python Lovers With the Speed of Native Java

Filed under: INSPIRE,Solr — Patrick Durusau @ 8:02 pm


From the post:

The folks at CERN wanted a better way to search High Energy Physics fulltext paper repositories and bibliographical databases that produce result set numbers in the multi-millions. INSPIRE, the digital library that merges the sources’ query results, though, is written in Python. In order to move back and forth as quickly as possible between the two systems, CERN decided among a number of options to embed INSPIRE in Solr.

The result, MontySolr, utilizes the power of Java and works with any Python application, as well as any C/C++ app that Python understands. For more information on MontySolr, check this video of Roman Chyla (CERN).

Let’s run all counters back to zero and start again. This time with the abstract from the original presentation in San Francisco, May, 2011:

SPIRES is the biggest bibliographic database for High Energy Physics, ArXiv is the biggest fulltext repository for the fulltext papers in High Energy Physics, and INSPIRE is the biggest digital library that merges the two. We must work with result sets bigger than 1 million for citation related queries and our partners from Astrophysics with 6 million sets, however INSPIRE is written in Python. So how do we move several million result sets between the two systems fast? How do we take advantage of our special NLP processing pipeline written in Python? How do we join them? We do not use Jython. We do not use pipes. We do not embed Solr inside INSPIRE. We embed INSPIRE into Solr! The talk shows benefits and challenges of this surprisingly elegant solution.

With the original title:

CPython Embedded in Solr – Search Solution for Python Lovers With the Speed of Native Java

You will need the slides to really appreciate the video.

And MontySolr on Github.

Impressive results!

But the real kicker is that C and C++ apps are made available inside Solr, such as for NLP!
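To get a feel for why the abstract rules out pipes when moving several million results, here is a rough sketch that round-trips a list of record IDs two ways: as comma-separated text (roughly what a pipe would carry) and as fixed-width binary integers (closer to what systems sharing one process can hand over). This is not MontySolr's actual mechanism, which the abstract describes only as embedding CPython in Solr. The function names are hypothetical, and the timings will vary by machine.

```python
import time
from array import array


def encode_as_text(ids):
    """Pipe-style transfer: every ID becomes decimal text, comma separated."""
    return ",".join(str(i) for i in ids).encode("ascii")


def decode_from_text(payload):
    """Parse the text payload back into a typed array of integers."""
    if not payload:
        return array("q")
    return array("q", (int(token) for token in payload.split(b",")))


def encode_as_binary(ids):
    """Fixed-width transfer: one signed 64-bit integer per ID."""
    return array("q", ids).tobytes()


def decode_from_binary(payload):
    """Reinterpret the raw bytes as 64-bit integers, with no text parsing."""
    restored = array("q")
    restored.frombytes(payload)
    return restored


def compare(n):
    """Time one encode/decode round trip per strategy for n record IDs."""
    hits = range(n)  # stand-in for a result set of n record IDs
    strategies = [
        ("text", encode_as_text, decode_from_text),
        ("binary", encode_as_binary, decode_from_binary),
    ]
    for name, encode, decode in strategies:
        start = time.perf_counter()
        payload = encode(hits)
        restored = decode(payload)
        elapsed = time.perf_counter() - start
        print(f"{name:>6}: {len(payload):>10} bytes, "
              f"{len(restored)} ids, {elapsed:.3f}s")


if __name__ == "__main__":
    compare(1_000_000)
```

The binary path skips both number formatting and parsing, which is the kind of overhead an in-process design can avoid altogether.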

INSPIRE

Filed under: CERN,INSPIRE — Patrick Durusau @ 8:01 pm


From the webpage:

CERN, DESY, Fermilab and SLAC have built the next-generation High Energy Physics (HEP) information system, INSPIRE, which empowers scientists with innovative tools for successful research at the dawn of an era of new discoveries.

INSPIRE combines the successful SPIRES database content, curated at DESY, Fermilab and SLAC, with the Invenio digital library technology developed at CERN. INSPIRE is run by a collaboration of the four labs, and interacts closely with HEP publishers, arXiv.org, NASA-ADS, PDG, and other information resources.

INSPIRE represents a natural evolution of scholarly communication, built on successful community-based information systems, and provides a vision for information management in other fields of science.

INSPIRE builds on SPIRES’ expertise

  • Decades of trusted, curated content
  • Experience in managing a discipline’s wide information resources
  • Close relationship with the worldwide user community

What are the major innovations of INSPIRE?

  • Author disambiguation for high-quality profiles and improved search capabilities
  • Fulltext search and snippet display for access-restricted content
  • Faster results
  • Variety of search and display options
  • Detailed record pages
  • Searchable fulltext for 5 years of arXiv content
  • Figures and searchable figure captions extracted from 5 years of arXiv articles
  • LHC experimental notes

What will be available soon?

  • Personalized features (bookshelves, author pages, paper claiming)
  • More APIs for third parties to build new tools
  • More historical content
  • Conference slides

Deeply cool digital library system from CERN.
