Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 13, 2013

Implementing a Custom Search Syntax…

Filed under: Lucene,Patents,Solr — Patrick Durusau @ 8:33 pm

Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled by John Berryman.

Description:

In a recent project with the United States Patent and Trademark Office, Opensource Connections was asked to prototype the next generation of patent search – using Solr and Lucene. An important aspect of this project was the implementation of BRS, a specialized search syntax used by patent examiners during the examination process. In this fast-paced session we will relate our experiences and describe how we used a combination of Parboiled (a Parsing Expression Grammar [PEG] parser), Lucene Queries and SpanQueries, and an extension of Solr’s QParserPlugin to build BRS search functionality in Solr. First we will characterize the patent search problem and then define the BRS syntax itself. We will then introduce the Parboiled parser and discuss various considerations that one must make when designing a syntax parser. Following this we will describe the methodology used to implement the search functionality in Lucene/Solr. Finally, we will include an overview of our syntactic and semantic testing strategies. The audience will leave this session with an understanding of how Solr, Lucene, and Parboiled may be used to implement their own custom search parser.

One part of the task was to re-implement a thirty (30) year old query language on modern software. (Ouch!)

The implementation uses parboiled to parse the query syntax.

On parboiled:

parboiled is a mixed Java/Scala library providing for lightweight and easy-to-use, yet powerful and elegant parsing of arbitrary input text based on Parsing expression grammars (PEGs). PEGs are an alternative to context free grammars (CFGs) for formally specifying syntax, they make a good replacement for regular expressions and generally have quite a few advantages over the “traditional” way of building parsers via CFGs. parboiled is released under the Apache License 2.0.
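To make the PEG idea a little more concrete, here is a minimal sketch in plain Python of a PEG-style recursive-descent parser. It does not use parboiled, and the grammar is a hypothetical toy with a few boolean and proximity operators, not the real BRS syntax from the talk. The names `tokenize`, `Parser`, and `parse_query` are my own, chosen only for illustration.

```python
import re

# Hypothetical toy grammar (an assumption, NOT the real BRS syntax), in PEG notation:
#   query     <- or_expr EOF
#   or_expr   <- and_expr ("OR" and_expr)*
#   and_expr  <- prox_expr ("AND" prox_expr)*
#   prox_expr <- primary (("ADJ" / "NEAR") primary)*
#   primary   <- "(" or_expr ")" / WORD

TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9]+)")
KEYWORDS = {"OR", "AND", "ADJ", "NEAR"}


def tokenize(text):
    """Split the query into parentheses and alphanumeric words."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input near offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def accept(self, *choices):
        """Ordered choice over literal tokens: consume the first that matches, else return None."""
        tok = self.peek()
        for choice in choices:
            if tok == choice:
                self.pos += 1
                return tok
        return None

    def parse(self):
        tree = self.or_expr()
        if self.peek() is not None:  # the EOF check from the `query` rule
            raise ValueError(f"unexpected token {self.peek()!r}")
        return tree

    def binary(self, operand, *ops):
        """Parse `operand (op operand)*` and fold it into a left-associative tree."""
        left = operand()
        while (op := self.accept(*ops)) is not None:
            left = (op, left, operand())
        return left

    def or_expr(self):
        return self.binary(self.and_expr, "OR")

    def and_expr(self):
        return self.binary(self.prox_expr, "AND")

    def prox_expr(self):
        return self.binary(self.primary, "ADJ", "NEAR")

    def primary(self):
        if self.accept("("):
            inner = self.or_expr()
            if self.accept(")") is None:
                raise ValueError("expected ')'")
            return inner
        tok = self.peek()
        if tok is None or tok in KEYWORDS or tok == ")":
            raise ValueError(f"expected a search term, got {tok!r}")
        self.pos += 1
        return ("TERM", tok.lower())


def parse_query(text):
    return Parser(tokenize(text)).parse()


if __name__ == "__main__":
    print(parse_query("solar ADJ panel AND (battery OR cell)"))
```

Each grammar rule becomes one method, and operator precedence falls out of which rule calls which. As far as I understand it, parboiled lets you write rules in a similar one-method-per-rule style, but with its own rule-building API rather than hand-written token handling like this.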

The talk also covers the Solr QParserPlugin extension that exposes the custom query language.
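The description above mentions Lucene SpanQueries as one of the building blocks. As a rough illustration of what an ordered proximity operator has to decide, here is a small plain-Python sketch. It is only loosely analogous to an in-order span-near match; the function name `ordered_near` and the `slop` parameter are my own assumptions, and nothing here calls Lucene or reflects how the talk's plugin is actually implemented.

```python
def ordered_near(tokens, first, second, slop=0):
    """Return True if `second` occurs after `first` with at most `slop` tokens in between.

    Hypothetical helper for illustration only; a real search engine would
    answer this from positions stored in its index, not by scanning text.
    """
    first_positions = [i for i, tok in enumerate(tokens) if tok == first]
    second_positions = [i for i, tok in enumerate(tokens) if tok == second]
    # A gap of 1 means the terms are adjacent, so the allowed gap is slop + 1.
    return any(
        0 < s - f <= slop + 1
        for f in first_positions
        for s in second_positions
    )


if __name__ == "__main__":
    doc = "a solar powered panel with a battery".split()
    print(ordered_near(doc, "solar", "panel"))          # adjacency required
    print(ordered_near(doc, "solar", "panel", slop=1))  # one intervening token allowed
```

A strict adjacency operator would correspond to `slop=0` here, while looser proximity operators would allow a larger gap or drop the ordering requirement.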

Great presentation, although one where you will want to be following the slides (below the video).
