Searching With Hierarchical Fields Using Solr by John Berryman.
From the post:
In our recent and continuing effort to make the world a better place, we have been working with the illustrious Waldo Jaquith on a project called StateDecoded. Basically, we’re making laws easily searchable and accessible by the layperson. Here, check out the predecessor project, Virginia Decoded. StateDecoded will be similar to the predecessor but with extended and matured search capabilities. And instead of just Virginia state law, with StateDecoded, any municipality will be able to download the open source project, index their own laws, and give their citizens better visibility into the rules that govern them.
For this post, though, I want to focus upon one of the good Solr riddlers that we encountered related to the hierarchical nature of the documents being indexed. Laws are divided into sections, chapters, and paragraphs and we have documents at every level. In our Solr, this hierarchy is captured in a field labeled “section”. So for instance, here are 3 examples of this section field:
<field name="section">30</field>
– A document that contains information specific to section 30.
<field name="section">30.4</field>
– A document that contains information specific to section 30 chapter 4.
<field name="section">30.4.15</field>
– A document that contains information specific to section 30 chapter 4 paragraph 15.
And our goal for this field is that if anyone searches for a particular section of law, that they will be given the law most specific to their request followed by the laws that are less specific. For instance, if a user searches for “30.4”, then the results should contain the documents for section 30, section 30.4, section 30.4.15, section 30.4.16, etc., and the first result should be for 30.4. Other documents such as 40.4 should not be returned.
(…)
Excellent riddler!
I suspect the same issue comes up in other contexts as well.
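To make the requirement in the quoted passage concrete, here is a rough sketch in plain Python of the matching and ordering it describes. This is not the Solr solution from the post (that part is elided above), only an illustration of the desired behavior. The function names are hypothetical, and ordering the non-exact hits by how far their depth is from the query's is my own assumption; the post only says the exact match should come first.

```python
def path(section):
    """Split a dotted section id such as "30.4.15" into its components."""
    return section.split(".")


def is_related(query, section):
    """True if one id is an ancestor of, descendant of, or equal to the other.

    Comparison is done component by component, so "30.4" does not
    accidentally match "30.40" the way a raw string prefix test would.
    """
    q, s = path(query), path(section)
    n = min(len(q), len(s))
    return q[:n] == s[:n]


def search_sections(query, sections):
    """Return related section ids with the exact match first.

    Assumption: the remaining hits are ordered by how many levels they
    sit above or below the query. Ties keep their input order because
    sorted() is stable.
    """
    depth = len(path(query))
    hits = [s for s in sections if is_related(query, s)]
    return sorted(hits, key=lambda s: abs(len(path(s)) - depth))


if __name__ == "__main__":
    indexed = ["30", "30.4", "30.4.15", "30.4.16", "40.4", "30.40"]
    print(search_sections("30.4", indexed))
```

With the sample ids above, a search for "30.4" keeps 30, 30.4, 30.4.15 and 30.4.16, puts 30.4 first, and leaves out both 40.4 and 30.40. The last case is the one a naive string-prefix match would get wrong.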