
Searching in Solr, Analyzing Results and CJK

Saturday, January 25th, 2014


From the post:

In my recently completed twelve post series on Chinese, Japanese and Korean (CJK) with Solr for Libraries, my primary objective was to make information available to others in an expeditious manner. However, the organization of the topics is far from optimal for readers, and the series is too long for easy skimming for topics of interest. Therefore, I am providing this post as a sort of table of contents into the previous series.

In Fall 2013, we rolled out some significant improvements for Chinese, Japanese and Korean (CJK) resource discovery in SearchWorks, the Stanford library “catalog” built with Blacklight on top of our Solr index. If your collection has a significant number of CJK resources and they are in multiple languages, you might be interested in our recipes. You might also be interested if you have a significant number of resources in multiple languages, period.

If you are interested in improving searching, or in improving your methodology when working on searching, these posts provide a great deal of information. Analysis of Solr result relevancy figured heavily in this work, as did testing: relevancy/acceptance/regression testing against a live Solr index, unit testing, and integration testing. In addition, there was testing by humans, which was well managed and produced searches that were turned into automated tests. Many of the blog entries contain useful approaches for debugging Solr relevancy and for test driven development (TDD) of new search behavior.
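As a rough illustration of "searches that were turned into automated tests," here is a minimal sketch of a relevancy acceptance check against a live Solr index. It is not the series' own code. The core URL, the helper names, and any query or document id you plug in are hypothetical placeholders. It assumes only Solr's standard `/select` handler with the `q`, `rows`, `fl` and `wt` parameters and the usual JSON response shape (`response.docs`).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical core URL; substitute the address of your own Solr core.
SOLR_SELECT_URL = "http://localhost:8983/solr/catalog/select"


def build_query_url(q, rows=10, base_url=SOLR_SELECT_URL):
    """Build a /select URL that asks for document ids only, as JSON.

    urlencode percent-encodes the query as UTF-8, so CJK queries are safe.
    """
    params = {"q": q, "rows": rows, "fl": "id", "wt": "json"}
    return base_url + "?" + urlencode(params)


def top_ids(solr_response):
    """Return document ids, in rank order, from a parsed Solr JSON response."""
    return [doc["id"] for doc in solr_response["response"]["docs"]]


def in_first_n(solr_response, expected_id, n):
    """True if expected_id is among the first n results."""
    return expected_id in top_ids(solr_response)[:n]


def run_query(q, rows=10):
    """Send the query to the live index and return the parsed JSON body."""
    with urlopen(build_query_url(q, rows)) as resp:
        return json.load(resp)
```

When a human tester reports that a given search should surface a particular record, you can capture that report as one assertion: `assert in_first_n(run_query(query), record_id, 3)`. Here `query` and `record_id` stand in for whatever the tester reported. Rerunning these assertions after every analysis-chain change turns them into regression tests.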


I am sure many of the issues addressed here will be relevant should anyone decide to create a Solr index to the Assyrian Dictionary of the Oriental Institute of the University of Chicago (CAD).

I'm quite serious. I, at least, would be interested.