Archive for the ‘SolrMarc’ Category

MARC and SolrMARC

Sunday, July 8th, 2012

MARC and SolrMARC by Owen Stephens.

From the post:

At the recent Mashcat event I volunteered to do a session called ‘making the most of MARC’. What I wanted to do was demonstrate how some of the current ‘resource discovery’ software is based on technology that can really extract value from bibliographic data held in MARC format, and how this creates opportunities for creating tools both for users and for library staff.

One of the triggers for the session was seeing, over a period of time, a number of complaints about the limitations of ‘resource discovery’ solutions – I wanted to show that many of the perceived limitations were not about the software, but about the implementation. I also wanted to show that while some technical knowledge is needed, some of these solutions can be run on standard PCs, and this puts the tools, and the ability to experiment and play with MARC records, in the grasp of any tech-savvy librarian or user.

Many of the current ‘resource discovery’ solutions available are based on a search technology called Solr – part of a project at the Apache Software Foundation. Solr provides a powerful set of indexing and search facilities, but what makes it especially interesting for libraries is that significant work has already been carried out, by the SolrMARC project, to use Solr to index MARC data. SolrMARC delivers a set of pre-configured indexes and the ability to extract data from MARC records (gracefully handling ‘bad’ MARC data, such as badly encoded characters, as well). While Solr is powerful, it is SolrMARC that makes it easy to implement and exploit in a library context.
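To make "extract data from MARC records" a little more concrete, here is a minimal Python sketch of reading one binary MARC (ISO 2709) record: a 24-character leader, a directory of 12-byte entries (tag, field length, starting offset), and then the fields themselves. This is not SolrMARC or marc4j code. The function name `parse_marc`, the output shape, and the tolerance policy (skip directory entries that are corrupt or point past the end of the record, assume UTF-8 and replace undecodable bytes) are assumptions made for illustration.

```python
FIELD_TERMINATOR = "\x1e"
SUBFIELD_DELIMITER = "\x1f"


def parse_marc(raw: bytes) -> dict:
    """Parse one ISO 2709 record, skipping damaged parts instead of failing.

    Control fields (tags below 010) become (tag, text); data fields become
    (tag, (indicators, [(subfield_code, text), ...])).
    """
    leader = raw[:24].decode("ascii", errors="replace")
    base = int(leader[12:17])        # Leader/12-16: base address of data
    directory = raw[24:base - 1]     # the directory ends with a field terminator
    fields = []
    for i in range(0, len(directory) - 11, 12):
        entry = directory[i:i + 12].decode("ascii", errors="replace")
        tag, length, start = entry[:3], entry[3:7], entry[7:12]
        if not (length.isdigit() and start.isdigit()):
            continue                 # corrupt directory entry: skip this field only
        begin = base + int(start)
        end = begin + int(length)
        if end > len(raw):
            continue                 # entry points past the end of the record
        # Assumed policy for this sketch: treat data as UTF-8, replace bad bytes.
        text = raw[begin:end].decode("utf-8", errors="replace").rstrip(FIELD_TERMINATOR)
        if tag < "010":
            fields.append((tag, text))
        else:
            indicators, *subfields = text.split(SUBFIELD_DELIMITER)
            fields.append((tag, (indicators, [(s[:1], s[1:]) for s in subfields if s])))
    return {"leader": leader, "fields": fields}
```

The design choice mirrors the "graceful handling" described above: a damaged directory entry or byte sequence costs one field or one character, not the whole record.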

SolrMARC is used by two open source resource discovery products – VuFind and Blacklight. Although VuFind and Blacklight have differences and are written in different languages (VuFind in PHP, Blacklight in Ruby), both use Solr, and specifically SolrMARC, to index MARC records, so the indexing and search capabilities underneath are essentially the same. What makes the difference between implementations is not the underlying technology but the configuration. The configuration allows you to define what data, from which part of the MARC records, goes into which index in Solr.
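The idea of configuration as a map from MARC tags and subfields to Solr fields can be sketched in a few lines of Python. SolrMARC keeps such mappings in properties files; the spec strings below ("245ab", several specs joined with ":") are only modelled loosely on that style, and the record shape, `MAPPING`, and `apply_mapping` are hypothetical. None of SolrMARC's richer options (custom methods, translation maps, fixed-field character positions) are modelled.

```python
from collections import defaultdict

# Hypothetical mapping: Solr field -> "TAG" + subfield codes; ":" separates specs.
MAPPING = {
    "title_t": "245ab",
    "author_t": "100a:110a",
    "subject_topic_facet": "650a",
}


def apply_mapping(record, mapping):
    """Return {solr_field: [values]} for a record given as [(tag, [(code, text), ...]), ...]."""
    doc = defaultdict(list)
    for solr_field, spec in mapping.items():
        for part in spec.split(":"):
            tag, codes = part[:3], set(part[3:])
            for field_tag, subfields in record:
                if field_tag != tag:
                    continue
                # Join the selected subfields of one MARC field into a single value.
                value = " ".join(text for code, text in subfields if code in codes)
                if value:
                    doc[solr_field].append(value)
    return dict(doc)
```

Changing what lands in which Solr field then means editing `MAPPING`, not the extraction code, which is the sense in which two installations on the same software can behave very differently.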

Owen explains why these tools excite him:

These tools excite me for a couple of reasons:

  1. A shared platform for MARC indexing, with a standard way of programming extensions, gives the opportunity to share techniques and scripts across platforms – if I write a clever set of BeanShell scripts to calculate page counts from the 300 field (along the lines demonstrated by Tom Meehan in another Mashcat session), you can use the same scripts with no effort in your SolrMARC installation.
  2. The ability to run powerful, but easy to configure, search tools on standard computers. I can get Blacklight or VuFind running on a laptop (Windows, Mac or Linux) with very little effort, and I can have a few hundred thousand MARC records indexed using my own custom routines and searchable via an interface I have complete control over.
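The page-count scripts mentioned in the first point are not reproduced in the post, so the following is only a hypothetical Python illustration of the kind of logic such a script might hold. It reads an extent statement from 300 $a and applies a deliberately simple heuristic: take arabic numbers followed by "p." (AACR2 style) or "pages" (RDA style), ignore roman-numbered front matter, and keep the largest.

```python
import re

# An arabic number directly followed by "p." or "page(s)", e.g. "345 p." or "210 pages".
PAGES = re.compile(r"(\d+)\s*(?:p\b|pages?\b)")


def page_count(extent):
    """Heuristic page count from a 300 $a extent statement, or None if none is found."""
    counts = [int(n) for n in PAGES.findall(extent)]
    return max(counts) if counts else None
```

For example, `page_count("xii, 345 p. :")` gives 345 and `page_count("1 map")` gives None. Bracketed or unpaged extents fall through to None, which is the kind of edge case a real script would need to decide about.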

I like the “geek” appeal of #2, but creating value-add interfaces for the casual user is more likely to attract positive PR for a library.

As for #1, how uniform are the semantics of MARC fields?

I suspect physical data, page counts, etc., are fairly stable and common across records, but what about more subjective fields? How would you test that proposition?
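One way to start testing it empirically: pull the values of a field from a sample of records, then measure how many conform to an expected pattern and how concentrated the values are. The Python helper below is a hypothetical sketch, not part of SolrMarc; the pattern you pass in is whatever "uniform" means for the field under study.

```python
import re
from collections import Counter


def profile(values, pattern):
    """Return (share of values matching pattern, three most common normalised values)."""
    regex = re.compile(pattern)
    # Normalise lightly so trivial case and whitespace differences are not counted as variation.
    values = [v.strip().lower() for v in values if v and v.strip()]
    if not values:
        return 0.0, []
    matched = sum(1 for v in values if regex.search(v))
    return matched / len(values), Counter(values).most_common(3)
```

For a physical field like 300 $a, a high match share against a page-count pattern supports the "stable" hunch; for a subjective field such as a topical subject, the match share means little and the spread of the most common values is the more telling number.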

SolrMarc 2.3.1 – Critical Bug Fix

Saturday, October 15th, 2011


Robert Haschart writes:

The recently released SolrMarc 2.3 has a serious problem where commits to a local Solr index set the expungeDeletes flag, which causes a segment merge that can be nearly as expensive as an index optimize. Furthermore, changes in the defaults for certain configuration properties cause the above behavior to be chosen by default. At UVA the processing time for our nightly updates jumped from about 30 minutes (of which about 20 minutes is the index optimize) to about 2 hours 30 minutes.

So the error has been fixed and an updated version has been released. If you have recently downloaded a copy of SolrMarc version 2.3, discard it and download the updated release, SolrMarc version 2.3.1.
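For anyone wondering what "commits set the expungeDeletes flag" means at the Solr level: in Solr's XML update syntax a commit can carry an expungeDeletes attribute, which asks Solr to merge away segments containing deleted documents. The Python helper below is hypothetical (SolrMarc's own configuration property names are not given in the report) and simply builds the two variants of the commit message that would be sent to a Solr update handler.

```python
import xml.etree.ElementTree as ET


def commit_message(expunge_deletes=False):
    """Build a Solr XML update <commit/> command, optionally with expungeDeletes."""
    commit = ET.Element("commit")
    if expunge_deletes:
        # The costly variant: forces merges of segments that hold deleted documents.
        commit.set("expungeDeletes", "true")
    return ET.tostring(commit, encoding="unicode")
```

By the figures in the report, the flagged variant took UVA's nightly processing from about 30 minutes to about 150, roughly a fivefold increase.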

SolrMarc

Tuesday, October 4th, 2011


From the webpage:

SolrMarc can index your MARC records into Apache Solr. It also comes with an improved version of marc4j that handles UTF-8 characters better, is more forgiving of malformed MARC data, and can recover from data errors gracefully. This indexer is used by Blacklight (http://blacklight.rubyforge.org) and VuFind (http://www.vufind.org/), but it can also be used as a standalone project.
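The page does not show how the modified marc4j handles encodings, so this is only a Python sketch of the general idea under stated assumptions. In MARC 21, Leader/09 is "a" when a record claims UCS/Unicode and blank when it uses MARC-8. The hypothetical `decode_marc_text` trusts that flag, replaces invalid byte sequences instead of aborting, and counts the replacements so bad records can be reported. Real MARC-8 conversion is out of scope here; such data is decoded as ASCII with replacement as a stand-in.

```python
def decode_marc_text(data, leader):
    """Return (text, problems) for raw field bytes, never raising on bad input."""
    if leader[9:10] == "a":
        # Leader/09 = "a": the record claims Unicode, so decode as UTF-8.
        text = data.decode("utf-8", errors="replace")
    else:
        # Blank Leader/09 means MARC-8; this sketch does NOT convert MARC-8,
        # it keeps the ASCII subset and marks everything else as a problem.
        text = data.decode("ascii", errors="replace")
    return text, text.count("\ufffd")
```

A nonzero problem count lets an indexer keep the record while logging it for cleanup, rather than dropping it or stopping the whole run.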

A nice, if short, discussion of custom indexing with SolrMarc.
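Custom indexing in SolrMarc means filling a Solr field from code (custom Java methods or BeanShell scripts) rather than a plain tag/subfield spec. The hedged Python sketch below shows the kind of logic such a routine often holds: deriving a format facet from Leader/06 (type of record) and Leader/07 (bibliographic level). It is not SolrMarc's API. The function name and facet labels are hypothetical; the code meanings come from the MARC 21 bibliographic format.

```python
# Leader/06 "type of record" codes (partial list); the labels are invented for this sketch.
RECORD_TYPES = {
    "a": "Book",             # language material
    "c": "Musical Score",    # notated music
    "e": "Map",              # cartographic material
    "g": "Video",            # projected medium
    "i": "Audio",            # nonmusical sound recording
    "j": "Music Recording",  # musical sound recording
    "m": "Electronic",       # computer file
    "t": "Manuscript",       # manuscript language material
}


def format_facet(leader):
    """Hypothetical custom indexing routine: derive a format facet from the leader."""
    record_type, bib_level = leader[6:7], leader[7:8]
    if record_type == "a" and bib_level == "s":
        return "Journal"     # language material with Leader/07 = "s" (serial)
    return RECORD_TYPES.get(record_type, "Other")
```

Because the rule lives in one small function, swapping in local conventions (for example, a different label set) does not touch the rest of the indexing configuration.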