The Case for Curation: The Relevance of Digest and Citator Results in Westlaw and Lexis by Susan Nevelow Mart and Jeffrey Luftig.
Humans and machines are both involved in the creation of legal research resources. For legal information retrieval systems, the human-curated finding aid is being overtaken by the computer algorithm. But human-curated finding aids still exist. One of them is the West Key Number system. The Key Number system’s headnote classification of case law, begun in the nineteenth century, was and is the creation of humans. The retrospective headnote classification of the cases in Lexis’s case databases, begun in 1999, was created primarily, although not exclusively, with computer algorithms. So how do these two very different systems deal with a similar headnote from the same case when they link the headnote to the digesting and citator functions in their respective databases? This paper continues an investigation into this question, examining the relevance of results from digest and citator searches run on matching headnotes in ninety important federal and state cases, to see how each performs. For digests, where the results are curated – where a human has made a judgment about the meaning of a case and placed it in a classification system – humans still have an advantage. For citators, where algorithm is battling algorithm to find relevant results, it is a matter of the better algorithm winning. But neither algorithm is doing a very good job of finding all the relevant results; the overlap between the two citator systems is not that large. The lesson for researchers: know how your legal research system was created, what involvement, if any, humans had in the curation of the system, and what you can and cannot expect from the system you are using.
A must-read for library students and legal researchers.
For legal research, the authors conclude:
The intervention of humans as curators in online environments is being recognized as a way to add value to an algorithm’s results, in legal research tools as well as web-based applications in other areas. Humans still have an edge in predicting which cases are relevant. And the intersection of human curation and algorithmically-generated data sets is already well underway. More curation will improve the quality of results in legal research tools, and most particularly can be used to address the algorithmic deficit that still seems to exist where analogic reasoning is needed. So for legal research, there is a case for curation. [footnotes omitted]
The distinction between curation (human gathering of relevant material) and aggregation (machine gathering of potentially relevant material) looks quite useful.
I first saw this at Legal Informatics.