Detecting Text Reuse in Nineteenth-Century Legal Documents:…

Thursday, March 12th, 2015

Detecting Text Reuse in Nineteenth-Century Legal Documents: Methods and Preliminary Results by Lincoln Mullen.

From the post:

How can you track changes in the law of nearly every state in the United States over the course of half a century? How can you figure out which states borrowed laws from one another, and how can you visualize the connections among the legal system as a whole?

Kellen Funk, a historian of American law, is writing a dissertation on how codes of civil procedure spread across the United States in the second half of the nineteenth century. He and I have been collaborating on the digital part of this project, which involves identifying and visualizing the borrowings between these codes. The problem of text reuse is a common one in digital history/humanities projects. In this post I want to describe our methods and lay out some of our preliminary results. To get a fuller picture of this project, you should read the four posts that Kellen has written about his project:

Quite a remarkable project with many aspects that will be relevant to other projects.

Lincoln doesn’t use the term but this would be called textual criticism, if it were being applied to the New Testament. Of course here, Lincoln and Kellen have the original source document and the date of its adoption. New Testament scholars have copies of copies in no particular order and no undisputed evidence of the original text.
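
For readers new to text reuse detection, here is a minimal sketch of one common approach: break each text into overlapping word n-grams ("shingles") and compare the sets with Jaccard similarity. This is a generic technique chosen for illustration, not a description of Lincoln and Kellen's code. The function names and the sample sentences are invented.

```python
import re


def shingles(text, n=5):
    """Return the set of word n-grams (shingles) in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented sentences standing in for sections of different codes of procedure.
section_a = "the court may at any time allow a party to amend any pleading"
section_b = "the court may at any time allow either party to amend a pleading"
section_c = "every action must be prosecuted in the name of the real party in interest"

score_ab = jaccard(shingles(section_a, 3), shingles(section_b, 3))
score_ac = jaccard(shingles(section_a, 3), shingles(section_c, 3))
print(f"a vs b: {score_ab:.2f}")
print(f"a vs c: {score_ac:.2f}")
```

A high score between sections of two codes suggests borrowing. Because adoption dates are known here, the direction of the borrowing can then be inferred from the dates.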

Did I mention that all the source code for this project is on Github?

Big data: too much information

Tuesday, February 24th, 2015

Big data: too much information by Joanna Goodman.

Joanna's post was the source I used for part of the post Enhanced Access to UK Legislation. I wanted to call attention to her post because it covered more than just the site and offered several insights into the role of big data in law.

Consider Joanna’s list of ways big data can help with litigation:

Big data analysis – nine ways it can help

1 Big data analytics use algorithms to interrogate large volumes of unstructured, anonymised data to identify correlations, patterns and trends.

2 Has the potential to uncover patterns – and opportunities – that are not immediately obvious.

3 Graphics are key – visual representation is the only clear and comprehensive way to present the outcomes of big data analysis.

4 E-discovery is an obvious practical application of big data to legal practice, reducing the time and cost of trawling through massive volumes of structured and unstructured data held in different places.

5 Can identify patterns and trends, using client and case data, in dispute resolution to predict the probability of case outcomes. This facilitates decision-making – for example whether a claimant should pursue a case or to settle.

6 In the UK, the Big Data for Law project is digitising the entire statute book so that all UK legislation can be analysed, together with publicly available data from legal publishers. This will create the most comprehensive record of all UK legislation ever created together with analytical tools.

7 A law firm can use big data analytics to offer its insurance clients a service that identifies potentially fraudulent claims.

8 Big data will be usable as a design tool, to identify design patterns within statutes – combinations of rules that are used repeatedly to meet policy goals.

9 Can include transactional data and data from external sources, which can be cut in different ways.

As a teaser, because the rest of her post is as interesting as what I quoted above: how would you use big data to shape debt collection practices?

See Joanna’s post to find out!

Enhanced Access to UK Legislation

Tuesday, February 24th, 2015

The site appears to be a standard legal access site, albeit from 1267 CE to present.

Its “Changes made by legislation enacted from 2002 – present” feature is useful but suffers from presenting changes to texts in tables.

Next year will be the 30th anniversary of the publication of ISO 8879, the SGML standard, and texts are still aligned in tables to show changes. There have long been better ways to present changes: HyTime, XML with XLink/XPointer, and more recent additions to the XML family such as XPath and XQuery.

Inline presentation of changes with a navigation aid to choose the source of changes would be far more intuitive. Perhaps in a future version of this resource.
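
To make "inline presentation" concrete, here is a minimal sketch using Python's standard `difflib` module. It marks deleted words as `[-...-]` and inserted words as `{+...+}` within the running text instead of placing the two versions side by side in a table. The marker style, the function name and the sample provision are my own assumptions, not anything the site offers.

```python
import difflib


def inline_changes(old, new):
    """Mark deletions as [-...-] and insertions as {+...+} within running text."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
        if tag == "equal":
            out.append(" ".join(old_words[i1:i2]))
    return " ".join(out)


# Invented provision text, for illustration only.
before = "A licence may be granted for a period of twelve months"
after = "A licence may be granted for a period of up to three years"
print(inline_changes(before, after))
```

A navigation aid for choosing the source of changes could be built on the same idea by running the comparison once per amending act and labelling each marked span with the act that produced it.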

I say in a future version of this site because of the following description of the hopes for this site:

In 2014 big data moved centre stage, with the Arts and Humanities Research Council-funded Data for Law project, which was launched to facilitate socio-economic research and identify patterns in the way we legislate. This involves digitising the entire statute book and presenting the data so that all UK legislation can be analysed, together with publicly available data from legal publishers such as LexisNexis and Westlaw.

‘It’s better to count than to guess,’ observes David Howarth, a legal academic and former MP who co-leads the project. ‘Big Data for Law will provide the most comprehensive record of all UK legislation ever created, together with analytical tools. Although these will be most useful for the public sector, government, researchers and policy-makers, it is also useful to law firms, particularly those with public policy practices who will be able to gain insights into the changing regulatory environment.’

Firms and lawyers will be able to use the Big Data for Law resources to identify patterns and trends in particular areas of legislation. An important consideration is to publish the data in a form that will enable researchers to work with it effectively – and present their findings clearly.

Another part of the project uses big data as a design tool, to identify patterns within statutes – combinations of rules that are used repeatedly to meet policy goals. This is effectively extracting the structure of legislation and thinking about how it can be reused. Howarth suggests that law firms could also apply this approach to legal documents.

John Sheridan, head of legislation services for the National Archives and senior investigator for the Big Data for Law project, highlights the importance of creating tools and methods that are accessible to those without deep statistical knowledge as well as developing pre-packaged analysis that researchers and others can use and cite in documents: ‘For example, we can plot fluctuations in the number of laws made year on year, and in the length of the text.

‘We can discover how modular legislation is in particular areas and frequency of legislative change. We can also examine the language of law which uncovers the topics of the day and reflects major global events and political and economic trends.’

Mapping the statute book identifies commonly recurring themes and language patterns. Sheridan and his team have identified patterns for licensing, prohibition, regulators, tax and so on, with the purpose of enabling interested parties to quickly gain an understanding of a particular legislative development and identify commonly occurring solutions to particular issues.

The ability to search the entirety of UK legislation using time-based parameters and particular words and phrases makes it straightforward to find relevant laws pertaining to a specific issue, which is particularly useful to parties involved in litigation. Sheridan underlines the importance of visual presentation to plot patterns and trends, and the ability to drill down into the detail.

Isn’t that an extraordinary description of the potential for a public access to legislation site?

Patterns in legislation? Policy goals linked to external events? Language analysis?

I didn’t see any of those features yet, but I assume that they will be forthcoming.

Confessions of an Information Materialist

Wednesday, January 14th, 2015

Confessions of an Information Materialist by Aaron Kirschenfeld.

There aren’t many people in the world that can tempt me into reading UCC (Uniform Commercial Code) comments (again) but it appears that Aaron is one of them, at least this time.

Aaron was extolling the usefulness of categories for organization, and for the organization of information in particular, and invokes “Official Comment 4a to UCC 9-102 by the ALI & NCCUSL.” (ALI = American Law Institute, NCCUSL = National Conference of Commissioners on Uniform State Laws. Seriously, that’s really their name.)

I will quote part of it so you can get the flavor of what Aaron is praising:

The classes of goods are mutually exclusive. For example, the same property cannot simultaneously be both equipment and inventory. In borderline cases — a physician’s car or a farmer’s truck that might be either consumer goods or equipment — the principal use to which the property is put is determinative. Goods can fall into different classes at different times. For example, a radio may be inventory in the hands of a dealer and consumer goods in the hands of a consumer. As under former Article 9, goods are “equipment” if they do not fall into another category.

The definition of “consumer goods” follows former Section 9-109. The classification turns on whether the debtor uses or bought the goods for use “primarily for personal, family, or household purposes.”

Goods are inventory if they are leased by a lessor or held by a person for sale or lease. The revised definition of “inventory” makes clear that the term includes goods leased by the debtor to others as well as goods held for lease. (The same result should have obtained under the former definition.) Goods to be furnished or furnished under a service contract, raw materials, and work in process also are inventory. Implicit in the definition is the criterion that the sales or leases are or will be in the ordinary course of business. For example, machinery used in manufacturing is equipment, not inventory, even though it is the policy of the debtor to sell machinery when it becomes obsolete or worn. Inventory also includes goods that are consumed in a business (e.g., fuel used in operations). In general, goods used in a business are equipment if they are fixed assets or have, as identifiable units, a relatively long period of use, but are inventory, even though not held for sale or lease, if they are used up or consumed in a short period of time in producing a product or providing a service.

Aaron’s reaction to this comment:

The UCC comment hits me two ways. First, it shows how inexorably linked law and the organization of information really are. The profession seeks to explain or justify what is what, what belongs to who, how much of it, and so on. The comment also shows how the logical process of categorizing involves deductive, inductive, and analogical reasoning. With the UCC specifically, practice came before formal classification, and seeks, much like a foreign-language textbook, to explain a living thing by reducing it to categories of words and phrases — nouns, verbs and their tenses, and adjectives (really, the meat of descriptive vocabulary), among others. What are goods and the subordinate types of goods? Comment 4a to 9-102 will tell you!

All of what Aaron says about Comment 4a to UCC 9-102 is true, if you grant the UCC the right to put the terms of discussion beyond the pale of being questioned.

Take for example:

The classes of goods are mutually exclusive. For example, the same property cannot simultaneously be both equipment and inventory.

Ontology friends would find nothing remarkable about classes of goods being mutually exclusive, or about the example of property not being both equipment and inventory at the same time.

The catch is that the UCC isn’t defining these terms in a vacuum. These definitions apply to UCC Article 9, which governs rights in secured transactions. Put simply, where a creditor has the legal right to take your car, boat, house, equipment, etc.

By defining these terms, the UCC (actually the state legislature that adopts the UCC) has put these terms, their definitions and their relationships to other statutes beyond the pale of discussion. They are the fundamental underpinning of any discussion, including discussions of how to modify them.

It is very difficult to lose an argument if you have already defined the terms upon which the argument can be conducted.

Most notions of property and the language used to describe it are deeply embedded in both constitutions and the law, such as the UCC. The question of whether “property” should mean the same thing to an ordinary citizen and a quasi-immortal corporation doesn’t come up. And under the terms of the UCC, it is unlikely to ever come up.

We need legal language for a vast number of reasons but we need to realize that the users of legal language have an agenda of their own and that their language can conceal questions that some of us would rather discuss.

U.S. Congressional Documents and Debates (1774-1875)

Tuesday, December 23rd, 2014

U.S. Congressional Documents and Debates (1774-1875) by Barbara Davis and Robert Brammer (law library specialists at the Library of Congress).

A video introduction to the website A Century of Lawmaking For a New Nation.

I know you are probably wondering why I would post on this resource considering that I just posted on finding popular topics for topic maps! 😉

Popularity, beyond social media popularity, is in the eye of the beholder. This sort of material would appeal to anyone who debates the “intent” of the original framers of the constitution, the American Enterprise Institute for example.

Justice Scalia would be another likely consumer of a topic map based on these materials. He advocates what Wikipedia calls “…textualism in statutory interpretation and originalism in constitutional interpretation.”

Anyone seeking to persuade Justice Scalia of their cause is another likely consumer for such a topic map. Or prospective law clerks, for that matter. Tying this material to Scalia’s opinions and other writings would increase the value of such a map.

The topic mapping theory part would be fun, but imagining Scalia solving the problem of other minds and discerning their intent over two hundred (200) years later would require more imagination than I can muster on most days.

Senate Joins House In Publishing Legislative Information In Modern Formats [No More Sneaking?]

Friday, December 19th, 2014

Senate Joins House In Publishing Legislative Information In Modern Formats by Daniel Schuman.

From the post:

There’s big news from today’s Legislative Branch Bulk Data Task Force meeting. The United States Senate announced it would begin publishing text and summary information for Senate legislation, going back to the 113th Congress, in bulk XML. It would join the House of Representatives, which already does this. Both chambers also expect to have bill status information available online in XML format as well, but a little later on in the year.

This move goes a long way to meet the request made by a coalition of transparency organizations, which asked for legislative information be made available online, in bulk, in machine-processable formats. These changes, once implemented, will hopefully put an end to screen scraping and empower users to build impressive tools with authoritative legislative data. A meeting to spec out publication methods will be hosted by the Task Force in late January/early February.

The Senate should be commended for making the leap into the 21st century with respect to providing the American people with crucial legislative information. We will watch closely to see how this is implemented and hope to work with the Senate as it moves forward.

In addition, the Clerk of the House announced significant new information will soon be published online in machine-processable formats. This includes data on nominees, election statistics, and members (such as committee assignments, bioguide IDs, start date, preferred name, etc.) Separately, House Live has been upgraded so that all video is now in H.264 format. The Clerk’s website is also undergoing a redesign.

The Office of Law Revision Counsel, which publishes the US Code, has further upgraded its website to allow pinpoint citations for the US Code. Users can drill down to the subclause level simply by typing the information into their search engine. This is incredibly handy.

This is great news!

Law is a notoriously opaque domain and the process of creating it even more so. Getting the data is a great first step; parsing out steps in the process and their meaning is another. To say nothing of the content of the laws themselves.

Still, progress is progress and always welcome!

Perhaps citizen review will stop the Senate from sneaking changes past sleepy members of the House.

data.parliament @ Accountability Hack 2014

Friday, November 7th, 2014

data.parliament @ Accountability Hack 2014 by Zeid Hadi.

From the post:

We are pleased to announce that data.parliament will be providing data to be used during the Accountability Hack 2014

data.parliament is a platform that enables the sharing of UK Parliament’s data with consumers both within and outside of Parliament. Designed to complement existing data services it aims to be the central publishing platform and data repository for data that is produced by Parliament. Note our release is in Alpha.

It provides both a repository ( for data and a Linked Data API ( The platform’s ‘shop front’ or data catalogue can be found here (

The following datasets and APIs are now available on data.parliament

  • Commons Written Parliamentary Questions and Answers
  • Lords Written Parliamentary Questions and Answers
  • Commons Oral Questions and Question Times
  • Early Day Motions
  • Lords Divisions
  • Commons Divisions
  • Commons Members
  • Lords Members
  • Constituencies
  • Briefing Papers
  • Papers Laid

A description of the APIs and their usage can be found at All the data exposed by the endpoints can be returned in a variety of formats not least JSON.

To get you started the team has coded two publically available demonstrators that make use of the data in data.parliament. The source code for these can found at One of the demonstrators, a client app, can be found working at Also be sure to read our blog ( for quick start guides, updates, and news about upcoming datasets.

The data.parliament team will be on hand at the Hack, both participating and networking through the event to gather feedback and ideas..

I don’t know enough about British parliamentary procedure to comment on the completeness of the interface.

I am quite interested in the Briefing Papers data feed:

This dataset contains the data for research briefings produced by the Libraries of the House of Commons and House of Lords and the Parliamentary Office of Science and Technology. Each briefing has a pdf document for the briefing itself as well as a set of metadata to accompany it. (

A great project, but even a complete set of documents and transcripts of every word spoken at Parliament does not document relationships between members of Parliament, their relationships to economic interests, etc.

Looking forward to collation of information from this project with other data to form a clearer picture of the legislative process in the UK.

I first saw this in a tweet by data.parliament UK.

Caselaw is Set Free, What Next? [Expanding navigation/search targets]

Thursday, November 6th, 2014

Caselaw is Set Free, What Next? by Thomas Bruce, Director, Legal Information Institute, Cornell.

Thomas provides a great history of Google Scholar’s caselaw efforts and their impact on the legal profession.

More importantly, at least to me, were his observations on how to go beyond the traditional indexing and linking in legal publications:

A trivial example may help. Right now, a full-text search for “tylenol” in the US Code of Federal Regulations will find… nothing. Mind you, Tylenol is regulated, but it’s regulated as “acetaminophen”. But if we link up the data here in Cornell’s CFR collection with data in the DrugBank pharmaceutical collection, we can automatically determine that the user needs to know about acetaminophen — and we can do that with any name-brand drug in which acetaminophen is a component. By classifying regulations using the same system that science librarians use to organize papers in agriculture, we can determine which scientific papers may form the rationale for particular regulations, and link the regulations to the papers that explain the underlying science. These techniques, informed by emerging approaches in natural-language processing and the Semantic Web, hold great promise.

All successful information-seeking processes permit the searcher to exchange something she already knows for something she wants to know. By using technology to vastly expand the number of things that can meaningfully and precisely be submitted for search, we can dramatically improve results for a wide swath of users. In our shop, we refer to this as the process of “getting from barking dog to nuisance”, an in-joke that centers around mapping a problem expressed in real-world terms to a legal concept. Making those mappings on a wide scale is a great challenge. If we had those mappings, we could answer a lot of everyday questions for a lot of people.

(emphasis added)

The first line I bolded in the quote:

All successful information-seeking processes permit the searcher to exchange something she already knows for something she wants to know.

captures the essence of a topic map. Yes? That is, a user navigates or queries a topic map on the basis of terms they already know. In so doing, they can find other terms that are interchangeable with theirs, but more importantly, if information is indexed using a different term than theirs, they can still find the information.

With traditional indexing systems (think of the Readers’ Guide to Periodical Literature or the Library of Congress Subject Headings), some users learned those systems in order to become better searchers. Still an interchange of what you know for what you don’t know, but with a large front-end investment.

Thomas is positing a system like topic maps that enables users to navigate by the terms they already know to find information they don’t know.

The second block of text I bolded:

Making those mappings on a wide scale is a great challenge. If we had those mappings, we could answer a lot of everyday questions for a lot of people.

Making wide scale mappings certainly is a challenge. In part because there are so many mappings to be made and so many different ways to make them. Not to mention that the mappings will evolve over time as usages change.

There is a growing realization that indexing or linking data results in a very large pile of indexed or linked data. You can’t really navigate it unless or until you hit upon the correct terms to make the next link. We could try to teach everyone the correct terms, but as more correct terms appear every day, that seems an unlikely solution. Thomas has the right of it when he suggests expanding the target of “correct” terms.

Topic maps are poised to help expand the target of “correct” terms, and to do so in such a way as to combine with other expanded targets of “correct” terms.
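
A minimal sketch of what "expanding the target" means in practice: the searcher's own term is mapped to the term the collection is indexed under before the search runs. The mapping table, document identifiers and sample texts below are all invented for illustration. A real system would draw its mappings from curated sources such as the DrugBank data Thomas mentions.

```python
# Illustrative mappings only: the searcher's term -> the term used in the texts.
SYNONYMS = {
    "tylenol": "acetaminophen",
    "barking dog": "nuisance",
}

# A tiny stand-in collection, written in the "official" vocabulary.
DOCUMENTS = {
    "reg-1": "labeling requirements for acetaminophen products",
    "reg-2": "abatement of a public nuisance",
}


def expand(query):
    """Return the query plus any term it is known to map to."""
    q = query.lower().strip()
    terms = {q}
    if q in SYNONYMS:
        terms.add(SYNONYMS[q])
    return terms


def search(query):
    """Full-text search over every expanded form of the query."""
    terms = expand(query)
    return sorted(doc_id for doc_id, text in DOCUMENTS.items()
                  if any(term in text.lower() for term in terms))


print(search("Tylenol"))
print(search("barking dog"))
```

The searcher never has to learn "acetaminophen" or "nuisance". The hard part, as Thomas says, is building and maintaining the mapping table at scale.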

I first saw this in a tweet by Aaron Kirschenfeld.

Update: Tarlton Law Library (University of Texas at Austin) Legal Research Guide has a great page of tips and pointers on the Google Scholar caselaw collection. Bookmark this guide.

Introduction to Basic Legal Citation (online ed. 2014)

Sunday, November 2nd, 2014

Introduction to Basic Legal Citation (online ed. 2014) by Peter W. Martin.

From the post:

This work first appeared in 1993. It was most recently revised in the fall of 2014 following a thorough review of the actual citation practices of judges and lawyers, the relevant rules of appellate practice of federal and state courts, and the latest edition of the ALWD Guide to Legal Citation, released earlier in the year. As has been true of all editions released since 2010, it is indexed to both the ALWD guide and the nineteenth edition of The Bluebook. However, it also documents the many respects in which contemporary legal writing, very often following guidelines set out in court rules, diverges from the citation formats specified by those academic texts.

The content of this guide is also available in three different e-book formats: 1) a pdf version that can be printed out in whole or part and also used with hyperlink navigation on an iPad or other tablet, indeed, on any computer; 2) a version designed specifically for use on the full range of Kindles as well as other readers or apps using the Mobi format; and 3) a version in ePub format for the Nook and other readers or apps that work with it. To access any of them, click here. (Over 50,000 copies of the 2013 edition were downloaded.)

Since the guide is online, its further revision is not tied to a rigid publication cycle. Any user seeing a need for clarification, correction, or other improvement is encouraged to “speak up.” What doesn’t work, isn’t clear, is missing, appears to be in error? Has a change occurred in one of the fifty states that should be reported? Comments of these and other kinds can sent by email addressed to (Please include “Citation” in the subject line.) Many of the features and some of the coverage of this reference are the direct result of past user questions and advice.

A complementary series of video tutorials offers a quick start introduction to citation of the major categories of legal sources. They may also be useful for review. Currently, the following are available:

  1. Citing Judicial Opinions … in Brief (8.5 minutes)
  2. Citing Constitutional and Statutory Provisions … in Brief (14 minutes)
  3. Citing Agency Material … in Brief (12 minutes)

Finally, for those with an interest in current issues of citation practice, policy, and instruction, there is a companion blog, “Citing Legally,” at:

Obviously legal citations are identifiers but Peter helpfully expands on the uses of legal citations:

A reference properly written in “legal citation” strives to do at least three things, within limited space:

  • identify the document and document part to which the writer is referring
  • provide the reader with sufficient information to find the document or document part in the sources the reader has available (which may or may not be the same sources as those used by the writer), and
  • furnish important additional information about the referenced material and its connection to the writer’s argument to assist readers in deciding whether or not to pursue the reference.

I would quibble with Peter’s description of a legal citation “identif[ying] a document or document part,” in part because of his second point, that a reader can find an alternative source for the document.

To me it is easier to say that a legal citation identifies a legal decision, legislation or agency decision/rule, which may be reported by any number of sources. Some sources have their own unique reference systems that are mapped to other systems. Making a legal decision, legislation or agency decision/rule an abstraction identified by the citation avoids confusion with a particular source.
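
A minimal sketch of that view: the decision is the subject, and each reporter's citation is just one more identifier for it. The three citations below are the familiar parallel citations for Brown v. Board of Education. The decision identifier `brown-v-board-1954` and the function name are hypothetical, made up for this example.

```python
# One decision, reported by three different reporters.
DECISIONS = {
    "brown-v-board-1954": {  # hypothetical identifier for the abstraction
        "name": "Brown v. Board of Education",
        "citations": ["347 U.S. 483", "74 S. Ct. 686", "98 L. Ed. 873"],
    },
}

# Invert the table so any reporter's citation resolves to the same decision.
CITATION_INDEX = {
    cite: decision_id
    for decision_id, record in DECISIONS.items()
    for cite in record["citations"]
}


def resolve(citation):
    """Return the decision identifier for a citation, or None if unknown."""
    normalized = " ".join(citation.split())  # collapse stray whitespace
    return CITATION_INDEX.get(normalized)


print(resolve("347 U.S. 483"))
print(resolve("74 S. Ct. 686"))
```

Both lookups land on the same decision, whichever source the reader happens to have on the shelf.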

A must read for law students, practitioners, judges and potential inventors of the Nth citation system for legal materials.

Guide to Law Online

Tuesday, October 28th, 2014

Guide to Law Online

From the post:

The Guide to Law Online, prepared by the Law Library of Congress Public Services Division, is an annotated guide to sources of information on government and law available online. It includes selected links to useful and reliable sites for legal information.

Select a Link:

The Guide to Law Online is an annotated compendium of Internet links; a portal of Internet sources of interest to legal researchers. Although the Guide is selective, inclusion of a site by no means constitutes endorsement by the Law Library of Congress.

In compiling this list, emphasis wherever possible has been on sites offering the full texts of laws, regulations, and court decisions, along with commentary from lawyers writing primarily for other lawyers. Materials related to law and government that were written by or for lay persons also have been included, as have government sites that provide even quite general information about themselves or their agencies.

Every direct source listed here was successfully tested before being added to the list. Users, however, should be aware that changes of Internet addresses and file names are frequent, and even sites that usually function well do not always do so. Thus a successful connection may sometimes require several attempts. If such an attempt to access a file indicates an error, the information can sometimes still be accessed by truncating the URL address to access a directory at the site.

Last Updated: 07/10/2014

While I was at the Library of Congress site today I encountered this set of law guides and thought they might be of interest. They were updated in July of this year, so most of the links should still work.

Officially Out of Beta

Tuesday, October 28th, 2014

Officially Out of Beta

From the post:

The free legislative information website,, is officially out of beta form, and beginning today includes several new features and enhancements. URLs that include will be redirected to The site now includes the following:

New Feature: Resources

  • A new resources section providing an A to Z list of hundreds of links related to Congress
  • An expanded list of “most viewed” bills each day, archived to July 20, 2014

New Feature: House Committee Hearing Videos

  • Live streams of House Committee hearings and meetings, and an accompanying archive to January, 2012

Improvement: Advanced Search

  • Support for 30 new fields, including nominations, Congressional Record and name of member

Improvement: Browse

  • Days in session calendar view
  • Roll Call votes
  • Bill by sponsor/co-sponsor

When the Library of Congress, in collaboration with the U.S. Senate, U.S. House of Representatives and the Government Printing Office (GPO) released as a beta site in the fall of 2012, it included bill status and summary, member profiles and bill text from the two most recent congresses at that time – the 111th and 112th.

Since that time, has expanded with the additions of the Congressional Record, committee reports, direct links from bills to cost estimates from the Congressional Budget Office, legislative process videos, committee profile pages, nominations, historic access reaching back to the 103rd Congress and user accounts enabling saved personal searches. Users have been invited to provide feedback on the site’s functionality, which has been incorporated along with the data updates.

Plans are in place for ongoing enhancements in the coming year, including addition of treaties, House and Senate Executive Communications and the Congressional Record Index.

Field Value Lists:

Use search fields in the main search box (available on most pages), or via the advanced and command line search pages. Use terms or codes from the Field Value Lists with corresponding search fields: Congress [congressId], Action – Words and Phrases [billAction], Subject – Policy Area [billSubject], or Subject (All) [allBillSubjects].

Congresses (44, stops with 70th Congress (1927-1929))

Legislative Subject Terms, Subject Terms (541), Geographic Entities (279), Organizational Names (173). (total 993)

Major Action Codes (98)

Policy Area (33)

Search options:

Search Form: “Choose collections and fields from dropdown menus. Add more rows as needed. Use Major Action Codes and Legislative Subject Terms for more precise results.”

Command Line: “Combine fields with operators. Refine searches with field values: Congresses, Major Action Codes, Policy Areas, and Legislative Subject Terms. To use facets in search results, copy your command line query and paste it into the home page search box.”

Search Tips Overview: “You can search using the quick search available on most pages or via the advanced search page. Advanced search gives you the option of using a guided search form or a command line entry box.” (includes examples)


You can follow this project @congressdotgov.

Orientation to Legal Research & Congress.gov is available both as a seminar (in-person) and webinar (online).


I first saw this at Congress.gov is Out of Beta with New Features by Africa S. Hands.

Free Public Access to Federal Materials on Guide to Law Online [Browsing, No Search]

Thursday, October 16th, 2014

From the post:

Through an agreement with the Library of Congress, the publisher William S. Hein & Co., Inc. has generously allowed the Law Library of Congress to offer free online access to historical U.S. legal materials from HeinOnline. These titles are available through the Library’s web portal, Guide to Law Online: U.S. Federal, and include:

I should be happy but then I read:

These collections are browseable. For example, to locate the 1982 version of the Bankruptcy code in Title 11 of the U.S. Code you could select the year (1982) and then Title number (11) to retrieve the material. (emphasis added)

Err, actually it should say: These collections are browseable only. No search within or across the collections.

Here is an example:

supreme court default listing

If you expand volume 542 you will see:

supreme court volume 542

Look! There is Intel vs. AMD, let’s look at that one!

Intel vs. AMD download page

Did I just overlook a search box?

I checked the others and you can too.

I did find one that was small enough (less than 20 pages I suppose) to have a search function:

CFR General Provisions image

So, let’s search for something that ought to be in the CFR general provisions, like “department:”

Department in search box

The result?

search error

Actually that is an abbreviation of the error message. Waste of space to show more.

To summarize, the Library of Congress has arranged for all of us to have browseable access but no search to:

  • United States Code 1925-1988 (includes content up to 1993)
    • From Guide to Law Online: United States Law
  • United States Reports v. 1-542 (1754-2004)
    • From Guide to Law Online: United States Judiciary
  • Code of Federal Regulations (1938-1995)
    • From Guide to Law Online: Executive
  • Federal Register v. 1-58 (1936-1993)
    • From Guide to Law Online: Executive

Hundreds of thousands of pages of some of the most complex documents in history and no searching.

If that’s helping us, I don’t think we can afford much more help from the Library of Congress. That’s a hard thing for me to say because in the vast majority of cases I really like and support the Library of Congress (aside from the robber baron refugees holed up in the Copyright Office).

Just so I don’t end on a negative note, I have a suggestion to correct this situation:

Give Thomson Reuters (I knew them as West Publishing Company) or LexisNexis a call. Either one is capable of a better solution than you have with William S. Hein & Co., Inc. Either one has “related” products it could tastefully suggest along with search results.

The Restatement Project

Friday, July 4th, 2014

Rough Consensus, Running Standards: The Restatement Project by Jason Boehmig, Tim Hwang, and Paul Sawaya.

From part 3:

Supported by a grant from the Knight Foundation Prototype Fund, Restatement is a simple, rough-and-ready system which automatically parses legal text into a basic machine-readable JSON format. It has also been released under the permissive terms of the MIT License, to encourage active experimentation and implementation.

The concept is to develop an easily-extensible system which parses through legal text and looks for some common features to render into a standard format. Our general design principle in developing the parser was to begin with only the most simple features common to nearly all legal documents. This includes the parsing of headers, section information, and “blanks” for inputs in legal documents like contracts. As a demonstration of the potential application of Restatement, we’re also designing a viewer that takes documents rendered in the Restatement format and displays them in a simple, beautiful, web-readable version.

I skipped the sections justifying the project because in my circles, the need for text mining is presumed and the interesting questions are about the text and/or the techniques for mining.

As you might suspect, I have my doubts about using JSON for legal texts but for a first cut, let’s hope the project is successful. There is always time to convert to a more robust format at some later point, in response to a particular need.
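
To make the idea concrete, here is a minimal sketch of the kind of parsing the excerpt describes: headers, section information and “blanks.” It is not the Restatement code or its JSON format. The patterns, the `parse_legal_text` function and the output keys are all my own assumptions for illustration.

```python
import json
import re

# Hypothetical patterns (assumptions); real legal documents vary far more than this.
HEADER_RE = re.compile(r"^[A-Z][A-Z .,'-]+$")   # an all-caps line is treated as a header
SECTION_RE = re.compile(r"^(?:Section|§)\s*(\d+(?:\.\d+)*)\.?\s*(.*)$")
BLANK_RE = re.compile(r"_{3,}")                 # a run of underscores marks a fill-in blank


def parse_legal_text(text):
    """Turn plain legal text into a list of typed blocks (header, section, paragraph)."""
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        section = SECTION_RE.match(line)
        if section:
            block = {"type": "section", "number": section.group(1), "text": section.group(2)}
        elif HEADER_RE.match(line):
            block = {"type": "header", "text": line}
        else:
            block = {"type": "paragraph", "text": line}
        # Record how many fill-in blanks the block contains.
        block["blanks"] = len(BLANK_RE.findall(block["text"]))
        blocks.append(block)
    return blocks


if __name__ == "__main__":
    # Made-up sample text, not taken from any real contract.
    sample = (
        "CONSULTING AGREEMENT\n\n"
        "Section 1. Services\n"
        "The Consultant, ______, agrees.\n"
        "Section 2.1 Fee of ____ dollars"
    )
    print(json.dumps(parse_legal_text(sample), indent=2))
```

Even a toy like this shows why I have doubts about JSON: the moment sections nest or cross-reference one another, a flat list of blocks stops being enough.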

Definitely a project to watch or assist if you are considering creating a domain specific conversion editor.

What You Thought The Supreme Court…

Sunday, June 15th, 2014

From the post:

Supreme Court opinions are the law of the land, and so it’s a problem when the Justices change the words of the decisions without telling anyone. This happens on a regular basis, but fortunately a lawyer in Washington appears to have just found a solution.

The issue, as Adam Liptak explained in the New York Times, is that original statements by the Justices about everything from EPA policy to American Jewish communities, are disappearing from decisions — and being replaced by new language that says something entirely different. As you can imagine, this is a problem for lawyers, scholars, journalists and everyone else who relies on Supreme Court opinions.

Until now, the only way to detect when a decision has been altered is a painstaking comparison of earlier and later copies — provided, of course, that someone knew a decision had been changed in the first place. Thanks to a simple Twitter tool, the process may become much easier.

See Jeff’s post for more details, including a Twitter account to follow for newly discovered changes in the opinions of the Supreme Court of the United States.

In a nutshell, the court issues “slip” opinions in cases they decide and then later, sometimes years later, they provide a small group of publishers of their opinions with changes to be made to those opinions.

Which means the opinion you read as a “slip” opinion or in an advance sheet (a paperback issue that is later followed by a hardbound volume combining one or more advance sheets) may not be the opinion of record down the road.

Two questions occur to me immediately:

  1. We can distinguish the “slip” opinion version of an opinion from the “final” published opinion, but how do we distinguish a “final” published decision from a later “more final” published decision? Given the stakes at hand in proceedings before the Supreme Court, certainty about the prior opinions of the Court is very important.
  2. While the Supreme Court always gets most of the attention, it occurs to me that the same process of silent correction has been going on for other courts with published opinions, such as the United States Courts of Appeals and the United States District Courts. Perhaps for the last century or more.

    Which makes it only a small step to ask about state supreme courts and their courts of appeal. What is their record on silent correction of opinions?

There are mechanical difficulties as the records get older, because the “slip” opinions may be lost to history, but in terms of volume, discovering and documenting the behavior of courts over time with regard to silent correction of opinions would certainly be a “big data” project for legal informatics.

What you thought the Supreme Court said may not be what our current record reflects. Who wins? What you heard or what a silently corrected record reports?
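
The “comparison of earlier and later copies” can be partly mechanized. As a rough sketch, assuming you already hold both versions as plain text, Python's standard `difflib` will list the passages that differ. The `changed_passages` function and the sample sentences are my own inventions for illustration, not text from any actual opinion and not the Twitter tool mentioned above.

```python
import difflib


def changed_passages(slip_text, final_text):
    """Return (old, new) pairs for every passage that differs between two versions."""
    old_words = slip_text.split()
    new_words = final_text.split()
    # Compare word by word so that re-flowed line breaks do not register as changes.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


if __name__ == "__main__":
    # Made-up sentences standing in for a slip opinion and its later revision.
    slip = "The agency exceeded its authority under the statute."
    final = "The agency acted within its authority under the statute."
    for old, new in changed_passages(slip, final):
        print(f"- {old!r} became {new!r}")
```

The hard part is not the diff. It is obtaining and keeping the earlier version in the first place, which is exactly what becomes difficult once the “slip” opinions are gone.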

Annotating, Extracting, and Linking Legal Information

Sunday, April 20th, 2014

Annotating, Extracting, and Linking Legal Information by Adam Wyner. (slides)

Great slides, provided you have enough background in the area to fill in the gaps.

I first saw this at: Wyner: Annotating, Extracting, and Linking Legal Information, which has collected up the links/resources mentioned in the slides.

Despite decades of electronic efforts and several centuries of manual effort before that, legal information retrieval remains an open challenge.

Definitions Extractions from the Code of Federal Regulations

Friday, April 11th, 2014

Definitions Extractions from the Code of Federal Regulations by Mohamma M. AL Asswad, Deepthi Rajagopalan, and Neha Kulkarni. (poster)

From a description of the project:

Imagine you’re opening a new business that uses water in the production cycle. If you want to know what federal regulations apply to you, you might do a Google search that leads to the Code of Federal Regulations. But that’s where it gets complicated, because the law contains hundreds of regulations involving water that are difficult to narrow down. (The CFR alone contains 13898 references to water.) For example, water may be defined one way when referring to a drinkable liquid and another when defined as an emission from a manufacturing facility. If the regulation says your water must maintain a certain level of purity, to which water are they referring? Definitions are the building blocks of the law, and yet pouring through them to find what applies to you is frustrating to an average business owner. Computer automation might help, but how can a computer understand exactly what kind of water you’re looking for? We at the Legal Information Institute think this is pretty important challenge, and apparently Google does too.

Looking forward to learning more about this project!

BTW, this is the same Code of Federal Regulations that some members of Congress don’t think needs to be indexed.

Knowing what legal definitions apply is a big step towards making legal material more accessible.

Placement of Citations [Discontinuity and Users]

Friday, April 11th, 2014

If the Judge Will Be Reading My Brief on a Screen, Where Should I Place My Citations? by Peter W. Martin.

From the post:

brief page

Implicitly, Garner’s position assumes a printed page, with footnote calls embedded in the text and the related notes placed at the bottom. In print that entirety is visible at once. The eyes must move, but both call and footnote remain within a single field of vision. Secondly, when the citation sits inert on a printed page and the cited source is online, the decision to inspect that source and when to do so is inevitably influenced by the significant discontinuity that transaction will entail. In print, citation placement contributes little to that discontinuity. The situation is altered – significantly, it seems to me – when a brief or memorandum is submitted electronically and will most likely be read from a screen. In 2014 that is the case with a great deal of litigation.

This is NOT a discussion of interest only to lawyers and judges.

While Peter has framed the issue in terms of contrasting styles of citation, as he also points out, there is a question of “discontinuity” and, I would argue, of comprehension for the reader in these styles.

At first blush, being a regular hypertext maven, you may think that inline citations are “the way to go” on this citation issue.

To some degree I would agree with you, but leaving the current display to consult a citation or other material that could appear in a footnote introduces another form of discontinuity.

You are no longer reading a brief prepared by someone familiar with the law and facts at hand but material by someone who is relying on different facts and perhaps even a different legal context for their statements.

If you are a regular reader of hypertexts, try writing down the opinion of one author on a note card, follow a hyperlink in that post to another resource, record the second author’s opinion on the same subject on a second note card and then follow a link from the second resource to a third and repeat the note card opinion recording. Set all three cards aside, with no marks to associate them with a particular author.

After two (2) days return to the cards and see if you can distinguish the card you made for the first author from the next two.

Yes, after a very short while you are unable to identify the exact source of information that you were trying to remember. Now imagine that in a legal context where facts and/or law are in dispute. Exactly how much “other” content do you want to display with your inline reference?

The same issue comes up for topic map interfaces. Do you really want to display all the information on a subject or do you want to present the user with a quick overview and enable them to choose greater depth?

Personally I would use citations with pop-ups that contain a summary of the cited authority, with a link to the fuller resource. So a judge could quickly confirm their understanding of a case without waiting for resources to load, etc.

But in any event, how much visual or cognitive discontinuity your interface is inflicting on users is an important issue.

Cataloguing projects

Tuesday, March 11th, 2014

Cataloguing projects (UK National Archive)

From the webpage:

The National Archives’ Cataloguing Strategy

The overall objective of our cataloguing work is to deliver more comprehensive and searchable catalogues, thus improving access to public records. To make online searches work well we need to provide adequate data and prioritise cataloguing work that tackles less adequate descriptions. For example, we regard ranges of abbreviated names or file numbers as inadequate.

I was led to this delightful resource by a tweet from David Underdown, advising that his presentation from National Catalogue Day in 2013 was now online.

His presentation along with several others and reports about projects in prior years are available at this projects page.

I thought the presentation titled: Opening up of Litigation: 1385-1875 by Amanda Bevan and David Foster, was quite interesting in light of various projects that want to create new “public” citation systems for law and litigation.

I haven’t yet seen such a proposal that gives sufficient consideration to the enormity of the question: what do you do with old legal materials?

The litigation presentation could be a poster child for topic maps.

I am looking forward to reading the other presentations as well.

Legislative XML Data Mapping Results

Saturday, March 1st, 2014

Legislative XML Data Mapping Results

You may recall last September (2013) when I posted: Legislative XML Data Mapping [$10K], which was a challenge to convert documents encoded in U.S. Congress and U.K. Parliament markup into Akoma Ntoso.

There were five (5) entries and two (2) winners.

The first place winner reports:

The included web application, an instance of which is running at, converts documents to Akoma Ntoso in response to common HTTP requests. Visit the app with a web browser, enter the URL of the source XML into the form, and the app responds with an Akoma Ntoso representation of the source document. Requests can even be made without a browser by passing the source document’s URL directly as the “source” parameter, e.g.,

But I was unable to find the files with the included .xsl transforms.

The second place winner reports the use of Perl scripts that can be found at:

I was unable to find any formal comparison of the entries. Perhaps you will have better luck.

And I am curious, if you encountered a “converted” form of a U.S. or U.K. statute, would you be able to faithfully reconstruct the original?

Making the meaning of contracts visible…

Sunday, February 23rd, 2014

Making the meaning of contracts visible – Automating contract visualization by Stefania Passera, Helena Haapio, Michael Curtotti.


The paper, co-authored by Passera, Haapio and Curtotti, presents three demos of tools to automatically generate visualizations of selected contract clauses. Our early prototypes include common types of term and termination, payment and liquidated damages clauses. These examples provide proof-of-concept demonstration tools that help contract writers present content in a way readers pay attention to and understand. These results point to the possibility of document assembly engines compiling an entirely new genre of contracts, more user-friendly and transparent for readers and not too challenging to produce for lawyers.



From slides 2 and 3:

Need for information to be accessible, transparent, clear and easy to understand
   Contracts are no exception.

Benefits of visualization

  • Information encoded explicitly is easier to grasp & share
  • Integrating pictures & text prevents cognitive overload by distributing effort on 2 different processing systems
  • Visual structures and cues act as paralanguage, reducing the possibility of misinterpretation

Sounds like the output from a topic map doesn’t it?

A contract is “explicit and transparent” to a lawyer, but that doesn’t mean everyone reading it sees the contract as “explicit and transparent.”

Making what the lawyer “sees” explicit, in other words, is another identification of the same subject, just a different way to describe it.

What’s refreshing is the recognition that not everyone understands the same description, hence the need for alternative descriptions.

Some additional leads to explore on these authors:

Stefania Passera Homepage with pointers to her work.

Helena Haapio Profile at Lexpert, pointers to her work.

Michael Curtotti – Computational Tools for Reading and Writing Law.

There is a growing interest in making the law transparent to non-lawyers, which is going to require a lot more than “this is the equivalent of that, because I say so.” Particularly for re-use of prior mappings.

Looks like a rapid growth area for topic maps to me.


I first saw this at: Passera, Haapio and Curtotti: Making the meaning of contracts visible – Automating contract visualization.

Identifying Case Law

Wednesday, January 29th, 2014

Costs of the (Increasingly) Lengthy Path to U.S. Report Pagination by Peter W. Martin.

If you are not familiar with the U.S. Supreme Court, the thumbnail sketch is that the court publishes its opinions without official page numbers and they remain that way for years. When the final printed version appears, all the cases citing a case without official page numbers have to be updated. Oh joy! 😉

Peter does a great job illustrating the costs of this approach.

From the post:

On May 17, 2010, the U.S. Supreme Court decided United States v. Comstock, holding that Congress had power under the Necessary and Proper Clause of the U.S. Constitution to authorize civil commitment of a mentally ill, sexually dangerous federal prisoner beyond his release date. (18 U.S.C. § 4248). Three and a half years later, the Court communicated the Comstock decision’s citation pagination with the shipment of the “preliminary print” of Part 1 of volume 560 of the United States Reports. That paperbound publication was logged into the Cornell Law Library on January 3 of this year. (According to the Court’s web site the final bound volume shouldn’t be expected for another year.) United States v. Comstock, appears in that volume at page 126, allowing the full case finally to be cited: United States v. Comstock, 560 U.S. 126 (2010) and specific portions of the majority, concurring and dissenting opinions to be cited by means of official page numbers.

This lag between opinion release and attachment of official volume and page numbers along the slow march to a final bound volume has grown in recent years, most likely as a result of tighter budgets at the Court and the Government Printing Office. Less than two years separated the end of the Court’s term in 2001 and our library’s receipt of the bound volume containing its last decisions. By 2006, five years later, the gap had widened to a full three years. Volume 554 containing the last decisions from the term ending in 2008 didn’t arrive until July 9 of last year. That amounts to nearly five years of delay.

If the printed volumes of the Court’s decisions served solely an archival function, this increasingly tardy path to print would warrant little concern or comment. But because the Court provides no means other than volume and page numbers to cite its decisions and their constituent parts, the increasing delays cast a widening ripple of costs on the federal judiciary, the services that distribute case law, and the many who need to cite it.

The nature of those costs can be illustrated using the Comstock case itself.

In addition to detailing the costs of delayed formal citation, Peter’s analysis is equally applicable to multiple gene names, for example, that precede any attempt at an official name.

What happens to all the literature that was published using the “interim” names?

Yes, we can map between them or create synonym tables, but who knows on what basis we created those tables or mappings?

Legal citations aren’t changing rapidly but the fact they are changing at all is fairly remarkable. Taken as lessons in the management of identifiers, it is an area to watch closely.
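
One way to answer the “on what basis” question is to store the basis with the mapping itself. The sketch below is only an illustration: the `IdentifierMapping` record, its field names and the `resolve` helper are my assumptions, and the interim citation form is a stand-in, with only the official Comstock citation and the January 3 date taken from Peter's post.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IdentifierMapping:
    interim: str        # identifier in use before the official one existed
    official: str       # identifier assigned later
    basis: str          # why we believe the two identify the same subject
    recorded_on: date   # when the mapping was made


def resolve(mappings, identifier):
    """Return every mapping whose interim or official form equals the identifier."""
    return [m for m in mappings if identifier in (m.interim, m.official)]


if __name__ == "__main__":
    mappings = [
        IdentifierMapping(
            interim="560 U.S. ___ (2010)",   # illustrative interim form (assumption)
            official="560 U.S. 126 (2010)",
            basis="preliminary print of Part 1 of volume 560, United States Reports",
            recorded_on=date(2014, 1, 3),
        )
    ]
    for m in resolve(mappings, "560 U.S. ___ (2010)"):
        print(f"{m.interim} -> {m.official} (basis: {m.basis})")
```

A bare synonym table would drop the last two fields, and with them any chance of auditing the mapping later.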

Access to State Supreme Court Data

Tuesday, January 14th, 2014

Public access to the states’ highest courts: a report card

The post focuses on the Virginia Supreme Court, not surprisingly since it is the Open Virginia Law project.

But it also mentions Public Access to the States’ Highest Courts: A Report Card (PDF), which is a great summary of public access to state (United States) supreme court data. With hyperlinks to relevant resources.

The report card will definitely be of interest to law students, researchers, librarians, lawyers and even members of the public.

In addition to being a quick synopsis for public policy discussions, it makes a great hand list of state court resources.

An earlier blog post pointed out that the Virginia Supreme Court is now posting audio recordings of oral arguments.

Could be test data for speech recognition and other NLP tasks or used if you are simply short of white noise. 😉

Is Link Rot Destroying Stare Decisis…

Monday, December 30th, 2013

Is Link Rot Destroying Stare Decisis as We Know It? The Internet-Citation Practice of the Texas Appellate Courts by Arturo Torres (Journal of Appellate Practice and Process, Vol 13, No. 2, Fall 2012 )


In 1995 the first Internet-based citation was used in a federal court opinion. In 1996, a state appellate court followed suit; one month later, a member of the United States Supreme Court cited to the Internet; finally, in 1998 a Texas appellate court cited to the Internet in one of its opinions. In less than twenty years, it has become common to find appellate courts citing to Internet-based resources in opinions. Because the current extent of Internet-citation practice varies by courts across jurisdictions, this paper will examine the Internet-citation practice of the Texas Appellate courts since 1998. Specifically, this study surveys the 1998 to 2011 published opinions of the Texas appellate courts and describes their Internet-citation practice.

A study that confirms what was found in …Link and Reference Rot in Legal Citations for the Harvard Law Review and the U.S. Supreme Court.

Curious that West Key Numbers remain viable after more than a century of use (manual or electronic resolution) whereas Internet citations expire over the course of a few years.

What do you think is the difference in those citations, West Key Numbers versus URLs, that accounts for one being viable and the other only ephemerally so?

…if not incomprehensible to most citizens

Saturday, December 28th, 2013

A Right to Access Implies A Right to Know: An Open Online Platform for Research on the Readability of Law by Michael Curtotti and Eric McCreath. (Journal of Open Access to Law, Vol. 1, No. 1)


The widespread availability of legal materials online has opened the law to a new and greatly expanded readership. These new readers need the law to be readable by them when they encounter it. However, the available empirical research supports a conclusion that legislation is difficult to read if not incomprehensible to most citizens. We review approaches that have been used to measure the readability of text including readability metrics, cloze testing and application of machine learning. We report the creation and testing of an open online platform for readability research. This platform is made available to researchers interested in undertaking research on the readability of legal materials. To demonstrate the capabilities of the platform, we report its initial application to a corpus of legislation. Linguistic characteristics are extracted using the platform and then used as input features for machine learning using the Weka package. Wide divergences are found between sentences in a corpus of legislation and those in a corpus of graded reading material or in the Brown corpus (a balanced corpus of English written genres). Readability metrics are found to be of little value in classifying sentences by grade reading level (noting that such metrics were not designed to be used with isolated sentences).

What I found troubling about this paper is its conjuring of a right to have the law (the text of the law) be “reasonably accessible” to individuals:

Leaving aside the theoretical justifications that might be advanced to support this view, the axiomatic position taken by this paper is that all individuals subject to law are entitled to know its content and therefore to have it written in a way which is reasonably accessible to them. (pp. 6-7)

I don’t dispute that the law should be freely available to everyone; it is difficult to obey what isn’t at least potentially available.

But the authors’ “reasonably accessible” argument fails in two ways.

First, the authors fail to define a level of readability that supports “reasonably accessible.” How much change is necessary to achieve “reasonably accessible?” At least the authors don’t know.

Second, the amount of necessary change must be known in order to judge the feasibility of any revisions to make the law “reasonably accessible.”
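
To see why a missing threshold matters, consider what a readability metric actually produces. Below is a rough sketch of the classic Flesch Reading Ease formula, one of the family of metrics the paper reviews. It is not the authors' platform. The syllable counter is a crude vowel-run heuristic and both sample sentences are made up, so treat the numbers as illustrative only.

```python
import re


def count_syllables(word):
    """Rough heuristic: each run of vowels counts as one syllable (minimum one)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))


if __name__ == "__main__":
    plain = "The cat sat on the mat."
    # Made-up sentence in a statutory style, not quoted from any real statute.
    legal = (
        "Notwithstanding any other provision of this subchapter, the Secretary shall "
        "promulgate regulations establishing eligibility requirements applicable to "
        "participating organizations."
    )
    print(round(flesch_reading_ease(plain), 1))
    print(round(flesch_reading_ease(legal), 1))
```

The formula hands back a number for any text, but nothing in it says which number counts as “reasonably accessible.” That cut-off has to come from somewhere else, and the paper does not supply it.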

The U.S. Internal Revenue Code (herein IRC) is a complex body of work that is based on prior court decisions, rulings by the I.R.S. and a commonly understood vocabulary among tax experts. And it is legislation that touches many other laws and regulations, both at a federal and state level. All of which are interwoven with complex meanings established by years of law, regulation and custom.

Even creating a vulgar version of important legislation would depend upon identification of a complex of subjects and relationships that are explicit only to an expert reader. Doable, but it would never have the force of law.

I first saw this at: Curtotti and McCreath: An Open Online Platform for Research on the Readability of Law.


Tuesday, December 17th, 2013

2013 End-of-Year List of People Who Make a Difference in eDiscovery by Gerard J. Britton.

Gerard has created a list of six (6) people who made a difference in ediscovery in 2013.

If ediscovery is unfamiliar, you have all of the issues of data/big data with an additional layer of legal rules and requirements.

Typically seen in litigation with high stakes.

A fruitful area for the application of semantic integration technologies, topic maps in particular.

Scout [NLP, Move up from Twitter Feeds to Court Opinions]

Tuesday, December 3rd, 2013


From the about page:

Scout is a free service that provides daily insight to how our laws and regulations are shaped in Washington, DC and our state capitols.

These days, you can receive electronic alerts to know when a company is in the news, when a TV show is scheduled to air or when a sports team wins. Now, you can also be alerted when our elected officials take action on an issue you care about.

Scout allows anyone to subscribe to customized email or text alerts on what Congress is doing around an issue or a specific bill, as well as bills in the state legislature and federal regulations. You can also add external RSS feeds to complement a Scout subscription, such as press releases from a member of Congress or an issue-based blog.

Anyone can create a collection of Scout alerts around a topic, for personal organization or to make it easy for others to easily follow a whole topic at once.

Researchers can use Scout to see when Congress talks about an issue over time. Members of the media can use Scout to track when legislation important to their beat moves ahead in Congress or in state houses. Non-profits can use Scout as a tool to keep tabs on how federal and state lawmakers are making policy around a specific issue.

Early testing of Scout during its open beta phase alerted Sunlight and allies in time to successfully stop an overly broad exemption to the Freedom of Information Act from being applied to legislation that was moving quickly in Congress. Read more about that here.

Thank you to the Stanton Foundation, who contributed generous support to Scout’s development.

What kind of alerts?

If your manager suggests a Twitter feed to test NLP, classification, sentiment, etc. code, ask to use Federal Court (U.S.) Court Opinion Feed instead.

Not all data is written in one hundred and forty (140) character chunks. 😉

PS: Be sure to support/promote the Sunlight Foundation for making this data available.

A Case Study on Legal Case Annotation

Friday, October 18th, 2013

A Case Study on Legal Case Annotation by Adam Wyner, Wim Peters, and Daniel Katz.


The paper reports the outcomes of a study with law school students to annotate a corpus of legal cases for a variety of annotation types, e.g. citation indices, legal facts, rationale, judgement, cause of action, and others. An online tool is used by a group of annotators that results in an annotated corpus. Differences amongst the annotations are curated, producing a gold standard corpus of annotated texts. The annotations can be extracted with semantic searches of complex queries. There would be many such uses for the development and analysis of such a corpus for both legal education and legal research.

author = {Adam Wyner and Wim Peters and Daniel Katz},
title = {A Case Study on Legal Case Annotation},
booktitle = {Proceedings of 26th International Conference on Legal Knowledge and Information Systems (JURIX 2013)},
year = {2013},
pages = {??-??},
address = {Amsterdam},
publisher = {IOS Press}

The methodology and results of this study will be released as open source resources.

A gold standard for annotation of legal texts will create the potential for automated tools to assist lawyers, judges and possibly even lay people.

Deeply interested to see where this project goes next.

…Link and Reference Rot in Legal Citations

Tuesday, September 24th, 2013

Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations by Jonathan Zittrain, Kendra Albert, Lawrence Lessig.


We document a serious problem of reference rot: more than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs found within U.S. Supreme Court opinions do not link to the originally cited information.

Given that, we propose a solution for authors and editors of new scholarship that involves libraries undertaking the distributed, long-term preservation of link contents.

Imagine trying to use a phone book where 70% of the addresses were wrong.

Or you are looking for your property deed and learn that only 50% of the references are correct.

Do those sound like acceptable situations?

Considering the Harvard Law Review and the U.S. Supreme Court put a good deal of effort into correct citations, the fate of the rest of the web must be far worse.
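
If you want a rough number for your own citations, a first pass is easy to sketch. The code below is my own illustration, not the method used in the paper: `fetch_status` and `link_rot_rate` are hypothetical names, and the check only catches dead links. It cannot detect reference rot, where a URL still answers but no longer shows the cited content.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status for a URL, or None if the request fails outright."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code          # the server answered, but with an error status
    except (OSError, ValueError):
        return None              # DNS failure, timeout, malformed URL, and so on


def is_dead(status):
    """Treat no response or any 4xx/5xx status as a rotted link."""
    return status is None or status >= 400


def link_rot_rate(urls, fetch=fetch_status):
    """Fraction of URLs that no longer resolve to a successful response."""
    if not urls:
        return 0.0
    dead = [url for url in urls if is_dead(fetch(url))]
    return len(dead) / len(urls)
```

The `fetch` parameter is there so the rate can be computed from recorded results, or tested, without touching the network. Some servers reject HEAD requests, so a real survey would probably need to fall back to GET.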

The about page for Perma reports:

Any author can go to the Perma.cc website and input a URL. Perma.cc downloads the material at that URL and gives back a new URL (a “Perma.cc link”) that can then be inserted in a paper.

After the paper has been submitted to a journal, the journal staff checks that the provided link actually represents the cited material. If it does, the staff “vests” the link and it is forever preserved. Links that are not “vested” will be preserved for two years, at which point the author will have the option to renew the link for another two years.

Readers who encounter Perma.cc links can click on them like ordinary URLs. This takes them to the Perma.cc site where they are presented with a page that has links both to the original web source (along with some information, including the date of the Perma.cc link’s creation) and to the archived version stored by Perma.cc.

I would caution that “forever” is a very long time.

What happens to the binding between an identifier and a URL when URLs are replaced by another network protocol?

After all the changes over the history of the Internet, you don’t believe the current protocols will last “forever.” Yes?

A more robust solution would divorce identifiers/citations from any particular network protocol, whether you think it will last forever or not.

That separation of identifier from network protocol preserves the possibility of an online database such as Perma.cc, but also databases that have local caches of the citations and associated content, databases that point to multiple locations for associated content, and databases that support currently unknown protocols to access content associated with an identifier.

Just as a database of citations from Codex Justinianus could point to the latest printed Latin text, online versions or future versions.

Citations can become permanent identifiers if they don’t rely on a particular network addressing system.
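A minimal sketch of that separation, in Python. Everything here is an assumption made for illustration: the `Locator` and `CitationRegistry` names, the scheme labels and the example.org address are hypothetical and are not part of Perma or any existing citation system. The point is only that the citation is the key, and any number of locators, over any protocol or none, can hang off it.

```python
from dataclasses import dataclass, field


@dataclass
class Locator:
    """One way to reach the cited content (hypothetical structure)."""
    scheme: str   # e.g. "https", "print", "local-cache", or a protocol not yet invented
    address: str  # interpretation depends entirely on the scheme


@dataclass
class CitationRegistry:
    """Maps protocol-neutral citations to any number of locators."""
    _entries: dict[str, list[Locator]] = field(default_factory=dict)

    def add(self, citation: str, locator: Locator) -> None:
        """Attach another location to a citation without changing the citation."""
        self._entries.setdefault(citation, []).append(locator)

    def resolve(self, citation: str, preferred: tuple[str, ...] = ()) -> list[Locator]:
        """Return all known locators, with preferred schemes listed first."""
        rank = {scheme: i for i, scheme in enumerate(preferred)}
        locators = self._entries.get(citation, [])
        # sorted() is stable, so locators of equal rank keep insertion order.
        return sorted(locators, key=lambda loc: rank.get(loc.scheme, len(rank)))


registry = CitationRegistry()
cite = "Gibbons v. Ogden, 22 U.S. 1 (1824)"
# Placeholder addresses, not real archive locations.
registry.add(cite, Locator("https", "https://example.org/gibbons-v-ogden"))
registry.add(cite, Locator("print", "United States Reports, vol. 22, p. 1"))

for locator in registry.resolve(cite, preferred=("print",)):
    print(locator.scheme, locator.address)
```

If a protocol disappears, only the affected locators need replacing. The citation, and every document that uses it, stays the same.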

Court Listener

Tuesday, September 24th, 2013

Court Listener

From the about page:

Started as a part-time hobby in 2010, CourtListener is now a core project of the Free Law Project, a California Non-Profit corporation. The goal of the site is to provide powerful free legal tools for everybody while giving away all our data in bulk downloads.

We collect legal opinions from court websites and from data donations, and are aiming to have the best, most complete data on the open Web within the next couple years. We are slowly expanding to provide search and awareness tools for as many state courts as possible, and we already have tools for all of the Federal Appeals Courts. For more details on which jurisdictions we support, see our coverage page. If you’re able to help us acquire more cases, please get in touch.

This rather remarkable site has collected 905,842 court opinions as of September 24, 2013.

The default listing of cases is newest first, but you can choose oldest first, most/least cited first, and keyword relevance. Changing the listing order becomes interesting once you perform a keyword search (top search bar). The refinement (left-hand side) works quite well, except that I could not filter search results by a judge’s name. On case names, separate the parties with “v.” as “vs” doesn’t work.

It is also possible to discover examples of changing legal terminology that impact your search results.

For example, try searching for the keyword phrase “interstate commerce.” Now choose “Oldest first.” You will see Price v. Ralston (1790), and the next case is Crandall v. State of Nevada (1868). Hmmm, what happened to the early interstate commerce cases under John Marshall?

OK, so try “commerce.” Now set to “Oldest first.” Hmmm, a lot more cases. Yes? Under case name, type in “Gibbons” and press return. Now the top case is Gibbons v. Ogden (1824). The case name is a hyperlink so follow that now.

It is a long opinion by Chief Justice Marshall but at paragraph 5 he announces:

The power to regulate commerce extends to every species of commercial intercourse between the United States and foreign nations, and among the several States. It does not stop at the external boundary of a State.

The phrase “among the several States,” occurs 21 times in Gibbons v. Ogden, with no mention of the modern “interstate commerce.”

What we now call the “interstate commerce clause” played a major role in the New Deal legislation that ended the 1930s depression in the United States. See Commerce Clause. Following the cases cited under “New Deal” will give you an interesting view of the conflicting sides. A conflict that still rages today.

The terminology problem, “among the several states” vs. “interstate commerce” is one that makes me doubt the efficacy of public access to law programs. Short of knowing the “right” search words, it is unlikely you would have found Gibbons v. Ogden. Well, short of reading through the entire corpus of Supreme Court decisions. 😉

Public access to law would be enhanced by mappings such as “interstate commerce” to “among the several states,” by recording that “due process” didn’t always mean what it means today, and by further mappings to colloquial search expressions.

A topic map could capture those nuances and many more.
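As a rough illustration of the simplest of those mappings, here is a Python sketch of query expansion over a hand-built term map. It is not a topic map and not how CourtListener works; the `TERM_MAP` contents, the function names and the tiny document set are assumptions made up for this example.

```python
# Hypothetical, hand-curated map of modern terms to historical equivalents.
TERM_MAP = {
    "interstate commerce": {"among the several states"},
}


def expand_query(query, term_map=TERM_MAP):
    """Return the query plus every phrase mapped as equivalent to it."""
    normalized = " ".join(query.lower().split())
    variants = {normalized}
    for term, equivalents in term_map.items():
        group = {term} | equivalents
        if normalized in group:  # the mapping works in both directions
            variants |= group
    return sorted(variants)


def search(documents, query):
    """Return titles whose text contains the query or any mapped equivalent."""
    variants = expand_query(query)
    return [
        title
        for title, text in documents.items()
        if any(variant in text.lower() for variant in variants)
    ]


documents = {
    "Gibbons v. Ogden (1824)": (
        "The power to regulate commerce extends to every species of "
        "commercial intercourse between the United States and foreign "
        "nations, and among the several States."
    ),
}

print(search(documents, "interstate commerce"))
```

A flat synonym table like this cannot say when a phrase carried a given meaning, which is exactly the sort of nuance the “due process” example calls for.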

I guess the question is whether people should be free to search for the law or should they be freed by finding the law?

Legislative XML Data Mapping [$10K]

Friday, September 13th, 2013

Legislative XML Data Mapping (Library of Congress)

First, the important stuff:

First Place: $10K

Entry due by: December 31 at 5:00pm EST

Second, the details:

The Library of Congress is sponsoring two legislative data challenges to advance the development of international data exchange standards for legislative data. These challenges are an initiative to encourage broad participation in the development and application of legislative data standards and to engage new communities in the use of legislative data. Goals of this initiative include:
• Enabling wider accessibility and more efficient exchange of the legislative data of the United States Congress and the United Kingdom Parliament,
• Encouraging the development of open standards that facilitate better integration, analysis, and interpretation of legislative data,
• Fostering the use of open source licensing for implementing legislative data standards.

The Legislative XML Data Mapping Challenge invites competitors to produce a data map for US bill XML and the most recent Akoma Ntoso schema and UK bill XML and the most recent Akoma Ntoso schema. Gaps or issues identified through this challenge will help to shape the evolving Akoma Ntoso international standard.

The winning solution will win $10,000 in cash, as well as opportunities for promotion, exposure, and recognition by the Library of Congress. For more information about prizes please see the Official Rules.

Can you guess what tool or technique I would suggest that you use? 😉
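Whatever tool you pick, the core of a data map is a table of correspondences plus a list of what did not fit. The Python sketch below is only that: the tag pairs in `TAG_MAP` are illustrative assumptions loosely modeled on US bill XML and Akoma Ntoso element names, the sample fragment is invented, namespaces and attributes are ignored, and the output is not claimed to be schema-valid. A real entry would have to be checked against both schemas.

```python
import xml.etree.ElementTree as ET

# Illustrative source-tag -> target-tag pairs (assumptions, not a verified map).
TAG_MAP = {
    "section": "section",
    "enum": "num",
    "header": "heading",
    "text": "content",
}


def map_tree(source, tag_map=TAG_MAP):
    """Rename elements per tag_map and collect the tags that have no mapping."""
    gaps = set()

    def convert(node):
        if node.tag not in tag_map:
            gaps.add(node.tag)  # report the gap instead of guessing a target
        out = ET.Element(tag_map.get(node.tag, node.tag))
        out.text, out.tail = node.text, node.tail
        for child in node:
            out.append(convert(child))
        return out

    return convert(source), gaps


# Invented sample fragment, for demonstration only.
sample = ET.fromstring(
    "<section><enum>1.</enum><header>Short title</header>"
    "<text>This Act may be cited as the Example Act.</text>"
    "<sponsor>unmapped element</sponsor></section>"
)

mapped, gaps = map_tree(sample)
print(ET.tostring(mapped, encoding="unicode"))
print("Unmapped tags:", sorted(gaps))
```

The list of unmapped tags is the interesting output, since the challenge says that gaps and issues are what will help shape the evolving standard.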

The winner will be announced February 12, 2014 at 5:00pm EST.

Too late for the holidays this year, too close to Valentine’s Day, what holiday will you be wanting to celebrate?