Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 13, 2013

Legislative XML Data Mapping [$10K]

Filed under: Challenges,Contest,Law - Sources,Legal Informatics — Patrick Durusau @ 6:21 pm

Legislative XML Data Mapping (Library of Congress)

First, the important stuff:

First Place: $10K

Entry due by: December 31 at 5:00pm EST

Second, the details:

The Library of Congress is sponsoring two legislative data challenges to advance the development of international data exchange standards for legislative data. These challenges are an initiative to encourage broad participation in the development and application of legislative data standards and to engage new communities in the use of legislative data. Goals of this initiative include:
• Enabling wider accessibility and more efficient exchange of the legislative data of the United States Congress and the United Kingdom Parliament,
• Encouraging the development of open standards that facilitate better integration, analysis, and interpretation of legislative data,
• Fostering the use of open source licensing for implementing legislative data standards.

The Legislative XML Data Mapping Challenge invites competitors to produce data maps between US bill XML and the most recent Akoma Ntoso schema, and between UK bill XML and the most recent Akoma Ntoso schema. Gaps or issues identified through this challenge will help to shape the evolving Akoma Ntoso international standard.

The winning solution will win $10,000 in cash, as well as opportunities for promotion, exposure, and recognition by the Library of Congress. For more information about prizes please see the Official Rules.
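
A "data map" in this sense is a correspondence table between two schemas, plus a record of where no correspondence exists. As a toy illustration only (every element name below is a placeholder, not drawn from either published schema), such a map could start as simply as this Python sketch:

  # Toy fragment of a US-bill-XML -> Akoma Ntoso element map.
  # All element names are illustrative placeholders; a real map must
  # be built from the two published schemas.
  ELEMENT_MAP = {
      "bill":           "akomaNtoso/bill",
      "official-title": "preface/longTitle",
      "section":        "body/section",
      "paragraph":      "paragraph",
  }

  def map_element(us_element_name):
      """Return the Akoma Ntoso counterpart, or None to flag a gap."""
      return ELEMENT_MAP.get(us_element_name)

  # Unmapped elements are exactly the "gaps or issues" the challenge
  # wants surfaced.
  for name in ("bill", "enacting-formula"):
      print(name, "->", map_element(name))

The unmapped cases, not the easy correspondences, are where the $10K will be earned.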

Can you guess what tool or technique I would suggest that you use? 😉

The winner is announced February 12, 2014 at 5:00pm EST.

Too late for the holidays this year and too close to Valentine's Day, so what holiday will you be wanting to celebrate?

September 11, 2013

Input Requested: Survey on Legislative XML

Filed under: Law - Sources,Legal Informatics,Semantics — Patrick Durusau @ 5:15 pm

Input Requested: Survey on Legislative XML

A request for survey participants familiar with XML and law to comment on the Crown Legislative Markup Language (CLML), which is used for the content at legislation.gov.uk.

Background:

By way of background, the Crown Legislation Mark-up Language (CLML) is used to represent UK legislation in XML. It’s the base format for all legislation published on the legislation.gov.uk website. We make both the schema and all our data freely available for anyone to use, or re-use, under the UK government’s Open Government Licence. CLML is currently expressed as a W3C XML Schema which is owned and maintained by The National Archives. A version of the schema can be accessed online at http://www.legislation.gov.uk/schema/legislation.xsd . Legislation as CLML XML can be accessed from the website using the legislation.gov.uk API. Simply add “/data.xml” to any legislation content page, e.g. http://www.legislation.gov.uk/ukpga/2010/1/data.xml .
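
The "/data.xml" convention quoted above makes experimenting easy. Here is a minimal Python sketch (assuming the third-party requests library) that fetches the CLML for an act and prints the names of its top-level elements:

  import requests
  import xml.etree.ElementTree as ET

  # Any legislation.gov.uk content page plus "/data.xml" returns CLML,
  # per the documentation quoted above.
  url = "http://www.legislation.gov.uk/ukpga/2010/1/data.xml"

  response = requests.get(url, timeout=30)
  response.raise_for_status()

  root = ET.fromstring(response.content)

  # Print the root element and the distinct names of its children,
  # a quick way to get a feel for an unfamiliar schema.
  print("root:", root.tag)
  for tag in sorted({child.tag for child in root}):
      print("  child:", tag)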

Why is this important for topic maps?

Would you believe that the markup semantics of CLML are different from the semantics of United States Legislative Markup (USLM)?

That’s just markup syntax differences. Hard to say what substantive semantic variations are in the laws themselves.

Mapping legal semantics becomes important when the United States claims extraterritorial jurisdiction for the application of its laws.

Or when the United States uses its finance laws to inflict harm on others. (Treasury’s war: the unleashing of a new era of financial warfare by Juan Carlos Zarate.)

Mapping legal semantics won’t make U.S. claims any less extreme but may help convince others of a clear and present danger.

August 9, 2013

Counting Citations in U.S. Law

Filed under: Graphics,Law,Law - Sources,Visualization — Patrick Durusau @ 3:17 pm

Counting Citations in U.S. Law by Gary Sieling.

From the post:

The U.S. Congress recently released a series of XML documents containing U.S. Laws. The structure of these documents allow us to find which sections of the law are most commonly cited. Examining which citations occur most frequently allows us to see what Congress has spent the most time thinking about.

Citations occur for many reasons: a justification for addition or omission in subsequent laws, clarifications, or amendments, or repeals. As we might expect, the most commonly cited sections involve the IRS (Income Taxes, specifically), Social Security, and Military Procurement.

To arrive at this result, we must first see how U.S. Code is laid out. The laws are divided into a hierarchy of units, which allows anything from an entire title to individual sentences to be cited. These sections have an ID and an identifier – "identifier" is used as a citation reference within the XML documents, and has a different form from the citations used by the legal community, which come in a form like "25 USC Chapter 21 § 1901".

If you are interested in some moderate XML data processing, this is the project for you!

Gary has posted the code for developing a citation index to the U.S. Laws in XML.
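
For a sense of what such a citation index involves, here is a minimal Python sketch. The element name "ref" and attribute name "href" are assumptions for illustration (check the published schema for the real names); the approach, streaming the XML and counting citation targets, is the point:

  import collections
  import xml.etree.ElementTree as ET

  # Count how often each citation target appears in a US Code XML file.
  # "usc26.xml" is a placeholder file name.
  counts = collections.Counter()

  for _, elem in ET.iterparse("usc26.xml"):
      if elem.tag.endswith("ref"):      # citation elements (assumed name)
          target = elem.get("href")     # citation identifier (assumed name)
          if target:
              counts[target] += 1
      elem.clear()                      # keep memory flat on large titles

  for target, n in counts.most_common(10):
      print(n, target)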

If you want to skip to one great result of this effort, see: Visualizing Citations in U.S. Law, also by Gary, which is based on d3.js and Uber Data visualization.

In the “Visualizing” post Gary enables the reader to see what laws (by title) cite other titles in U.S. law.

More interesting than you would think.

Take Title 26, Internal Revenue Code (IRC).

Among others, the IRC does not cite:

Title 30 – MINERAL LANDS AND MINING
Title 31 – MONEY AND FINANCE
Title 32 – NATIONAL GUARD

I can understand not citing the NATIONAL GUARD but MONEY AND FINANCE?

Looking forward to more ways to explore the U.S. Laws.

Tying legislative history of laws to say New York Times stories on the subject matter of a law could prove to be very interesting.

I started to suggest tracking donations to particular sponsors and then to legislation that benefits the donors.

But that level of detail is just a distraction. Most elected officials have no shame at selling their offices. Documenting their behavior may regularize pricing of senators and representatives but not have much other impact.

I suggest you find a button other than truth to influence their actions.

August 3, 2013

Examining Citations in Federal Law using Python

Filed under: Government,Law - Sources,Python,Topic Maps — Patrick Durusau @ 4:03 pm

Examining Citations in Federal Law using Python by Gary Sieling.

From the post:

Congress frequently passes laws which amend or repeal sections of prior laws; this produces a series of edits to law which programmers will recognize as bearing resemblance to source control history.

In concept this is simple, but in practice it is incredibly complex – for instance, like source control, the system must handle renumbering. What we will see below is that while it is possible to get some data about links, it is difficult to resolve what those links point to.

Here is an example paragraph where, rather than amending a law, the citation serves as a justification for why several words are absent in one section:

(…)
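
The link-resolution gap described in the excerpt is easy to surface mechanically. A minimal sketch, again with assumed element and attribute names, that reports citation targets with no matching identifier in the same document:

  import xml.etree.ElementTree as ET

  # "public_law.xml" is a placeholder file name; "identifier" and
  # "href" are assumed attribute names.
  root = ET.parse("public_law.xml").getroot()

  # Identifiers defined in the document.
  defined = {e.get("identifier") for e in root.iter() if e.get("identifier")}

  # Citation targets referenced in the document.
  referenced = {e.get("href") for e in root.iter() if e.get("href")}

  # Targets that cannot be resolved within this document -- the gap
  # the post describes.
  for target in sorted(referenced - defined):
      print("unresolved:", target)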

There has been some discussion lately about good examples of using topic maps with particular data sets.

Curious how you would solve the problem posed here using a topic map?

For extra credit, how would you map from particular provisions in a bill to the person(s) most likely to benefit from them?

July 31, 2013

U.S. Code Available in Bulk XML

Filed under: Government,Law,Law - Sources — Patrick Durusau @ 4:24 pm

House of Representatives Makes U.S. Code Available in Bulk XML.

From the press release:

As part of an ongoing effort to make Congress more open and transparent, House Speaker John Boehner (R-OH) and Majority Leader Eric Cantor (R-VA) today announced that the House of Representatives is making the United States Code available for download in XML format.

The data is compiled, updated, and published by the Office of Law Revision Counsel (OLRC). You can download individual titles – or the full code in bulk – and read supporting documentation here.

“Providing free and open access to the U.S. Code in XML is another win for open government,” said Speaker Boehner and Leader Cantor. “And we want to thank the Office of Law Revision Counsel for all of their work to make this project a reality. Whether it’s our ‘read the bill’ reforms, streaming debates and committee hearings live online, or providing unprecedented access to legislative data, we’re keeping our pledge to make Congress more transparent and accountable to the people we serve.”

In 2011, Speaker Boehner and Leader Cantor called for the adoption of new electronic data standards to make legislative information more open and accessible. With those standards in place, the House created the Legislative Branch Bulk Data Task Force in 2012 to expedite the process of providing bulk access to legislative information and to increase transparency for the American people.

Since then, the Government Printing Office (GPO) has begun providing bulk access to House legislation in XML. The Office of the Clerk makes full sessions of House floor summaries available in bulk as well.

The XML version of the U.S. Code will be updated quickly, on an ongoing basis, as new laws are enacted.

You can see a full list of open government projects underway in the House at speaker.gov/open.
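
Once a title (or the full code) has been downloaded from the OLRC site, a first pass over the bulk data can be as simple as the sketch below, which assumes a zip archive of XML files (the file name is a placeholder) and inventories each file's root element:

  import zipfile
  import xml.etree.ElementTree as ET

  # "usc_title26.zip" stands in for a bulk archive downloaded from the
  # OLRC site; the real file names will differ.
  with zipfile.ZipFile("usc_title26.zip") as archive:
      for name in archive.namelist():
          if not name.endswith(".xml"):
              continue
          with archive.open(name) as fh:
              root = ET.parse(fh).getroot()
          print(name, "->", root.tag)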

While applauding Congress, don't forget that the Legal Information Institute at Cornell University Law School has been working on free access to public law for the past twenty-one (21) years.

I first saw this at: U.S. House of Representatives Makes U.S. Code Available in Bulk XML.

July 13, 2013

ggplot2 Choropleth of Supreme Court Decisions: A Tutorial

Filed under: Ggplot2,Law,Law - Sources — Patrick Durusau @ 1:34 pm

ggplot2 Choropleth of Supreme Court Decisions: A Tutorial

From the post:

I don't do much GIS but I like to. It's rather enjoyable and involves a tremendous skill set. Often you will find yourself grabbing data sets from some site, scraping, data cleaning and reshaping, and graphing. On the ride home from work yesterday I heard an NPR talk about the Supreme Court decisions being very close with this court. This got me wondering if there is a database with this information and the journey began. This tutorial is purely exploratory but you will learn to:

  1. Grab .zip files from a data base and read into R
  2. Clean data
  3. Reshape data with reshape2
  4. Merge data sets
  5. Plot a choropleth map in ggplot2
  6. Arrange several grid plots with gridExtra

I'm lazy and like a good challenge. I challenged myself to not manually open a file so I downloaded Biobase from bioconductor to open the pdf files for the codebook. Also I used my own package qdap because it had some functions I like and I'm used to using them. This blog post was created in the dev. version of the reports package using the wordpress_rmd template.

Good R practice and an interesting view of Supreme Court cases.

June 26, 2013

Developing an Ontology of Legal Research

Filed under: Law,Law - Sources — Patrick Durusau @ 3:03 pm

Developing an Ontology of Legal Research by Amy Taylor.

From the post:

This session will describe my efforts to develop a legal ontology for teaching legal research. There are currently more than twenty legal ontologies worldwide that encompass legal knowledge, legal problem solving, legal drafting and information retrieval, and subjects such as IP, but no ontology of legal research. A legal research ontology could be useful because the transition from print to digital sources has shifted the way research is conducted and taught. Legal print sources have much of the structure of legal knowledge built into them (see the attached slide comparing screen shots from Westlaw and WestlawNext), so teaching students how to research in print also helps them learn the subject they are researching. With the shift to digital sources, this structure is now only implicit, and researchers must rely more upon a solid foundation in the structure of legal knowledge. The session will also describe my choice of OWL as the language that best meets the needs in building this ontology. The session will also explore the possibilities of representing this legal ontology in a more compact visual form to make it easier to incorporate into legal research instruction.

Plus slides and:

Leaving aside Amy’s choice of an ontology, OWL, etc., I would like to focus on her statement:

(…)
Legal print sources have much of the structure of legal knowledge built into them (see the attached slide comparing screen shots from Westlaw and WestlawNext), so teaching students how to research in print also helps them learn the subject they are researching. With the shift to digital sources, this structure is now only implicit, and researchers must rely more upon a solid foundation in the structure of legal knowledge.
(…)

First, Amy is comparing “Westlaw Classic” and “WestlawNext,” both digital editions.

Second, the “structure” in question appeared in the “digests” published by West, for example:

digest

And in case head notes as:

head notes

That is, the tradition of reporting structure in the digest, with only isolated topics in case reports, did not start with the electronic versions.

That has been the organization of West materials since its beginning in the 19th century.

Third, an “ontology” of the law is quite a different undertaking from the “taxonomy” used by the West system.

The West American Digest System organized law reports to enable researchers to get “close enough” to relevant authorities.

That is, the “last semantic mile” was up to the researcher, not the West system.

Even at that degree of coarseness in the West system, it was an ongoing labor of decades by thousands of editors, and it remains so today.

The amount of effort expended to obtain a coarse but useful taxonomy of the law should be a fair warning to anyone attempting an “ontology” of the same.

June 8, 2013

Bradley Manning Trial Transcript (Funding Request)

Filed under: Law - Sources,Legal Informatics,Security — Patrick Durusau @ 1:22 pm

No, not for me.

Funding to support the creation of a public transcript of Bradley Manning’s trial.

Brian Merchant reports in: The Only Public Transcript of the Bradley Manning Trial Will Be Tapped Out on a Crowd-Funded Typewriter:

The Bradley Manning trial began this week, and it is being held largely in secret—according to the Freedom of the Press Foundation, 270 of the 350 media organizations that applied for access were denied. Major outlets like Reuters, the AP, and the Guardian were forced to sign a document stating they would withhold certain information in exchange for the privilege of attending.

Oh, and no video or audio recorders allowed. And no official transcripts will be made available to anyone.  

But, the court evidently couldn't find grounds to boot out FPF's crowd-funded stenographers, who will be providing the only publicly available transcripts of the trial. (You can donate to the effort and read the transcripts here.)

Which is good news for journalists and anyone looking for an accurate—and public—record of the trial. But the fact that a volunteer stenographer is providing the only comprehensive source of information about such a monumental event is pretty absurd. 

The disclaimer that precedes each transcript epitomizes said absurdity. It reads: "This transcript was made by a court reporter who … was not permitted to be in the actual courtroom where the proceedings took place, but in a media room listening to and watching live audio/video feed, not permitted to make an audio backup recording for editing purposes, and not having the ability to control the proceedings in order to produce an accurate verbatim transcript."

In other words, it's a lone court reporter, frantically trying to tap out all the details down, technologically unaided, sequestered in a separate room, in one uninterrupted marathon session. And this will be the definitive record of the trial for public consumption. What's the logic behind this, now? Why allow an outside stenographer but not an audio recorder? Does the court just assume that no one will pay attention to the typed product? Or are they hoping to point to the reporter's fallibility in the instance that something embarrassing to the state is revealed? 

In case you missed it: Donate HERE to support public transcripts of the Bradley Manning trial.

Please donate and repost, reblog, tweet, email, etc., the support URL wherever possible.

Whatever the source of the Afghan War Diaries, they are proof that government secrecy is used to hide petty incompetence.

A transcript of the Bradley Manning trial will show government embarrassment, not national security, lies at the core of this trial.

I first saw this at Nat Torkington’s Four short links: 6 June 2013.

June 4, 2013

Textual Processing of Legal Cases

Filed under: Annotation,Law - Sources,Text Mining — Patrick Durusau @ 2:05 pm

Textual Processing of Legal Cases by Adam Wyner.

A presentation on Adam’s Crowdsourced Legal Case Annotation project.

Very useful if you are interested in guidance on legal case annotation.

Of course I see the UI as using topics behind the UI's identifications, with associations between those topics.

But none of that has to be exposed to the user.

June 1, 2013

6 Goals for Public Access to Case Law

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 3:30 pm

6 Goals for Public Access to Case Law by Daniel Lewis and Nik Reed.

From the post:

In March, Mike Lissner wrote for this blog about the troubling state of access to case law – noting with dismay that most of the US corpus is not publicly available. While a few states make official cases available, most still do not, and neither does the federal government. At Ravel Law we’re building a new legal research platform and, like Mike, we’ve spent substantial time troubleshooting access to law issues. Here, we will provide some more detail about how official case law is created and share our recommendations for making it more available and usable. We focus in particular on FDsys – the federal judiciary’s effort in this space – but the ideas apply broadly.

(…)

Their goals, with the metrics for each:

1. Comprehensive Access to Opinions
– Does every federal court release every published and unpublished opinion?
– Are the electronic records comprehensive in their historic reach?

2. Opinions that can be Cited in Court
– Are the official versions of cases provided, not just the slip opinions?
– And/or, can the version released by FDsys be cited in court?

3. Vendor-Neutral Citations
– Are the opinions provided with a vendor-neutral citation (using, e.g., paragraph numbers)?

4. Opinions in File Formats that Enable Innovation
– Are opinions provided in both human and machine-readable formats?

5. Opinions Marked with Meta-Data
– Is a machine-readable language such as XML used to tag information like case date, title, citation, etc.?
– Is additional markup of information such as sectional breaks, concurrences, etc. provided?

6. Bulk Access to Opinions
– Are cases accessible via bulk access methods such as FTP or an API?

OK, but with the exception of bulk access, all of these issues have been solved (past tense) by commercial vendors.

Even bulk access is probably available if you are willing to pay the vendors enough.

But public access does not mean meaningful access.

For example, the goals mentioned above would not enable the average citizen to answer questions such as:

Which experts appear on behalf of which parties, consistently?

Which attorneys appear before particular judges?

What is a judge’s history with particular types of lawsuits?

What are the judge’s past connections with parties or attorneys?

To say nothing of the laws, facts, issues and other matters in a case, all of which are subject to varying identifications.

Public access to case law is a good goal, but not if it only eases the financial burden on existing legal publishers and fails to provide the public with meaningful access to case law.

May 22, 2013

Integrating the US’ Documents

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 4:42 pm

Integrating the US’ Documents by Eric Mill.

From the post:

A few weeks ago, we integrated the full text of federal bills and regulations into our alert system, Scout. Now, if you visit CISPA or a fascinating cotton rule, you’ll see the original document – nicely formatted, but also well-integrated into Scout’s layout. There are a lot of good reasons to integrate the text this way: we want you to see why we alerted you to a document without having to jump off-site, and without clunky iframes.

As importantly, we wanted to do this in a way that would be easily reusable by other projects and people. So we built a tool called us-documents that makes it possible for anyone to do this with federal bills and regulations. It’s available as a Ruby gem, and comes with a command line tool so that you can use it with Python, Node, or any other language. It lives inside the unitedstates project at unitedstates/documents, and is entirely public domain.

This could prove to be real interesting. Both as a matter of content and a technique to replicate elsewhere.

I first saw this at: Mill: Integrating the US’s Documents.

April 26, 2013

Once Under Wraps, Supreme Court Audio Trove Now Online

Filed under: Data,History,Law,Law - Sources — Patrick Durusau @ 3:09 pm

Once Under Wraps, Supreme Court Audio Trove Now Online

From the post:

On Wednesday, the U.S. Supreme Court heard oral arguments in the final cases of the term, which began last October and is expected to end in late June after high-profile rulings on gay marriage, affirmative action and the Voting Rights Act.

Audio from Wednesday’s arguments will be available at week’s end at the court’s website, but that’s a relatively new development at an institution that has historically been somewhat shuttered from public view.

The court has been releasing audio during the same week as arguments only since 2010. Before that, audio from one term generally wasn’t available until the beginning of the next term. But the court has been recording its arguments for nearly 60 years, at first only for the use of the justices and their law clerks, and eventually also for researchers at the National Archives, who could hear — but couldn’t duplicate — the tapes. As a result, until the 1990s, few in the public had ever heard recordings of the justices at work.

But as of just a few weeks ago, all of the archived historical audio — which dates back to 1955 — has been digitized, and almost all of those cases can now be heard and explored at an online archive called the Oyez Project.

A truly incredible resources for U.S. history in general and legal history in particular.

The transcripts and tapes are synchronized so your task, if you are interested, is to map these resources to other historical accounts and resources. 😉

The only disappointment is that the recordings begin with the October term of 1955. One of the most well known cases of the 20th century, Brown v. Board of Education, was argued in 1952 and re-argued in 1953. Hearing Thurgood Marshall argue that case would be a real treat.

I first saw this at: NPR: oyez.org finishes Supreme Court oral arguments project.

March 17, 2013

Open Law Lab

Filed under: Education,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 12:36 pm

Open Law Lab

From the webpage:

Open Law Lab is an initiative to design law – to make it more accessible, more usable, and more engaging.

Projects:

Law Visualized

Law Education Tech

Usable Court Systems

Access to Justice by Design

Not to mention a number of interesting blog posts represented by images further down the homepage.

Access/interface issues are universal and law is a particularly tough nut to crack.

Progress in providing access to legal materials could well carry over to other domains.

I first saw this at: Hagan: Open Law Lab.

February 26, 2013

Naming U.S. Statutes

Filed under: Government,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 1:53 pm

Strause et al.: How Federal Statutes Are Named, and the Yale Database of Federal Statute Names

How Federal Statutes Are Named, by Renata E.B. Strause, Allyson R. Bennett, Caitlin B. Tully, M. Douglass Bellis, and Eugene R. Fidell, Law Library Journal, 105, 7-30 (2013), centers on how federal statutes are named but also includes references to other U.S. statute name resources.

Quite useful if you are developing any indexing/topic map service that involves U.S. statutes.

There is mention of a popular-name resource for French statutes.

I assume there are similar resources for other legal jurisdictions. If you know of such resources, I am sure the Legal Informatics Blog would be interested.

Wikipedia and Legislative Data Workshop

Filed under: Law,Law - Sources,Wikipedia — Patrick Durusau @ 1:52 pm

Wikipedia and Legislative Data Workshop

From the post:

Interested in the bills making their way through Congress?

Think they should be covered well in Wikipedia?

Well, let’s do something about it!

On Thursday and Friday, March 14th and 15th, we are hosting a conference here at the Cato Institute to explore ways of using legislative data to enhance Wikipedia.

Our project to produce enhanced XML markup of federal legislation is well under way, and we hope to use this data to make more information available to the public about how bills affect existing law, federal agencies, and spending, for example.

What better way to spread knowledge about federal public policy than by supporting the growth of Wikipedia content?

Thursday’s session is for all comers. Starting at 2:30 p.m., we will familiarize ourselves with Wikipedia editing and policy, and at 5:30 p.m. we’ll have a Sunshine Week reception. (You don’t need to attend in the afternoon to come to the reception. Register now!)

On Friday, we’ll convene experts in government transparency, in Wikipedia editorial processes and decisions, and in MediaWiki technology to think things through and plot a course.

I remain unconvinced about greater transparency into the “apparent” legislative process.

On the other hand, it may provide the “hook” or binding point to make who wins and who loses more evident.

If the Cato representatives mention their ideals being founded in the 18th century, you might want to remember that infant mortality was greater than 40% in foundling hospitals of the time.

People who speak glowingly of the 18th century didn’t live in the 18th century. And imagine themselves as landed gentry of the time.

I first saw this at the Legal Informatics Blog.

February 23, 2013

U.S. Statutes at Large 1951-2009

Filed under: Government,Government Data,Law,Law - Sources — Patrick Durusau @ 4:28 pm

GPO is Closing Gap on Public Access to Law at JCP’s Direction, But Much Work Remains by Daniel Schuman.

From the post:

The GPO’s recent electronic publication of all legislation enacted by Congress from 1951-2009 is noteworthy for several reasons. It makes available nearly 40 years of lawmaking that wasn’t previously available online from any official source, narrowing part of a much larger information gap. It meets one of three long-standing directives from Congress’s Joint Committee on Printing regarding public access to important legislative information. And it has published the information in a way that provides a platform for third-party providers to cleverly make use of the information. While more work is still needed to make important legislative information available to the public, this online release is a useful step in the right direction.

Narrowing the Gap

In mid-January 2013, GPO published approximately 32,000 individual documents, along with descriptive metadata, including all bills enacted into law, joint concurrent resolutions that passed both chambers of Congress, and presidential proclamations from 1951-2009. The documents have traditionally been published in print in volumes known as the “Statutes at Large,” which commonly contain all the materials issued during a calendar year.

The Statutes at Large are literally an official source for federal laws and concurrent resolutions passed by Congress. The Statutes at Large are compilations of “slip laws,” bills enacted by both chambers of Congress and signed by the President. By contrast, while many people look to the US Code to find the law, many sections of the Code in actuality are not the “official” law. A special office within the House of Representatives reorganizes the contents of the slip laws thematically into the 50 titles that make up the US Code, but unless that reorganized document (the US Code) is itself passed by Congress and signed into law by the President, it remains an incredibly helpful but ultimately unofficial source for US law. (Only half of the titles of the US Code have been enacted by Congress, and thus have become law themselves.) Moreover, if you want to see the intact text of the legislation as originally passed by Congress — before it’s broken up and scattered throughout the US Code — the place to look is the Statutes at Large.

Policy wonks and trivia experts will have a field day but the value of the Statutes at Large isn’t apparent to me.

I assume there are cases where errors can be found between the U.S.C. (United States Code) and the Statutes at Large. The significance of those errors is unknown.

Like my comments on the SEC Midas program, knowing a law was passed isn’t the same as knowing who benefits from it.

Or who paid for its passage.

Knowing which laws were passed is useful.

Knowing who benefited or who paid, priceless.

February 10, 2013

Lex Machina

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 2:44 pm

Lex Machina: IP Litigation and analytics

From the about page:

Every day, Lex Machina’s crawler extracts data and documents from PACER, all 94 District Court sites, ITC’s EDIS site and the PTO site.

The crawler automatically captures every docket event and downloads key District Court case documents and every ITC document. It converts the documents by optical character recognition (OCR) to searchable text and stores each one as a PDF file.

When the crawler encounters an asserted or cited patent, it fetches information about that patent from the PTO site.

Next, the crawler invokes Lex Machina’s state-of-the-art natural language processing (NLP) technology, which includes Lexpressions™, a proprietary legal text classification engine. The NLP technology classifies cases and dockets and resolves entity names. Attorney review of docket and case classification, patents and outcomes ensures high-quality data. The structured text indexer then orders all the data and stores it for search.

Lex Machina’s web-based application enables users to run search queries that deliver easy access to the relevant docket entries and documents. It also generates lists that can be downloaded as PDF files or spreadsheet-ready CSV files.

Finally, the system generates a daily patent litigation update email, which provides links to all new patent cases and filings.

Lex Machina does not:

  • Index the World Wide Web
  • Index legal cases around the world in every language
  • Index all legal cases in the United States
  • Index all state courts in the United States
  • Index all federal court cases in the United States

Instead, Lex Machina chose a finite legal domain, patents, that has a finite vocabulary and range of data sources.

Working in that finite domain, Lex Machina has produced a high quality data product of interest to legal professions and lay persons alike.

I intend to leave conquering world hunger, ignorance and poor color coordination of clothing to Bill Gates.

You?

I first saw this at Natural Language Processing in patent litigation: Lex Machina by Junling Hu.

January 20, 2013

Operation Asymptote – [PlainSite / Aaron Swartz]

Filed under: Government,Government Data,Law,Law - Sources,Legal Informatics,Uncategorized — Patrick Durusau @ 8:06 pm

Operation Asymptote

Operation Asymptote’s goal is to make U.S. federal court data freely available to everyone.

The data is available now, but free only up to $15 worth every quarter.

Serious legal research hits that limit pretty quickly.

The project does not cost you any money, only some of your time.

The result will be another source of data to hold the system accountable.

So, how real is your commitment to doing something effective in memory of Aaron Swartz?

January 13, 2013

U.S. GPO releases House bills in bulk XML

Filed under: Government Data,Law,Law - Sources — Patrick Durusau @ 8:15 pm

U.S. GPO releases House bills in bulk XML

Bills from the current Congress, available for bulk download in XML.

Users guide.

GPO press release.

Bulk House Bills Download.

Another bulk data source from the U.S. Congress.

Integration of the legislative sources will be non-trivial, but it has been done before, manually.

More interesting will be tracking the complex interpersonal relationships that underlie the surface of legislative sources.

November 3, 2012

2013 Federal Rules by LII Now Available on eLangdell

Filed under: Law,Law - Sources — Patrick Durusau @ 7:17 pm

2013 Federal Rules by LII Now Available on eLangdell by Sarah Glassmeyer.

From the post:

Once again, CALI is proud to partner with our friends at the Legal Information Institute to provide free ebooks of the Federal Rules of Civil Procedure, Federal Rules of Criminal Procedure and the Federal Rules of Evidence. The 2013 Editions (effective December 1, 2012) as well as the 2012 and 2011 editions can be found on the eLangdell Bookstore.

Our Federal Rules ebooks include:

  • The complete rules as of December 1, 2012 (for the 2013 edition).
  • All notes of the Advisory Committee following each rule.
  • Internal links to rules referenced within the rules.
  • External links to the LII website’s version of the US Code.

These rules are absolutely free for you to download, copy and use however you want. However, they aren’t free to make. If you’d like to donate some money to LII instead of paying money to commercial publishers, they’ve set up a donation page. A little money donated to LII goes a long way towards making the law free and accessible to all.

Legal materials are a rich area for development of semantic tools. Decades of research and development by legal publishers set a high mark for something new and useful.

If you are interested in U.S. Federal Procedure, this is your starting point.

The Federal Rules of Civil Procedure are a good example of defining process without vagueness, confusion and contradiction. (Supply your own examples of where the contrary is the case.)

October 7, 2012

New Congressional Data Available for Free Bulk Download: Bill Data 1973- , Members 1789-

Filed under: Government,Government Data,Law - Sources,Legal Informatics — Patrick Durusau @ 4:28 am

New Congressional Data Available for Free Bulk Download: Bill Data 1973- , Members 1789-

Legal Informatics reports that bill data from 1973 forward and member data from 1789 forward are now available for free bulk download.

Of interest if you like U.S. history and/or recent events.

What other data would you combine with the data you find here?

September 29, 2012

On Legislative Collaboration and Version Control

Filed under: Law,Law - Sources — Patrick Durusau @ 4:33 pm

On Legislative Collaboration and Version Control

John Wonderlich of the Sunlight Foundation writes:

We often are confronted with the idea of legislation being written and tracked online through new tools, whether it’s Clay Shirky’s recent TED talk, or a long, long list of experiments and pilot projects (including Sunlight’s PublicMarkup.org and Rep. Issa’s MADISON) designed to give citizens a new view and voice in the production of legislation.

Proponents of applying version control systems to law have a powerful vision: a bill or law, with its history laid bare and its sections precisely broken out, and real names attached prominently to each one. Why shouldn’t we able to have that? And since version control systems are helpful to the point of absolute necessity in any collaborative software effort, why wouldn’t Congress employ such an approach?

When people first happen upon this idea, their reaction tends to fall into two camps, which I’ll refer to as triumphalist and dismissive.

John’s and the Sunlight Foundation’s view that legislative history of acts of Congress is a form of transparency is the view taught to high school civics classes. And about as naive as it comes.

True enough, there are extensive legislative histories for every act passed by Congress. That has very little to do with how laws come to be written, by who and for whose interests.

Say for example a lobbyist who has contributed to a Senator's campaign is concerned with the rules for visas for computer engineers. He/she visits the Senator and just happens to have a draft of amendments, created by a well known Washington law firm, that addresses their needs. That document is studied by the Senator's staff.

Lo and behold, similar language appears in a bill introduced by the Senator. (Or as an amendment to some other bill.)

The Senator will even say that he is sponsoring the legislation to further the interests of those “job creators” in the high tech industry. What gets left out is the access to the Senator by the lobbyist and the assistance in bringing that legislation to the fore.

Indulging governments in their illusions of transparency is the surest way to avoid meaningful transparency.

Now you have to ask yourself, who has an interest in avoiding meaningful transparency?

I first saw this at Legal Informatics (which has other links that will interest you).

September 23, 2012

Congress.gov: New Official Source of U.S. Federal Legislative Information

Filed under: Government,Government Data,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 7:50 pm

Congress.gov: New Official Source of U.S. Federal Legislative Information

Legal Informatics has gathered up links to a number of reviews/comments on the new legislative interface for the U.S. federal government.

You can see the beta version at: Congress.gov.

Personally I like search and popularity being front and center, but that makes me wonder what isn’t available. Like bulk downloads in some reasonable format (can you say XML?).

What do you think about the interface?

The Cost of Strict Global Consistency [Or Rules for Eventual Consistency]

Filed under: Consistency,Database,Finance Services,Law,Law - Sources — Patrick Durusau @ 10:15 am

What if all transactions required strict global consistency? by Matthew Aslett.

Matthew quotes Basho CTO Justin Sheehy on eventual consistency and traditional accounting:

“Traditional accounting is done in an eventually-consistent way and if you send me a payment from your bank to mine then that transaction will be resolved in an eventually consistent way. That is, your bank account and mine will not have a jointly-atomic change in value, but instead yours will have a debit and mine will have a credit, each of which will be applied to our respective accounts.”

And Matthew comments:

The suggestion that bank transactions are not immediately consistent appears counter-intuitive. Comparing what happens in a transaction with a jointly atomic change in value, like buying a house, with what happens in normal transactions, like buying your groceries, we can see that for normal transactions this statement is true.

We don’t need to wait for the funds to be transferred from our accounts to a retailer before we can walk out the store. If we did we’d all waste a lot of time waiting around.

This highlights a couple of things that are true for both database transactions and financial transactions:

  • that eventual consistency doesn’t mean a lack of consistency
  • that different transactions have different consistency requirements
  • that if all transactions required strict global consistency we’d spend a lot of time waiting for those transactions to complete.

All of which is very true but misses an important point about financial transactions.

Financial transactions (involving banks, etc.) are eventually consistent according to the same rules.

That’s no accident. It didn’t just happen that banks adopted ad hoc rules that resulted in a uniform eventual consistency.

It didn't happen overnight, but the current set of rules for "uniform eventual consistency" of banking transactions is spelled out by the Uniform Commercial Code (and other laws and regulations, but the UCC is a major part of it).

Dare we say a uniform semantic for financial transactions was hammered out without the use of formal ontologies or web addresses? And that it supports billions of transactions on a daily basis? To become eventually consistent?

Think about the transparency (to you) of your next credit card transaction. Standards and eventual consistency make that possible.
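
A toy Python sketch of the point: the debit and the credit are separate ledger entries applied independently, and the uniform rule (every transaction's entries must sum to zero across all ledgers) is what makes eventual consistency checkable after the fact. All names and structure are illustrative:

  from collections import defaultdict

  # Each account has its own append-only ledger; a transfer is two
  # independent entries, not one atomic change.
  ledgers = defaultdict(list)

  def debit(account, amount, txn_id):
      ledgers[account].append((txn_id, -amount))

  def credit(account, amount, txn_id):
      ledgers[account].append((txn_id, +amount))

  def transfer(src, dst, amount, txn_id):
      debit(src, amount, txn_id)
      # In a real system the credit may land later, on another machine.
      credit(dst, amount, txn_id)

  def reconcile():
      # The uniform rule: entries for each transaction sum to zero.
      totals = defaultdict(int)
      for entries in ledgers.values():
          for txn_id, amount in entries:
              totals[txn_id] += amount
      return {txn: t for txn, t in totals.items() if t != 0}

  transfer("alice", "bob", 100, "txn-1")
  print(reconcile())   # {} -- every transaction balances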

September 16, 2012

Supreme Court Database–Updated [US]

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 1:30 pm

Supreme Court Database–Updated

Michael Heise writes:

An exceptionally helpful source of data for those interested in US Supreme Court decisions was recently updated to include data from OT2011. The Supreme Court Database (2012 release, v.01, here) “contains over two hundred pieces of information about each case decided by the Court between the 19[46] and 20[11] terms. Examples include the identity of the court whose decision the Supreme Court reviewed, the parties to the suit, the legal provisions considered in the case, and the votes of the Justices.” An online codebook for this leading compilation of Supreme Court decisions (particularly for political scientists) can be found here.

The Supreme Court Database site offers this dataset, tools for analysis, and training materials to assist you with both.
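
Getting started with the data can be as simple as the sketch below, which assumes a case-centered CSV export and pandas; the file name is a placeholder and the column names should be checked against the online codebook:

  import pandas as pd

  # Placeholder file name for a case-centered export of the database.
  df = pd.read_csv("SCDB_2012_01_caseCentered.csv", encoding="latin-1")

  # Decisions per term, as a first look at the data.
  print(df.groupby("term").size().tail(10))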

Very useful for combining with other data and analysis, ranging from political science and history to more traditional legal approaches.

September 10, 2012

Sunlight Academy (Finding US Government Data)

Filed under: Government,Government Data,Law,Law - Sources — Patrick Durusau @ 4:05 pm

Sunlight Academy

From the website:

Welcome to Sunlight Academy, a collection of interactive tutorials for journalists, activists, researchers and students to learn about tools by the Sunlight Foundation and others to unlock government data.

Be sure to create a profile to access our curriculum, track your progress, watch videos, complete training activities and get updates on new tutorials and tools.

Whether you are an investigative journalist trying to get insight on a complex data set, an activist uncovering the hidden influence behind your issue, or a congressional staffer in need of mastering legislative data, Sunlight Academy guides you through how to make our tools work for you. Let’s get started!

The Sunlight Foundation has created tools to make government data more accessible.

Unlike some governments and software projects, the Sunlight Foundation business model isn’t based on poor or non-existent documentation.

Modules (as of 2012 September 10):

  • Tracking Government
    • Scout – Scout is a legislative and governmental tracking tool from the Sunlight Foundation that alerts you when Congress or your state capitol talks about or takes action on issues you care about. Learn how to search and create alerts on federal and state legislation, regulations and the Congressional Record.
    • Scout (Webinar) – Recorded webinar and demo of Scout from July 26, 2012. The session covered basic skills such as search terms and bill queries, as well as advanced functions such as tagging, merging outside RSS feeds and creating curated search collections.
  • Unlocking Data
    • Political Ad Sleuth – Frustrated by political ads inundating your TV? Learn how you can discover who is funding these ads from the public files at your local television station through this tutorial.
    • Unlocking APIs – What are APIs and how do they deliver government data? This tutorial provides an introduction to using APIs and highlights what Sunlight's APIs have to offer on legislative and congressional data.
  • Lobbying
    • Lobbying Contribution Reports – These reports highlight the millions of dollars that lobbying entities spend every year giving to charities in honor of lawmakers and executive branch officials, technically referred to as “honorary fees.” Find out how to investigate lobbying contribution reports, understand the rules behind them and see what you can do with the findings.
    • Lobbying Registration Tracker – Learn about the Lobbying Registration Tracker, a Sunlight Foundation tool that allows you to track new registrations for federal lobbyists and lobbying firms. This database allows users to view registrations as they’re submitted, browse by issue, registrant or client, and see the trends in issues and registrations over the last 12 months.
    • Lobbying Report Form – Four times a year, groups that lobby Congress and the federal government file reports on their activities. Unlock the important information contained in the quarterly lobbying reports to keep track of who’s influencing whom in Washington. Learn tips on how to read the reports and how they can inform your reporting.
  • Data Analysis
    • Data Visualizations in Google Docs – While Google is often used for internet searches and maps, it can also help with data visualizations via Google Charts. Learn how to use Google Docs to generate interactive charts in this training.
    • Mapping Campaign Finance Data – Campaign finance data can be complex and confusing — for reporters and for readers. But it doesn’t have to be. One way to make sense of it all is through mapping. Learn how to turn campaign finance information into beautiful maps, all through free tools.
    • Pivot Tables – Pivot tables are powerful tools, but it’s not always obvious how to use them. Learn how to create and use pivot tables in Excel to aggregate and summarize data that otherwise would require a database.
  • Research Tools
    • Advanced Google Searches – Google has made search easy and effective, but that doesn’t mean it can’t be better. Learn how to effectively use Google’s Advanced Search operators so you can get what you’re looking for without wasting time on irrelevant results.
    • Follow the Unlimited Money (webinar) – Recorded webinar from August 8, 2012. This webinar covered tools to follow the millions of dollars being spent this election year by super PACs and other outside groups.
    • Learning about Data.gov – Data.gov seeks to organize all of the U.S. government’s data, a daunting and unfinished task. In this module, learn about the powers and limitations of Data.gov, and what other resources to use to fill in Data.gov’s gaps.

Researching Current Federal Legislation and Regulations:…

Filed under: Government,Government Data,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 3:30 pm

Researching Current Federal Legislation and Regulations: A Guide to Resources for Congressional Staff

Description quoted at Full Text Reports:

This report is designed to introduce congressional staff to selected governmental and nongovernmental sources that are useful in tracking and obtaining information on federal legislation and regulations. It includes governmental sources such as the Legislative Information System (LIS), THOMAS, the Government Printing Office’s Federal Digital System (FDsys), and U.S. Senate and House websites. Nongovernmental or commercial sources include resources such as HeinOnline and the Congressional Quarterly (CQ) websites. It also highlights classes offered by the Congressional Research Service (CRS) and the Library of Congress Law Library.

This report will be updated as new information is available.

Direct link to PDF: Researching Current Federal Legislation and Regulations: A Guide to Resources for Congressional Staff

A very useful starting point for research on U.S. federal legislation and regulations, but only a starting point.

Each listed resource merits a user’s guide. And no two of them are exactly the same.

Suggestions for research/topic map exercises based on this listing of resources?

August 26, 2012

Linked Legal Data: A SKOS Vocabulary for the Code of Federal Regulations

Filed under: Law,Law - Sources,Linked Data,SKOS — Patrick Durusau @ 1:17 pm

Linked Legal Data: A SKOS Vocabulary for the Code of Federal Regulations by Núria Casellas.

Abstract:

This paper describes the application of Semantic Web and Linked Data techniques and principles to regulatory information for the development of a SKOS vocabulary for the Code of Federal Regulations (in particular of Title 21, Food and Drugs). The Code of Federal Regulations is the codification of the general and permanent enacted rules generated by executive departments and agencies of the Federal Government of the United States, a regulatory corpus of large size, varied subject-matter and structural complexity. The CFR SKOS vocabulary is developed using a bottom-up approach for the extraction of terminology from text based on a combination of syntactic analysis and lexico-syntactic pattern matching. Although the preliminary results are promising, several issues (a method for hierarchy cycle control, expert evaluation and control support, named entity reduction, and adjective and prepositional modifier trimming) require improvement and revision before it can be implemented for search and retrieval enhancement of regulatory materials published by the Legal Information Institute. The vocabulary is part of a larger Linked Legal Data project that aims at using Semantic Web technologies for the representation and management of legal data.

Considers use of nonregulatory vocabularies, conversion of existing indexing materials and finally settles on NLP processing of the text.

Granting that Title 21, Food and Drugs is no walk in the park, take a peek at the regulations for Title 26, Internal Revenue Code. 😉

A difficulty that I didn’t see mentioned is the changing semantics in statutory law and regulations.

The definition of “person,” for example, varies widely depending upon where it appears. Both chronologically and synchronically.

Moreover, if I have a nonregulatory vocabulary and/or CFR indexes, why shouldn’t that map to the CFR SKOS vocabulary?

I may not have the “correct” index but the one I prefer to use. Shouldn’t that be enabled?
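
For readers new to SKOS, here is a minimal rdflib sketch of what a vocabulary entry might look like. The namespace, concepts and hierarchy are invented for illustration and are not the paper's actual vocabulary:

  from rdflib import Graph, Literal, Namespace
  from rdflib.namespace import RDF, SKOS

  # Hypothetical namespace for illustration only.
  CFR = Namespace("http://example.org/cfr/title21/")

  g = Graph()
  g.bind("skos", SKOS)

  drug = CFR["drug-product"]
  ndc = CFR["national-drug-code"]

  g.add((drug, RDF.type, SKOS.Concept))
  g.add((drug, SKOS.prefLabel, Literal("drug product", lang="en")))

  g.add((ndc, RDF.type, SKOS.Concept))
  g.add((ndc, SKOS.prefLabel, Literal("national drug code", lang="en")))
  g.add((ndc, SKOS.broader, drug))   # hierarchy extracted from the text

  print(g.serialize(format="turtle"))

Mapping a preferred index into such a vocabulary is exactly the kind of merging a topic map handles well.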

I first saw this at Legal Informatics.

August 11, 2012

Lima on Visualization and Legislative Memory of the Brazilian Civil Code

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 6:28 pm

Lima on Visualization and Legislative Memory of the Brazilian Civil Code

Legal Informatics reports the publication of the legislative history of the Brazilian Civil Code and a visualization of the Brazilian Civil Code.

Tying in Planiol’s Treatise on Civil Law (or other commentators) to such resources would make a nice showcase for topic maps.

August 8, 2012

GitLaw in Germany

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 1:51 pm

GitLaw in Germany: Deutsche Bundesgesetze- und verordnungen im Markdown auf GitHub = German Federal Laws and Regulations in Markdown on GitHub

Legal Informatics reports that German Federal Laws and Regulations are available in Markdown.

A useful resource if you have the legal background to make good use of it.

I would not advise self-help based on a Google translation of any of these materials.
