Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

September 13, 2013

Legislative XML Data Mapping [$10K]

Filed under: Challenges,Contest,Law - Sources,Legal Informatics — Patrick Durusau @ 6:21 pm

Legislative XML Data Mapping (Library of Congress)

First, the important stuff:

First Place: $10K

Entry due by: December 31 at 5:00pm EST

Second, the details:

The Library of Congress is sponsoring two legislative data challenges to advance the development of international data exchange standards for legislative data. These challenges are an initiative to encourage broad participation in the development and application of legislative data standards and to engage new communities in the use of legislative data. Goals of this initiative include:
• Enabling wider accessibility and more efficient exchange of the legislative data of the United States Congress and the United Kingdom Parliament,
• Encouraging the development of open standards that facilitate better integration, analysis, and interpretation of legislative data,
• Fostering the use of open source licensing for implementing legislative data standard.

The Legislative XML Data Mapping Challenge invites competitors to produce a data map for US bill XML and the most recent Akoma Ntoso schema and UK bill XML and the most recent Akoma Ntoso schema. Gaps or issues identified through this challenge will help to shape the evolving Akoma Ntoso international standard.

The winning solution will win $10,000 in cash, as well as opportunities for promotion, exposure, and recognition by the Library of Congress. For more information about prizes please see the Official Rules.

Can you guess what tool or technique I would suggest that you use? 😉
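Not a challenge entry, but here is a minimal sketch of the kind of element crosswalk an entry might start from. The element names on both sides are illustrative placeholders, not the actual US bill XML or Akoma Ntoso vocabularies; a real mapping would be driven by the published schemas, and the interesting output is the list of gaps.

```python
# Toy crosswalk between (assumed) US bill XML element names and
# (assumed) Akoma Ntoso element names. Both vocabularies here are
# illustrative placeholders, not the real schemas.
import xml.etree.ElementTree as ET

US_TO_AKN = {
    "bill": "bill",
    "official-title": "docTitle",
    "section": "section",
    "subsection": "subsection",
    "paragraph": "paragraph",
}

def map_element(elem):
    """Recursively rename elements per the crosswalk, flagging gaps."""
    akn_name = US_TO_AKN.get(elem.tag)
    if akn_name is None:
        print(f"gap: no Akoma Ntoso mapping for <{elem.tag}>")
        akn_name = elem.tag  # pass through unchanged
    new = ET.Element(akn_name, elem.attrib)
    new.text = elem.text
    for child in elem:
        new.append(map_element(child))
    return new

if __name__ == "__main__":
    us_bill = ET.fromstring(
        "<bill><official-title>An Act to ...</official-title>"
        "<preamble>Be it enacted ...</preamble>"
        "<section><subsection>Text</subsection></section></bill>"
    )
    print(ET.tostring(map_element(us_bill), encoding="unicode"))
```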

The winner is announced February 12, 2014 at 5:00pm EST.

Too late for the holidays this year and too close to Valentine's Day, so what holiday will you be celebrating?

September 11, 2013

Input Requested: Survey on Legislative XML

Filed under: Law - Sources,Legal Informatics,Semantics — Patrick Durusau @ 5:15 pm

Input Requested: Survey on Legislative XML

A request for survey participants who are familiar with XML and law to comment on the Crown Legislative Markup Language (CLML), which is used for the content at legislation.gov.uk.

Background:

By way of background, the Crown Legislation Mark-up Language (CLML) is used to represent UK legislation in XML. It’s the base format for all legislation published on the legislation.gov.uk website. We make both the schema and all our data freely available for anyone to use, or re-use, under the UK government’s Open Government Licence. CLML is currently expressed as a W3C XML Schema which is owned and maintained by The National Archives. A version of the schema can be accessed online at http://www.legislation.gov.uk/schema/legislation.xsd . Legislation as CLML XML can be accessed from the website using the legislation.gov.uk API. Simply add “/data.xml” to any legislation content page, e.g. http://www.legislation.gov.uk/ukpga/2010/1/data.xml .
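As a quick illustration of the API described above, here is a minimal sketch that fetches one act as CLML by using the "/data.xml" convention (the URL is the one given in the quoted background; error handling is omitted).

```python
# Fetch the CLML XML for one item of legislation from legislation.gov.uk,
# using the "/data.xml" convention described above.
import urllib.request
import xml.etree.ElementTree as ET

URL = "http://www.legislation.gov.uk/ukpga/2010/1/data.xml"

with urllib.request.urlopen(URL) as response:
    root = ET.fromstring(response.read())

# Print the root element and a count of its immediate children,
# just to confirm we received a parseable CLML document.
print(root.tag)
print(len(list(root)), "top-level child elements")
```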

Why is this important for topic maps?

Would you believe that the markup semantics of CLML are different from the semantics of United States Legislative Markup (USLM)?

And those are just differences in markup syntax. It is hard to say what substantive semantic variations exist in the laws themselves.

Mapping legal semantics becomes important when the United States claims extraterritorial jurisdiction for the application of its laws.

Or when the United States uses its finance laws to inflict harm on others. (Treasury’s war: the unleashing of a new era of financial warfare by Juan Carlos Zarate.)

Mapping legal semantics won’t make U.S. claims any less extreme but may help convince others of a clear and present danger.

June 8, 2013

Bradley Manning Trial Transcript (Funding Request)

Filed under: Law - Sources,Legal Informatics,Security — Patrick Durusau @ 1:22 pm

No, not for me.

Funding to support the creation of a public transcript of Bradley Manning’s trial.

Brian Merchant reports in: The Only Public Transcript of the Bradley Manning Trial Will Be Tapped Out on a Crowd-Funded Typewriter:

The Bradley Manning trial began this week, and it is being held largely in secret—according to the Freedom of the Press Foundation, 270 of the 350 media organizations that applied for access were denied. Major outlets like Reuters, the AP, and the Guardian, were forced to sign a document stating they would withhold certain information in exchange for the privilege of attending.

Oh, and no video or audio recorders allowed. And no official transcripts will be made available to anyone.  

But, the court evidently couldn't find grounds to boot out FPF's crowd-funded stenographers, who will be providing the only publicly available transcripts of the trial. (You can donate to the effort and read the transcripts here.)

Which is good news for journalists and anyone looking for an accurate—and public—record of the trial. But the fact that a volunteer stenographer is providing the only comprehensive source of information about such a monumental event is pretty absurd. 

The disclaimer that precedes each transcript epitomizes said absurdity. It reads: "This transcript was made by a court reporter who … was not permitted to be in the actual courtroom where the proceedings took place, but in a media room listening to and watching live audio/video feed, not permitted to make an audio backup recording for editing purposes, and not having the ability to control the proceedings in order to produce an accurate verbatim transcript."

In other words, it's a lone court reporter, frantically trying to tap out all the details down, technologically unaided, sequestered in a separate room, in one uninterrupted marathon session. And this will be the definitive record of the trial for public consumption. What's the logic behind this, now? Why allow an outside stenographer but not an audio recorder? Does the court just assume that no one will pay attention to the typed product? Or are they hoping to point to the reporter's fallibility in the instance that something embarrassing to the state is revealed? 

In case you missed it: Donate HERE to support public transcripts of the Bradley Manning trial.

Please donate and repost, reblog, tweet, email, etc., the support URL wherever possible.

Whatever the source of the Afghan War Diaries, they are proof that government secrecy is used to hide petty incompetence.

A transcript of the Bradley Manning trial will show that government embarrassment, not national security, lies at the core of this trial.

I first saw this at Nat Torkington’s Four short links: 6 June 2013.

June 1, 2013

6 Goals for Public Access to Case Law

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 3:30 pm

6 Goals for Public Access to Case Law by Daniel Lewis and Nik Reed.

From the post:

In March, Mike Lissner wrote for this blog about the troubling state of access to case law – noting with dismay that most of the US corpus is not publicly available. While a few states make official cases available, most still do not, and neither does the federal government. At Ravel Law we’re building a new legal research platform and, like Mike, we’ve spent substantial time troubleshooting access to law issues. Here, we will provide some more detail about how official case law is created and share our recommendations for making it more available and usable. We focus in particular on FDsys – the federal judiciary’s effort in this space – but the ideas apply broadly.

(…)

Goals and metrics:

1. Comprehensive Access to Opinions: Does every federal court release every published and unpublished opinion? Are the electronic records comprehensive in their historic reach?
2. Opinions that can be Cited in Court: Are the official versions of cases provided, not just the slip opinions? And/or, can the version released by FDsys be cited in court?
3. Vendor-Neutral Citations: Are the opinions provided with a vendor-neutral citation (using, e.g., paragraph numbers)?
4. Opinions in File Formats that Enable Innovation: Are opinions provided in both human and machine-readable formats?
5. Opinions Marked with Meta-Data: Is a machine-readable language such as XML used to tag information like case date, title, citation, etc.? Is additional markup of information such as sectional breaks, concurrences, etc. provided?
6. Bulk Access to Opinions: Are cases accessible via bulk access methods such as FTP or an API?

OK, but with the exception of bulk access, all of these issues have been solved (past tense) by commercial vendors.

Even bulk access is probably available if you are willing to pay the vendors enough.

But public access does not mean meaningful access.

For example, the goals mentioned above would not enable the average citizen to answer questions such as:

Which experts appear on behalf of which parties, consistently?

Which attorneys appear before particular judges?

What is a judge’s history with particular types of lawsuits?

What are the judge’s past connections with parties or attorneys?

To say nothing of identifying the laws, facts, issues, and other matters in a case, all of which are subject to varying identifications.
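Answering even the first question above takes little more than structured case metadata. A minimal sketch, assuming a hypothetical CSV of appearances with case_id, expert, and party columns (no such public file exists, as far as I know):

```python
# Tally which experts appear for which parties, given hypothetical
# per-case metadata in appearances.csv with columns:
# case_id, expert, party ("plaintiff" or "defendant").
import csv
from collections import Counter

pairs = Counter()
with open("appearances.csv", newline="") as f:
    for row in csv.DictReader(f):
        pairs[(row["expert"], row["party"])] += 1

for (expert, party), count in pairs.most_common(10):
    print(f"{expert} appeared for the {party} in {count} cases")
```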

Public access to case law is a good goal, but not if it only eases the financial burden on existing legal publishers while failing to provide the public with meaningful access.

May 22, 2013

Integrating the US’ Documents

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 4:42 pm

Integrating the US’ Documents by Eric Mill.

From the post:

A few weeks ago, we integrated the full text of federal bills and regulations into our alert system, Scout. Now, if you visit CISPA or a fascinating cotton rule, you’ll see the original document – nicely formatted, but also well-integrated into Scout’s layout. There are a lot of good reasons to integrate the text this way: we want you to see why we alerted you to a document without having to jump off-site, and without clunky iframes.

As importantly, we wanted to do this in a way that would be easily reusable by other projects and people. So we built a tool called us-documents that makes it possible for anyone to do this with federal bills and regulations. It’s available as a Ruby gem, and comes with a command line tool so that you can use it with Python, Node, or any other language. It lives inside the unitedstates project at unitedstates/documents, and is entirely public domain.
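A minimal sketch of what "use it with Python" might look like via that command line tool. The command name and arguments below are assumptions for illustration; check the unitedstates/documents README for the actual invocation.

```python
# Shell out to the us-documents command line tool from Python.
# The command name and arguments below are assumed for illustration;
# consult the unitedstates/documents project for the real interface.
import subprocess

result = subprocess.run(
    ["us-documents", "bill.xml"],   # hypothetical invocation
    capture_output=True, text=True, check=True,
)
html = result.stdout  # assumed: the tool writes HTML to stdout
print(html[:200])
```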

This could prove to be really interesting, both as a matter of content and as a technique to replicate elsewhere.

I first saw this at: Mill: Integrating the US’s Documents.

May 20, 2013

FuzzyLaw [FuzzyDBA, FuzzyRDF, FuzzySW?]

Filed under: Law,Legal Informatics,Semantic Diversity,Semantic Inconsistency,Users — Patrick Durusau @ 2:03 pm

FuzzyLaw

From the webpage:

(…)

FuzzyLaw has gathered explanations of legal terms from members of the public in order to get a sense of what the ‘person on the street’ has in mind when they think of a legal term. By making lay-people’s explanations of legal terms available to interpreters, police and other legal professionals, we hope to stimulate debate and learning about word meaning, public understanding of law and the nature of explanation.

The explanations gathered in FuzzyLaw are unusual in that they are provided by members of the public. These people, all aged over 18, regard themselves as ‘native speakers’, ‘first language speakers’ and ‘mother tongue’ speakers of English and have lived in England and/or Wales for 10 years or more. We might therefore expect that they will understand English legal terminology as well as any member of the public might. No one who has contributed has ever worked in the criminal law system or as an interpreter or translator. They therefore bring no special expertise to the task of explanation, beyond whatever their daily life has provided.

We have gathered explanations for 37 words in total. You can see a sample of these explanations on FuzzyLaw. The sample of explanations is regularly updated. You can also read responses to the terms and the explanations from mainly interpreters, police officers and academics. You are warmly invited to add your own responses and join in the discussion of each and every word. Check back regularly to see how discussions develop and consider bookmarking the site for future visits. The site also contains commentaries on interesting phenomena which have emerged through the site. You can respond to the commentaries too on that page, contributing to the developing research project.

(…)

Have you ever wondered what the ‘person on the street’ thinks about relational databases, RDF or the Semantic Web?

Those are the folks who are being pushed content based on interpretations not of their own making.

Here’s a work experiment for you:

  1. Take ten search terms from your local query log.
  2. At each department staff meeting, distribute sheets with the words, requesting everyone to define the terms in their own words. No wrong answers.
  3. Tally up the definitions per department and across the company (a small tally sketch follows this list).
  4. Comments anyone?
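For step 3, even a crude tally makes the spread of meanings visible. A minimal sketch, assuming the responses are collected into a hypothetical responses.csv with department, term, and definition columns:

```python
# Count distinct definitions per term, company-wide and per department.
# Assumes a hypothetical responses.csv with columns:
# department, term, definition.
import csv
from collections import defaultdict

per_dept = defaultdict(set)    # (department, term) -> set of definitions
overall = defaultdict(set)     # term -> set of definitions

with open("responses.csv", newline="") as f:
    for row in csv.DictReader(f):
        definition = row["definition"].strip().lower()
        per_dept[(row["department"], row["term"])].add(definition)
        overall[row["term"]].add(definition)

for term, defs in sorted(overall.items()):
    print(f"{term}: {len(defs)} distinct definitions across the company")
```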

I first saw this at: FuzzyLaw: Collection of lay citizens’ understandings of legal terminology.

April 27, 2013

Bulk Access to Law-Related Linked Data:…

Filed under: Law,Legal Informatics — Patrick Durusau @ 4:23 pm

Bulk Access to Law-Related Linked Data: LC & VIAF Name Authority Records and LC Subject Authority Records

From the post:

Linked Data versions of Library of Congress name authority records and subject authority records are now available for bulk download from the Library of Congress Linked Data Service, according to Kevin Ford at Library of Congress.

In addition, VIAF, the Virtual International Authority File, now provides bulk access to Linked Data versions of name authority records for organizations, including government entities and business organizations, from more than 30 national or research libraries. VIAF data are also searchable through the VIAF Web user interface.

Always good to have more data but I would use caution with the Library of Congress authority records.

See for example, TFM (To Find Me) Mark Twain.

Authority record means just that, a record issued by an authority.

Being a “correct” record is something else entirely.

March 24, 2013

Mapping the Supreme Court

Filed under: Law,Legal Informatics — Patrick Durusau @ 3:05 pm

Mapping the Supreme Court

From the webpage:

The Supreme Court Mapping Project is an original software-driven initiative currently in Beta development. The project, under the direction of University of Baltimore School of Law Assistant Professor Colin Starger, seeks to use information design and software technology to enhance teaching, learning, and scholarship focused on Supreme Court precedent.

The SCOTUS Mapping Project has two distinct components:

Enhanced development of the Mapper software. This software enables users to create sophisticated interactive maps of Supreme Court doctrine by plotting relationships between majority, concurring and dissenting opinions. With the software, users can both visualize how different “lines” of Supreme Court opinions have evolved, and employ animation to make interactive presentations for audiences.

Building an extensive library of Supreme Court doctrinal maps. By highlighting the relationships between essential and influential Court opinions, these maps promote efficient learning and understanding of key doctrinal debates and can assist students, scholars, and practitioners alike. The library already includes maps of key regions of doctrine surrounding the Due Process Clause, the Commerce Clause, and the Fourth Amendment.

The SCOTUS Mapping Project is in Beta-phase development and is currently seeking Beta participants. If you are interested in participating in the Beta phase of the project, contact Prof. Starger.

For identifying and learning lines of Supreme Court decisions, an excellent tool.

I thought the combined mapping in Maryland v. King (did a warrantless, suspicionless DNA search violate the Fourth Amendment?) is particularly useful. (The MD v. King map image in the original post links to the full-size original.)

It illustrates that Supreme Court decisions on the Fourth Amendment are more mixed than is represented in the popular press.

Using prior decisions as topics, it would be interesting to see a topic map of the social context of those prior decisions.

No Supreme Court decision occurs in a vacuum.
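A back-of-the-envelope sketch of what such a map's raw material might look like: opinions as nodes with typed relationships between them. The cases and edge labels below are placeholders, not an actual doctrinal map.

```python
# Represent a tiny, hypothetical "line" of opinions as a typed edge list,
# the raw material a doctrinal map or topic map could be built from.
EDGES = [
    # (later opinion, relationship, earlier opinion) -- all placeholders
    ("Case C (majority)", "relies on", "Case A (majority)"),
    ("Case C (dissent)",  "relies on", "Case B (majority)"),
    ("Case D (majority)", "distinguishes", "Case C (majority)"),
]

def line_of(opinion, edges=EDGES):
    """Follow 'relies on' edges backwards from one opinion."""
    chain = [opinion]
    while True:
        nxt = [e[2] for e in edges if e[0] == chain[-1] and e[1] == "relies on"]
        if not nxt:
            return chain
        chain.append(nxt[0])

print(" -> ".join(line_of("Case C (majority)")))
```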

March 17, 2013

Open Law Lab

Filed under: Education,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 12:36 pm

Open Law Lab

From the webpage:

Open Law Lab is an initiative to design law – to make it more accessible, more usable, and more engaging.

Projects:

Law Visualized

Law Education Tech

Usable Court Systems

Access to Justice by Design

Not to mention a number of interesting blog posts represented by images further down the homepage.

Access/interface issues are universal and law is a particularly tough nut to crack.

Progress in providing access to legal materials could well carry over to other domains.

I first saw this at: Hagan: Open Law Lab.

March 3, 2013

Data models for version management…

Filed under: Data Models,Government,Government Data,Legal Informatics — Patrick Durusau @ 2:27 pm

Data models for version management of legislative documents by María Hallo Carrasco, M. Mercedes Martínez-González, and Pablo de la Fuente Redondo.

Abstract:

This paper surveys the main data models used in projects including the management of changes in digital normative legislation. Models have been classified based on a set of criteria, which are also proposed in the paper. Some projects have been chosen as representative for each kind of model. The advantages and problems of each type are analysed, and future trends are identified.

I first saw this at Legal Informatics, which had already assembled resources on the legislative metadata models discussed in the paper.

Useful as models of change tracking should you want to express that in a topic map.

To say nothing of overcoming the semantic impedance between these models.

February 26, 2013

Naming U.S. Statutes

Filed under: Government,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 1:53 pm

Strause et al.: How Federal Statutes Are Named, and the Yale Database of Federal Statute Names

Centers on How Federal Statutes Are Named, by Renata E.B. Strause, Allyson R. Bennett, Caitlin B. Tully, M. Douglass Bellis, and Eugene R. Fidell, Law Library Journal, 105, 7-30 (2013), but it includes references to other U.S. statute name resources as well.

Quite useful if you are developing any indexing/topic map service that involves U.S. statutes.

There is mention of a popular-name resource for French statutes.

I assume there are similar resources for other legal jurisdictions. If you know of such resources, I am sure the Legal Informatics Blog would be interested.

February 10, 2013

Lex Machina

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 2:44 pm

Lex Machina: IP Litigation and analytics

From the about page:

Every day, Lex Machina’s crawler extracts data and documents from PACER, all 94 District Court sites, ITC’s EDIS site and the PTO site.

The crawler automatically captures every docket event and downloads key District Court case documents and every ITC document. It converts the documents by optical character recognition (OCR) to searchable text and stores each one as a PDF file.

When the crawler encounters an asserted or cited patent, it fetches information about that patent from the PTO site.

Next, the crawler invokes Lex Machina’s state-of-the-art natural language processing (NLP) technology, which includes Lexpressions™, a proprietary legal text classification engine. The NLP technology classifies cases and dockets and resolves entity names. Attorney review of docket and case classification, patents and outcomes ensures high-quality data. The structured text indexer then orders all the data and stores it for search.

Lex Machina’s web-based application enables users to run search queries that deliver easy access to the relevant docket entries and documents. It also generates lists that can be downloaded as PDF files or spreadsheet-ready CSV files.

Finally, the system generates a daily patent litigation update email, which provides links to all new patent cases and filings.

Lex Machina does not:

  • Index the World Wide Web
  • Index legal cases around the world in every language
  • Index all legal cases in the United States
  • Index all state courts in the United States
  • Index all federal court cases in the United States

Instead, Lex Machina chose a finite legal domain, patents, that has a finite vocabulary and range of data sources.

Working in that finite domain, Lex Machina has produced a high-quality data product of interest to legal professionals and lay persons alike.

I intend to leave conquering world hunger, ignorance and poor color coordination of clothing to Bill Gates.

You?

I first saw this at Natural Language Processing in patent litigation: Lex Machina by Junling Hu.

January 20, 2013

Operation Asymptote – [PlainSite / Aaron Swartz]

Filed under: Government,Government Data,Law,Law - Sources,Legal Informatics,Uncategorized — Patrick Durusau @ 8:06 pm

Operation Asymptote

Operation Asymptote’s goal is to make U.S. federal court data freely available to everyone.

The data is available now, but free only up to $15 worth every quarter.

Serious legal research hits that limit pretty quickly.

The project does not cost you any money, only some of your time.

The result will be another source of data to hold the system accountable.

So, how real is your commitment to doing something effective in memory of Aaron Swartz?

January 11, 2013

Legal Informatics Glossary of Terms

Filed under: Glossary,Legal Informatics — Patrick Durusau @ 7:36 pm

Legal Informatics Glossary of Terms by Grant Vergottini.

From the post:

I work with people from around the world on matters relating to legal informatics. One common issue we constantly face is the issue of terminology. We use many of the same terms, but the subtly of their definitions end up causing no end of confusion. To try and address this problem, I’ve proposed a number of times that we band together to define a common vocabulary, and when we can’t arrive at that, at least we can understand the differences that exist amongst us.

To get the ball rolling, I have started a wiki on GitHub and populated it with many of the terms I use in my various roles. Their definitions are a work-in-progress at this point. I am refining them as I find the time. However, rather than trying to build my own private vocabulary, I would like this to be a collaborative effort. To that end, I am inviting anyone with an interest in this to help build out the vocabulary by adding your own terms with definitions to the list and improving the ones I have started.

My legal informatics glossary of terms can be found in my public legal Informatics project at:

https://github.com/grantcv1/Legal-Informatics/wiki/Glossary

Now that sounds like a topic map project.

I first saw this at: Vergottini: Legal Informatics Glossary of Terms.

October 23, 2012

Jurimetrics (Modern Uses of Logic in Law (MULL))

Filed under: Law,Legal Informatics,Logic,Semantics — Patrick Durusau @ 10:48 am

Jurimetrics (Modern Uses of Logic in Law (MULL))

From the about page:

Jurimetrics, The Journal of Law, Science, and Technology (ISSN 0897-1277), published quarterly, is the journal of the American Bar Association Section of Science & Technology Law and the Center for Law, Science & Innovation. Click here to view the online version of Jurimetrics.

Jurimetrics is a forum for the publication and exchange of ideas and information about the relationships between law, science and technology in all areas, including:

  • Physical, life and social sciences
  • Engineering, aerospace, communications and computers
  • Logic, mathematics and quantitative methods
  • The uses of science and technology in law practice, adjudication and court and agency administration
  • Policy implications and legislative and administrative control of science and technology.

Jurimetrics was first published in 1959 under the leadership of Layman Allen as Modern Uses of Logic in Law (MULL). The current name was adopted in 1966. Jurimetrics is the oldest journal of law and science in the United States, and it enjoys a circulation of more than 8,000, which includes all members of the ABA Section of Science & Technology Law.

I just mentioned this journal in Wyner et al.: An Empirical Approach to the Semantic Representation of Laws, but wanted to also capture its earlier title, Modern Uses of Logic in Law (MULL), because I am likely to search for it as well.

I haven’t looked at the early issues in some years but as I recall, they were quite interesting.

Wyner et al.: An Empirical Approach to the Semantic Representation of Laws

Filed under: Language,Law,Legal Informatics,Machine Learning,Semantics — Patrick Durusau @ 10:37 am

Wyner et al.: An Empirical Approach to the Semantic Representation of Laws

Legal Informatics brings news of Dr. Adam Wyner’s paper, An Empirical Approach to the Semantic Representation of Laws, and quotes the abstract as:

To make legal texts machine processable, the texts may be represented as linked documents, semantically tagged text, or translated to formal representations that can be automatically reasoned with. The paper considers the latter, which is key to testing consistency of laws, drawing inferences, and providing explanations relative to input. To translate laws to a form that can be reasoned with by a computer, sentences must be parsed and formally represented. The paper presents the state-of-the-art in automatic translation of law to a machine readable formal representation, provides corpora, outlines some key problems, and proposes tasks to address the problems.

The paper originated at Project IMPACT.

If you haven’t looked at semantics and the law recently, this is a good opportunity to catch up.

I have only skimmed the paper and its references but am already looking for online access to early issues of Jurimetrics (a journal by the American Bar Association) that addressed such issues many years ago.

Should be fun to see what has changed and by how much. What issues remain and how they are viewed today.
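For a feel of what "formal representations that can be automatically reasoned with" means in miniature, here is a toy forward-chaining sketch over made-up rules. Real systems in the paper's survey are far richer (defeasibility, exceptions, temporal scope), so treat this only as an illustration of the basic idea.

```python
# Toy forward chaining over propositional rules of the form
# (set of premises) -> conclusion. The rules and facts are invented.
RULES = [
    ({"person", "age_over_18"}, "adult"),
    ({"adult", "resident"}, "may_vote"),
]

def infer(facts, rules=RULES):
    """Apply rules until no new conclusions are produced."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"person", "age_over_18", "resident"}))
# -> includes "adult" and "may_vote"
```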

October 7, 2012

New Congressional Data Available for Free Bulk Download: Bill Data 1973- , Members 1789-

Filed under: Government,Government Data,Law - Sources,Legal Informatics — Patrick Durusau @ 4:28 am

New Congressional Data Available for Free Bulk Download: Bill Data 1973- , Members 1789-

Legal Informatics brings news of free bulk downloads of congressional bill data (1973 onward) and member data (1789 onward).

Of interest if you like U.S. history and/or recent events.

What other data would you combine with the data you find here?

September 23, 2012

Congress.gov: New Official Source of U.S. Federal Legislative Information

Filed under: Government,Government Data,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 7:50 pm

Congress.gov: New Official Source of U.S. Federal Legislative Information

Legal Informatics has gathered up links to a number of reviews/comments on the new legislative interface for the U.S. federal government.

You can see the beta version at: Congress.gov.

Personally, I like search and popularity being front and center, but that makes me wonder what isn’t available, like bulk downloads in some reasonable format (can you say XML?).

What do you think about the interface?

September 16, 2012

Supreme Court Database–Updated [US]

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 1:30 pm

Supreme Court Database–Updated

Michael Heise writes:

An exceptionally helpful source of data for those interested in US Supreme Court decisions was recently updated to include data from OT2011. The Supreme Court Database (2012 release, v.01, here) “contains over two hundred pieces of information about each case decided by the Court between the 19[46] and 20[11] terms. Examples include the identity of the court whose decision the Supreme Court reviewed, the parties to the suit, the legal provisions considered in the case, and the votes of the Justices.” An online codebook for this leading compilation of Supreme Court decisions (particularly for political scientists) can be found here.

The Supreme Court Database site provides the dataset, along with tools for analysis and training materials to help you use both.

Very useful for combining with other data and analysis, ranging from political science and history to more traditional legal approaches.
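A minimal sketch of pulling the database into Python once it is downloaded as CSV. The file name, encoding, and column names below ("term", "caseId") are my assumptions from memory of the codebook; verify them against the codebook linked above before relying on them.

```python
# Load a Supreme Court Database CSV and count how many cases each term
# contributed. File name, encoding, and column names are assumptions --
# check the SCDB codebook before use.
import csv
from collections import Counter

cases_per_term = Counter()
seen = set()

with open("SCDB_case_centered.csv", newline="", encoding="latin-1") as f:
    for row in csv.DictReader(f):
        key = (row["term"], row["caseId"])
        if key not in seen:
            seen.add(key)
            cases_per_term[row["term"]] += 1

for term, n in sorted(cases_per_term.items()):
    print(term, n)
```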

September 10, 2012

Researching Current Federal Legislation and Regulations:…

Filed under: Government,Government Data,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 3:30 pm

Researching Current Federal Legislation and Regulations: A Guide to Resources for Congressional Staff

Description quoted at Full Text Reports:

This report is designed to introduce congressional staff to selected governmental and nongovernmental sources that are useful in tracking and obtaining information on federal legislation and regulations. It includes governmental sources such as the Legislative Information System (LIS), THOMAS, the Government Printing Office’s Federal Digital System (FDsys), and U.S. Senate and House websites. Nongovernmental or commercial sources include resources such as HeinOnline and the Congressional Quarterly (CQ) websites. It also highlights classes offered by the Congressional Research Service (CRS) and the Library of Congress Law Library.

This report will be updated as new information is available.

Direct link to PDF: Researching Current Federal Legislation and Regulations: A Guide to Resources for Congressional Staff

A very useful starting point for research on U.S. federal legislation and regulations, but only a starting point.

Each listed resource merits a user’s guide. And no two of them are exactly the same.

Suggestions for research/topic map exercises based on this listing of resources?

September 3, 2012

Legal Rules, Text and Ontologies Over Time [The eternal “now?”]

Filed under: Legal Informatics,Ontology,Semantics — Patrick Durusau @ 3:06 pm

Legal Rules, Text and Ontologies Over Time by Monica Palmirani, Tommaso Ognibene and Luca Cervone.

Abstract:

The current paper presents the “Fill the gap” project that aims to design a set of XML standards for modelling legal documents in the Semantic Web over time. The goal of the project is to design an information system using XML standards able to store in an XML-native database legal resources and legal rules in an integrated way for supporting legal knowledge engineers and end-users (e.g., public administrative officers, judges, citizens).

It was refreshing to read:

The law changes over time and consequently change the rules and the ontological classes (e.g., the definition of EU citizenship changed in 2004 with the annexation of 10 new member states in the European Community). It is also fundamental to assign dates to the ontology and to the rules, based on an analytical approach, to the text, and analyze the relationships among sets of dates. The semantic web cake recommends that content, metadata should be modelled and represented in separate and clean layers. This recommendation is not widely followed from too many XML schemas, including those in the legal domain. The layers of content and rules are often confused to pursue a short annotation syntax, or procedural performance parameters or simply because a neat analysis of the semantic and abstract components is missing.

Not being mindful of time, of the effective date of changes to laws, the dates of events/transactions, can be hazardous to your pocketbook and/or your freedom!

Does your topic map account for time or does it exist in an eternal “now?” like the WWW?
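A minimal sketch of one way to avoid the eternal "now": attach a validity interval to each statement (or rule, or occurrence) and filter by an as-of date. The statements and dates below are illustrative only, loosely following the EU membership example from the quoted abstract.

```python
# Attach validity intervals to statements so queries can be answered
# "as of" a given date. Statements and dates are illustrative only.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class TimedStatement:
    subject: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None   # None means still in force

STATEMENTS = [
    TimedStatement("EU membership", "15 member states",
                   date(1995, 1, 1), date(2004, 4, 30)),
    TimedStatement("EU membership", "25 member states",
                   date(2004, 5, 1), None),
]

def as_of(when, statements=STATEMENTS):
    return [s for s in statements
            if s.valid_from <= when and (s.valid_to is None or when <= s.valid_to)]

print([s.value for s in as_of(date(2003, 6, 1))])   # ['15 member states']
print([s.value for s in as_of(date(2005, 6, 1))])   # ['25 member states']
```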

I first saw this at Legal Informatics.

OASIS LegalRuleML

Filed under: Legal Informatics,LegalRuleML,RuleML — Patrick Durusau @ 2:44 pm

OASIS LegalRuleML

From the webpage:

The OASIS LegalRuleML TC defines a rule interchange language for the legal domain. The work enables modeling and reasoning that allows implementers to structure, evaluate, and compare legal arguments constructed using the rule representation tools provided.

Legal Informatics posted a notice of a new tutorial introduction to LegalRuleML.

If you are planning IT or semantic integration projects in legal circles, worth your while to take a look at LegalRuleML.

August 11, 2012

Lima on Visualization and Legislative Memory of the Brazilian Civil Code

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 6:28 pm

Lima on Visualization and Legislative Memory of the Brazilian Civil Code

Legal Informatics reports the publication of the legislative history of the Brazilian Civil Code and a visualization of the Code.

Tying in Planiol’s Treatise on Civil Law (or other commentators) to such resources would make a nice showcase for topic maps.

August 8, 2012

GitLaw in Germany

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 1:51 pm

GitLaw in Germany: Deutsche Bundesgesetze- und verordnungen im Markdown auf GitHub = German Federal Laws and Regulations in Markdown on GitHub

Legal Informatics reports that German Federal Laws and Regulations are available in Markdown.

A useful resource if you have the legal background to make good use of it.

I would not advise self-help based on a Google translation of any of these materials.

August 1, 2012

Updated: Lists of Legal Metadata and Legal Knowledge Representation Resources

Filed under: Law,Legal Informatics — Patrick Durusau @ 7:33 pm

Updated: Lists of Legal Metadata and Legal Knowledge Representation Resources

Updated resource lists for anyone interested in legal informatics.

July 25, 2012

The Case for Curation: The Relevance of Digest and Citator Results in Westlaw and Lexis

Filed under: Aggregation,Curation,Legal Informatics,LexisNexis,Westlaw — Patrick Durusau @ 6:51 pm

The Case for Curation: The Relevance of Digest and Citator Results in Westlaw and Lexis by Susan Nevelow Mart and Jeffrey Luftig.

Abstract:

Humans and machines are both involved in the creation of legal research resources. For legal information retrieval systems, the human-curated finding aid is being overtaken by the computer algorithm. But human-curated finding aids still exist. One of them is the West Key Number system. The Key Number system’s headnote classification of case law, started back in the nineteenth century, was and is the creation of humans. The retrospective headnote classification of the cases in Lexis’s case databases, started in 1999, was created primarily although not exclusively with computer algorithms. So how do these two very different systems deal with a similar headnote from the same case, when they link the headnote to the digesting and citator functions in their respective databases? This paper continues an investigation into this question, looking at the relevance of results from digest and citator search run on matching headnotes in ninety important federal and state cases, to see how each performs. For digests, where the results are curated – where a human has made a judgment about the meaning of a case and placed it in a classification system – humans still have an advantage. For citators, where algorithm is battling algorithm to find relevant results, it is a matter of the better algorithm winning. But no one algorithm is doing a very good job of finding all the relevant results; the overlap between the two citator systems is not that large. The lesson for researchers: know how your legal research system was created, what involvement, if any, humans had in the curation of the system, and what a researcher can and cannot expect from the system you are using.

A must read for library students and legal researchers.

For legal research, the authors conclude:

The intervention of humans as curators in online environments is being recognized as a way to add value to an algorithm’s results, in legal research tools as well as web-based applications in other areas. Humans still have an edge in predicting which cases are relevant. And the intersection of human curation and algorithmically-generated data sets is already well underway. More curation will improve the quality of results in legal research tools, and most particularly can be used to address the algorithmic deficit that still seems to exist where analogic reasoning is needed. So for legal research, there is a case for curation. [footnotes omitted]

The distinction between curation (human gathering of relevant material) and aggregation (machine gathering of potentially relevant material) looks quite useful.
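The "overlap between the two citator systems" is easy to quantify once you have the two result lists. A minimal sketch, with invented case identifiers standing in for the real citator output:

```python
# Measure how much two citator result sets agree, given two lists of
# case identifiers (invented here) returned for the same headnote.
def overlap(results_a, results_b):
    a, b = set(results_a), set(results_b)
    shared = a & b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": len(shared) / len(a | b) if a | b else 0.0,
    }

westlaw_like = ["case-1", "case-2", "case-3", "case-5"]
lexis_like   = ["case-2", "case-3", "case-4"]
print(overlap(westlaw_like, lexis_like))
# {'shared': 2, 'only_a': 2, 'only_b': 1, 'jaccard': 0.4}
```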

Curation anyone?

I first saw this at Legal Informatics.

July 20, 2012

Technology-Assisted Review Boosted in TREC 2011 Results

Filed under: Document Classification,Legal Informatics,Searching — Patrick Durusau @ 2:48 pm

Technology-Assisted Review Boosted in TREC 2011 Results by Evan Koblentz.

From the post:

TREC Legal Track, an annual government-sponsored project for evaluating document review methods, on Friday released its 2011 results containing a virtual vote of confidence for technology-assisted review.

“[T]he results show that the technology-assisted review efforts of several participants achieve recall scores that are about as high as might reasonably be measured using current evaluation methodologies. These efforts require human review of only a fraction of the entire collection, with the consequence that they are far more cost-effective than manual review,” the report states.

The term “technology-assisted review” refers to “any semi-automated process in which a human codes documents as relevant or not, and the system uses that information to code or prioritize further documents,” said TREC co-leader Gordon Cormack, of the University of Waterloo. Its meaning is far wider than just the software method known as predictive coding, he noted.

As such, “There is still plenty of room for improvement in the efficiency and effectiveness of technology-assisted review efforts, and, in particular, the accuracy of intra-review recall estimation tools, so as to support a reasonable decision that ‘enough is enough’ and to declare the review complete. Commensurate with improvements in review efficiency and effectiveness is the need for improved external evaluation methodologies,” the report states.

Good snapshot of current results, plus fertile data sets for testing alternative methodologies.
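On the "recall estimation" point in the quoted report, the basic arithmetic is simple even if doing it well is not. A toy sketch that estimates recall from a random sample of the documents the review did not retrieve (all numbers invented):

```python
# Estimate recall of a review: relevant documents found divided by an
# estimate of all relevant documents, where the unreviewed remainder is
# estimated from a random sample. All numbers are invented.
def estimated_recall(found_relevant, not_retrieved, sample_size, sample_relevant):
    # Estimated relevant documents hiding in the not-retrieved pile.
    missed = not_retrieved * (sample_relevant / sample_size)
    return found_relevant / (found_relevant + missed)

# 8,000 relevant documents found; 500,000 not retrieved; a random sample
# of 1,000 of those contained 4 relevant documents.
print(round(estimated_recall(8_000, 500_000, 1_000, 4), 3))   # ~0.8
```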

The report mentions that the 100 GB data set size was a problem for some participants (Overview of the TREC 2011 Legal Track, page 2).

Suggestion: post the 2013 data set as a public data set on AWS. It would be available to everyone, and participants without local clusters could fire up capacity on demand. That is a more realistic scenario than purely local data processing.

Perhaps an informal survey of the amortized cost of processing by different methods (cloud, local cluster) would be of interest to the legal community.

I can hear the cries of “security, security” from here. The question to ask is: what disclosed premium is your client willing to pay for security on data you are going to give to the other side anyway, if it is responsive and non-privileged? 25%? 50%? 125% or more?

BTW, looking forward to the 2013 competition. Particularly if it gets posted to the AWS or similar cloud.

Let me know if you are interested in forming an ad hoc team or investigating the potential for an ad hoc team.

July 17, 2012

Searching Legal Information in Multiple Asian Languages

Filed under: Law,Legal Informatics,Search Engines — Patrick Durusau @ 2:42 pm

Searching Legal Information in Multiple Asian Languages by Philip Chung, Andrew Mowbray, and Graham Greenleaf.

Abstract:

In this article the Co-Directors of the Australasian Legal Information Institute (AustLII) explain the need for an open source search engine which can search simultaneously over legal materials in European languages and also in Asian languages, particularly those that require a ‘double byte’ representation, and the difficulties this task presents. A solution is proposed, the ‘u16a’ modifications to AustLII’s open source search engine (Sino) which is used by many legal information institutes. Two implementations of the Sino u16A approach, on the Hong Kong Legal Information Institute (HKLII), for English and Chinese, and on the Asian Legal Information Institute (AsianLII), for multiple Asian languages, are described. The implementations have been successful, though many challenges (discussed briefly) remain before this approach will provide a full multi-lingual search facility.

If the normal run of legal information retrieval, across jurisdictions, vocabularies, etc., isn’t challenging enough, you can try your hand at cross-language retrieval across European and Asian languages, plus synonyms and more.

😉

I would like to think the synonymy issue, which is noted as open by this paper, could be addressed in part through the use of topic maps. It would be an evolutionary solution, to be updated as our use and understanding of language evolves.
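A minimal sketch of that topic-map-flavoured approach: terms in different languages (or registers) are bound to a common subject, and a query is expanded to all names of that subject before it hits the search engine. The terms below are placeholders, not a real legal vocabulary.

```python
# Expand a query term to all names bound to the same subject, so a search
# can match documents in other languages or registers. Terms are placeholders.
SUBJECTS = {
    "contract-law": {"contract", "agreement", "合同", "契約"},
    "tort-law": {"tort", "delict", "侵权"},
}

def expand(term):
    for names in SUBJECTS.values():
        if term in names:
            return names
    return {term}

print(expand("contract"))   # every name for the same subject, any language
```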

Any thoughts on Sino versus Lucene/Solr 4.0 (alpha, I know, but it won’t stay that way forever)?

I first saw this at Legal Informatics.

Proposed urn:lex codes for US materials in MLZ

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 2:25 pm

Proposed urn:lex codes for US materials in MLZ

From the post:

The MLZ styles rely on a urn:lex-like scheme for specifying the jurisdiction of primary legal materials. We will need to have at least a minimal set of jurisdction codes in place for the styles to be functional. The scheme to be used for this purpose is the subject of this post.

The urn:lex scheme is used in MLZ for the limited purpose of identifying jurisdictional scope: it is not a full document identifier, and does not carry information on the issuing institution itself. Even within this limited scope, the MLZ scheme diverges from the examples provided by the Cornell LII Lexcraft pages, in that the “federal” level is expressed as a geographic scope (set off by a semicolon), rather than as a distinct category of jurisdiction (appended by a period).

It is unfortunate that software isn’t designed to use existing identification systems.

On the other hand, computer identification systems started when computers were even dumber than they are now. Legacy issue I suppose.

If you are interested in “additional” legal identifier systems, or in the systems that use them, this should be of interest.

Or if you need to map such urn:lex codes to existing identifiers for the same materials. The ones used by people.
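A minimal sketch of the kind of mapping table that bridges a urn:lex-style jurisdiction code and the names people actually use. The codes below are illustrative guesses at the shape described in the post (geographic scope set off by a semicolon), not the actual MLZ scheme.

```python
# Map urn:lex-like jurisdiction codes (illustrative, not the real MLZ
# scheme) to the names people use for the same jurisdictions.
JURISDICTION_NAMES = {
    "us": "United States (federal)",
    "us;ca": "California",
    "us;ny": "New York",
}

def jurisdiction_label(code):
    return JURISDICTION_NAMES.get(code, f"unmapped code: {code}")

print(jurisdiction_label("us;ca"))   # California
```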

I first saw this at Legal Informatics.

July 6, 2012

Updated: List of Legal Informatics Courses, Programs, Departments, and Research Centers

Filed under: Legal Informatics — Patrick Durusau @ 6:51 pm

Updated: List of Legal Informatics Courses, Programs, Departments, and Research Centers.

Legal Informatics has updated its listing of legal informatics resources. Just in case you are spending a summer weekend updating your resource lists. 😉
