Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

July 17, 2012

Scalia and Garner on legal interpretation

Filed under: Language,Law — Patrick Durusau @ 4:50 pm

Scalia and Garner on legal interpretation by Mark Liberman.

Mark writes:

Antonin Scalia and Bryan Garner have recently (June 19) published Reading Law: The Interpretation of Legal Texts, a 608-page work in which, according to the publisher’s blurb, “all the most important principles of constitutional, statutory, and contractual interpretation are systematically explained”.

The post is full of pointers to additional materials both on this publication and notions of legal interpretation more generally.

A glimpse of why I think texts are so complex.

BTW, for the record, I disagree with both Scalia and the post-9/11 Stanley Fish on discovering the “meaning” of texts or authors, respectively. We can report our interpretation of a text, but that isn’t the same thing.

An interpretation is a report that we may persuade others to find useful for some purpose, agreeable with their prior beliefs, or even consistent with their world view. But for all of that, it remains always our report, nothing more.

The claim of “plain meaning” of words or the “intention” of an author (Scalia, Fish respectively) is an attempt to either avoid moral responsibility for a report or to privilege a report as being more than simply another report. Neither one is particularly honest or useful.

In a marketplace of reports, acknowledged to be reports, we can evaluate, investigate, debate and even choose from among reports.

Scalia and Fish would both advantage some reports over others, probably for different reasons. But whatever their reasons, fair or foul, I prefer to meet all reports on even ground.

Searching Legal Information in Multiple Asian Languages

Filed under: Law,Legal Informatics,Search Engines — Patrick Durusau @ 2:42 pm

Searching Legal Information in Multiple Asian Languages by Philip Chung, Andrew Mowbray, and Graham Greenleaf.

Abstract:

In this article the Co-Directors of the Australasian Legal Information Institute (AustLII) explain the need for an open source search engine which can search simultaneously over legal materials in European languages and also in Asian languages, particularly those that require a ‘double byte’ representation, and the difficulties this task presents. A solution is proposed, the ‘u16a’ modifications to AustLII’s open source search engine (Sino) which is used by many legal information institutes. Two implementations of the Sino u16A approach, on the Hong Kong Legal Information Institute (HKLII), for English and Chinese, and on the Asian Legal Information Institute (AsianLII), for multiple Asian languages, are described. The implementations have been successful, though many challenges (discussed briefly) remain before this approach will provide a full multi-lingual search facility.

If the normal run of legal information retrieval, across jurisdictions, vocabularies, etc., isn’t challenging enough, you can try your hand at cross-language retrieval with European and Asian languages, plus synonyms, etc.

😉

I would like to think the synonymy issue, which is noted as open by this paper, could be addressed in part through the use of topic maps. It would be an evolutionary solution, to be updated as our use and understanding of language evolves.
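
To make the synonymy point concrete, here is a minimal sketch in Python of topic-map-style query expansion, where one subject carries names in several languages. The subjects, terms, and translations are purely illustrative; they are not drawn from Sino, HKLII, or AsianLII.

```python
# Topic-map-style synonym expansion across languages. All subjects, terms,
# and translations below are illustrative; none are taken from Sino, HKLII,
# or AsianLII.

SUBJECTS = {
    "subject:judicial-review": {
        "en": ["judicial review"],
        "zh": ["司法覆核", "司法审查"],
    },
    "subject:injunction": {
        "en": ["injunction", "interlocutory injunction"],
        "zh": ["禁制令"],
    },
}

def expand_query(term):
    """Return every known name of the subject(s) a query term names."""
    variants = {term}
    for names_by_lang in SUBJECTS.values():
        all_names = {n for names in names_by_lang.values() for n in names}
        if term in all_names:
            variants |= all_names
    return variants

if __name__ == "__main__":
    # A search on the English term also retrieves the Chinese variants,
    # and vice versa.
    print(expand_query("judicial review"))
    print(expand_query("禁制令"))
```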

Any thoughts on Sino versus Lucene/Solr 4.0 (alpha, I know, but it won’t stay that way forever)?

I first saw this at Legal Informatics.

Proposed urn:lex codes for US materials in MLZ

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 2:25 pm

Proposed urn:lex codes for US materials in MLZ

From the post:

The MLZ styles rely on a urn:lex-like scheme for specifying the jurisdiction of primary legal materials. We will need to have at least a minimal set of jurisdiction codes in place for the styles to be functional. The scheme to be used for this purpose is the subject of this post.

The urn:lex scheme is used in MLZ for the limited purpose of identifying jurisdictional scope: it is not a full document identifier, and does not carry information on the issuing institution itself. Even within this limited scope, the MLZ scheme diverges from the examples provided by the Cornell LII Lexcraft pages, in that the “federal” level is expressed as a geographic scope (set off by a semicolon), rather than as a distinct category of jurisdiction (appended by a period).

It is unfortunate that software isn’t designed to use existing identification systems.

On the other hand, computer identification systems started when computers were even dumber than they are now. Legacy issue I suppose.

If you are interested in “additional” legal identifier systems, or in the systems that use them, this should be of interest.

Or if you need to map such urn:lex codes to existing identifiers for the same materials. The ones used by people.
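
As a rough illustration of that kind of mapping, here is a minimal sketch in which urn:lex-style jurisdiction codes and familiar jurisdiction names point to the same subjects. The codes and labels are illustrative, not the actual MLZ assignments.

```python
# Mapping urn:lex-style jurisdiction codes to familiar names, and back.
# The codes and labels are illustrative, not the actual MLZ assignments.

URN_LEX_TO_NAME = {
    "us": "United States (federal)",
    "us;federal": "United States (federal)",
    "us;la": "Louisiana",
    "us;ny": "New York",
}

# Going from names back to codes naturally yields multiple identifiers per
# jurisdiction -- exactly the situation topic maps are built to handle.
NAME_TO_URN_LEX = {}
for code, name in URN_LEX_TO_NAME.items():
    NAME_TO_URN_LEX.setdefault(name, []).append(code)

if __name__ == "__main__":
    print(NAME_TO_URN_LEX["United States (federal)"])  # ['us', 'us;federal']
```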

I first saw this at Legal Informatics.

July 13, 2012

Day Two of a Predictive Coding Narrative: More Than A Random Stroll Down Memory Lane

Filed under: e-Discovery,Email,Law,Prediction,Predictive Analytics — Patrick Durusau @ 3:47 pm

Day Two of a Predictive Coding Narrative: More Than A Random Stroll Down Memory Lane by Ralph Losey.

From the post:

Day One of the search project ended when I completed review of the initial 1,507 machine-selected documents and initiated the machine learning. I mentioned in the Day One narrative that I would explain why the sample size was that high. I will begin with that explanation and then, with the help of William Webber, go deeper into math and statistical sampling than ever before. I will also give you the big picture of my review plan and search philosophy: it’s hybrid and multimodal. Some search experts disagree with my philosophy. They think I do not go far enough to fully embrace machine coding. They are wrong. I will explain why and rant on in defense of humanity. Only then will I conclude with the Day Two narrative.

More than you are probably going to want to know about sample sizes and their calculation, but persevere until you get to the defense of humanity stuff. It is all quite good.
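
For readers who want the arithmetic without the narrative, here is the textbook sample-size calculation for estimating a proportion at a given confidence level and margin of error. It is a generic sketch, not a reproduction of Losey’s or Webber’s actual numbers.

```python
# Standard sample size for estimating a proportion (e.g., prevalence of
# relevant documents), with an optional finite population correction.
import math

Z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}  # common two-sided z-scores

def sample_size(confidence, margin, p=0.5, population=None):
    """Sample size to estimate a proportion; p=0.5 is the most conservative."""
    z = Z[confidence]
    n = (z ** 2) * p * (1 - p) / margin ** 2
    if population:  # finite population correction
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

if __name__ == "__main__":
    # 95% confidence, +/- 2.5% margin, drawn from 699,082 documents
    print(sample_size(0.95, 0.025, population=699_082))  # about 1,534
```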

If I had to add a comment on the defense of humanity rant, it would be that machines have a flat view of documents and not the richly textured one of a human reader. While it is true that machines can rapidly compare documents without tiring, they will miss an executive referring to a secretary as his “cupcake,” a reference that would jump out at a human reader. Same text, different result.

Perhaps because in one case the text is being scanned for tokens and in the other case it is being read.

Day One of a Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron

Filed under: Email,Law,Prediction,Predictive Analytics — Patrick Durusau @ 3:22 pm

Day One of a Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron by Ralph Losey.

The start of a series of posts on predictive coding and searching of the Enron emails by a lawyer. A legal perspective is important enough that I will be posting a note about each post in this series as they occur.

A couple of preliminary notes:

I am sure this is the first time that Ralph has used predictive coding with the Enron emails. On the other hand, I would not take “…this is the first time for X…” sort of claims from any vendor or service organization. 😉

You can see other examples of processing the Enron emails at:

And that is just a “lite” scan. There are numerous other projects that use the Enron email collection.

I wonder if that is because we are naturally nosey?

From the post:

This is the first in a series of narrative descriptions of a legal search project using predictive coding. Follow along while I search for evidence of involuntary employee terminations in a haystack of 699,082 Enron emails and attachments.

Joys and Risks of Being First

To the best of my knowledge, this writing project is another first. I do not think anyone has ever previously written a blow-by-blow, detailed description of a large legal search and review project of any kind, much less a predictive coding project. Experts on predictive coding speak only from a mile high perspective; never from the trenches (you can speculate why). That has been my practice here, until now, and also my practice when speaking about predictive coding on panels or in various types of conferences, workshops, and classes.

There are many good reasons for this, including the main one that lawyers cannot talk about their client’s business or information. That is why in order to do this I had to run an academic project and search and review the Enron data. Many people could do the same. In fact, each year the TREC Legal Track participants do similar search projects of Enron data. But still, no one has taken the time to describe the details of their search, not even the spacey TRECkies (sorry Jason).

A search project like this takes an enormous amount of time. In fact, to my knowledge (Maura, please correct me if I’m wrong), no Legal Track TRECkies have ever recorded and reported the time that they put into the project, although there are rumors. In my narrative I will report the amount of time that I put into the project on a day-by-day basis, and also, sometimes, on a per task basis. I am a lawyer. I live by the clock and have done so for thirty-two years. Time is important to me, even non-money time like this. There is also a not-insignificant amount of time it takes to write up a narrative like this. I did not attempt to record that.

There is one final reason this has never been attempted before, and it is not trivial: the risks involved. Any narrator who publicly describes their search efforts assumes the risk of criticism from monday morning quarterbacks about how the sausage was made. I get that. I think I can handle the inevitable criticism. A quote that Jason R. Baron turned me on to a couple of years ago helps, the famous line from Theodore Roosevelt in his Man in the Arena speech at the Sorbonne:

It is not the critic who counts: not the man who points out how the strong man stumbles or where the doer of deeds could have done better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood, who strives valiantly, who errs and comes up short again and again, because there is no effort without error or shortcoming, but who knows the great enthusiasms, the great devotions, who spends himself for a worthy cause; who, at the best, knows, in the end, the triumph of high achievement, and who, at the worst, if he fails, at least he fails while daring greatly, so that his place shall never be with those cold and timid souls who knew neither victory nor defeat.

I know this narrative is no high achievement, but we all do what we can, and this seems within my marginal capacities.

July 9, 2012

Verrier: Visualizations of the French Civil Code

Filed under: Law,Law - Sources,Visualization — Patrick Durusau @ 10:42 am

Verrier: Visualizations of the French Civil Code

Legal Informatics reports in part:

Jacques Verrier has posted two visualizations of the French Code civil:

Code civil – Cartographie [a video showing the evolution of the Code civil by means of network graphs]
Code civil des Français [a network graph of the structure of the Code civil linked to the full text of the code]

Being from the only “civilian” jurisdiction in the United States (Louisiana) and having practiced law there, I had to include this as a resource, in addition to its being a good illustration of visualizing important subject matter.

The post also points to a variety of visualizations of the United States case and statutory law.

June 29, 2012

Bruce: How Well Does Current Legislative Identifier Practice Measure Up?

Filed under: Identifiers,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 3:15 pm

Bruce: How Well Does Current Legislative Identifier Practice Measure Up?

From Legal Informatics:

Tom Bruce of the Legal Information Institute at Cornell University Law School (LII) has posted Identifiers, Part 3: How Well Does Current Practice Measure Up?, on LII’s new legislative metadata blog, Making Metasausage.

In this post, Tom surveys legislative identifier systems currently in use. He recommends the use of URIs for legislative identifiers, rather than URLs or URNs.

He cites favorably the URI-based identifier system that John Sheridan and Dr. Jeni Tennison developed for the Legislation.gov.uk system. Tom praises Sheridan’s (here) and Tennison’s (here and here) writings on legislative URIs and Linked Data.

Tom also praises the URI system implemented by Dr. Rinke Hoekstra in the Leibniz Center for Law‘s Metalex Document Server for facilitating point-in-time as well as point-in-process identification of legislation.

Tom concludes by making a series of recommendations for a legislative identifier system:

See the post for his recommendations (in case you are working on such a system) and for other links.

I would point out that existing legislation already has identifiers, assigned before it receives the “better” identifiers specified here.

And those “old” identifiers will have been incorporated into other texts, legal decisions and the like.

Oh.

We can’t re-write existing identifiers, so it’s a good thing topic maps accept subjects having identifiers, plural.
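
A minimal sketch of what “identifiers, plural” looks like in practice: one subject, several identifiers, any of which is enough to find it. The urn:lex form shown is deliberately elided and illustrative.

```python
# One subject, several identifiers, old and new. The urn:lex form below is
# deliberately elided; it stands in for whatever "better" identifier gets
# assigned later.
from dataclasses import dataclass, field

@dataclass
class Subject:
    name: str
    identifiers: set = field(default_factory=set)

ppaca = Subject("Patient Protection and Affordable Care Act")
ppaca.identifiers |= {
    "Pub. L. 111-148",         # the identifier in long-standing use
    "124 Stat. 119",           # Statutes at Large citation
    "urn:lex:us:...:111-148",  # a newer identifier, deliberately elided here
}

def find(subjects, identifier):
    """Any one of a subject's identifiers is enough to find it."""
    return [s for s in subjects if identifier in s.identifiers]

print(find([ppaca], "Pub. L. 111-148")[0].name)
print(find([ppaca], "urn:lex:us:...:111-148")[0].name)
```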

June 28, 2012

A Simple URL Shortener for Legal Materials: L4w.us, by Ontolawgy

Filed under: Law,Law - Sources — Patrick Durusau @ 6:31 pm

A Simple URL Shortener for Legal Materials: L4w.us, by Ontolawgy

Legal Informatics reports on a URL shortener for legal citations.

Not exactly the usual URL shortener, it produces “human readable” URLs for U.S. Congress, U.S. Public Law, U.S. Code, and Federal Register citations.

Compare:

U.S. Public Law 111-148
http://www.gpo.gov/fdsys/pkg/PLAW-111publ148/html/PLAW-111publ148.htm

versus

For a plain text version: http://L4w.us/PublicLaw/text/111-148
or http://L4w.us/Pub. L. 111-148 text

Human readable citation practices existed at the time of the design of URLs. Another missed opportunity that we are still paying for.
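
Here is a minimal sketch of how a familiar citation could be turned into a human-readable path, mirroring the pattern of the L4w.us examples above. It illustrates the idea only; it is not the service’s actual routing logic.

```python
# Turn a Public Law citation into a human-readable URL path in the style of
# the L4w.us examples above. Illustration only, not the service's own logic.
import re

def public_law_url(citation, fmt="text"):
    """'Pub. L. 111-148' -> 'http://L4w.us/PublicLaw/text/111-148'"""
    m = re.search(r"Pub\.?\s*L\.?\s*(\d+)-(\d+)", citation, re.IGNORECASE)
    if not m:
        return None
    congress, number = m.groups()
    return "http://L4w.us/PublicLaw/{}/{}-{}".format(fmt, congress, number)

print(public_law_url("Pub. L. 111-148"))
```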

June 27, 2012

An API for European Union legislation

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 1:51 pm

An API for European Union legislation

From the webpage:

The API can help you conduct research, create data visualizations or you can even build applications upon it.

This is an application programming interface (API) that opens up core EU legislative data for further use. The interface uses JSON, meaning that you have easy-to-use, machine-readable access to metadata on European Union legislation. It will be useful if you want to use or analyze European Union legislative data in a way that the official databases were not originally built for. The API extracts, organizes, and connects data from various official sources.

Among other things we have used the data to conduct research on the decision-making time*, analyze voting patterns*, measure the activity of Commissioners* and visualize the legislative integration process over time*, but you can use the API as you want to. When you use it to create something useful or interesting be sure to let us know, if you want to we can post a link to your project from this site.

For some non-apparent reason, the last paragraph has hyperlinks for the “*” characters. So that is not a typo, that is how it appears in the original text.

There are a large number of relationships captured by data accessible through this API. The sort of relationships that topic maps excel at handling.
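
If you want to poke at a JSON API of this kind from a script, a minimal sketch follows. The endpoint URL, paths, and field names are placeholders of my own, not the project’s documented API; check the site for the real ones.

```python
# Fetch and parse JSON from a (hypothetical) legislative data API.
# The base URL, path, and field names are placeholders, not the real API.
import json
import urllib.request

API_BASE = "http://api.example.org/eu-legislation"  # hypothetical endpoint

def fetch(path):
    """GET a JSON resource from the (hypothetical) API and parse it."""
    with urllib.request.urlopen("{}/{}".format(API_BASE, path)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Hypothetical path and fields: pull one directive's metadata and look at
    # the relationships (procedure steps, votes, Commissioners) it exposes.
    record = fetch("directives/2006-123")
    print(record.get("title"))
    print(record.get("procedure", {}).get("duration_days"))
```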

I first saw this at: DZone: An API for European Union legislation

Predictive Coding Patented, E-Discovery World Gets Jealous

Filed under: e-Discovery,Law,Predictive Analytics — Patrick Durusau @ 12:48 pm

Predictive Coding Patented, E-Discovery World Gets Jealous by Christopher Danzig

From the post:

The normally tepid e-discovery world felt a little extra heat of competition yesterday. Recommind, one of the larger e-discovery vendors, announced Wednesday that it was issued a patent on predictive coding (which Gabe Acevedo, writing in these pages, named the Big Legal Technology Buzzword of 2011).

In a nutshell, predictive coding is a relatively new technology that allows large chunks of document review to be automated, a.k.a. done mostly by computers, with less need for human management.

Some of Recommind’s competitors were not happy about the news. See how they responded (grumpily), and check out what Recommind’s General Counsel had to say about what this means for everyone who uses e-discovery products….

Predictive coding has received a lot of coverage recently as a new way to save buckets of money during document review (a seriously expensive endeavor, for anyone who just returned to Earth).

I am always curious why a patent, or even a patent number, will be cited but no link to the patent given.

In case you are curious, it is patent 7,933,859, as a hyperlink.

The abstract reads:

Systems and methods for analyzing documents are provided herein. A plurality of documents and user input are received via a computing device. The user input includes hard coding of a subset of the plurality of documents, based on an identified subject or category. Instructions stored in memory are executed by a processor to generate an initial control set, analyze the initial control set to determine at least one seed set parameter, automatically code a first portion of the plurality of documents based on the initial control set and the seed set parameter associated with the identified subject or category, analyze the first portion of the plurality of documents by applying an adaptive identification cycle, and retrieve a second portion of the plurality of documents based on a result of the application of the adaptive identification cycle test on the first portion of the plurality of documents.

If that sounds familiar to you, you are not alone.

Predictive coding, developed over the last forty years, is an excellent feed into a topic map. As a matter of fact, it isn’t hard to imagine a topic map seeding and being augmented by a predictive coding process.
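
To make the seeding idea concrete, here is a minimal sketch of the generic train-score-review loop that predictive coding systems build on. It is ordinary active learning with toy data, not Recommind’s patented method.

```python
# Generic predictive-coding-style loop: hand-code a small seed set, train a
# model, score the rest, send the least certain documents back for review.
# Toy data and a plain classifier; not Recommind's patented method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

documents = [
    "termination of employment effective immediately",
    "severance package for laid off staff",
    "quarterly gas trading results",
    "lunch menu for the executive floor",
]
seed_labels = [1, 1, 0, 0]  # hand coding of a small control set

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)
model = LogisticRegression().fit(X, seed_labels)

# Score the collection; in practice this would be the remaining uncoded
# documents, and the least confident ones go back to a human reviewer.
probabilities = model.predict_proba(X)[:, 1]
for doc, p in sorted(zip(documents, probabilities), key=lambda t: -t[1]):
    print("{:.2f}  {}".format(p, doc))
```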

I also mention it as a caution that the IP in this area, as in many others, is beset by the ordinary being approved as innovation.

A topic map would be ideal to trace claims, prior art and to attach analysis to a patent. I saw several patents assigned to Recommind and some pending applications. When I have a moment I will post a listing with links to those documents.

I first saw this at Beyond Search.

June 24, 2012

Report of Second Phase of Seventh Circuit eDiscovery Pilot Program

Filed under: Law,Legal Informatics — Patrick Durusau @ 3:42 pm

Report of Second Phase of Seventh Circuit eDiscovery Pilot Program Published

From Legal Informatics:

The Seventh Circuit Electronic Discovery Pilot Program has published its Final Report on Phase Two, May 2010 to May 2012 (very large PDF file).

A principal purpose of the program is to determine the effects of the use of Principles Relating to the Discovery of Electronically Stored Information in litigation in the Circuit.

The report describes the results of surveys of lawyers who participated in efiling in the Seventh Circuit, and of judges and lawyers who participated in trials in which the Circuit’s Principles Relating to the Discovery of Electronically Stored Information were applied.

True enough, the report is “a very large PDF file.” At 969 pages and 111.5 MB. Don’t try downloading while you are on the road, unless you are in South Korea or Japan.

I don’t have the time today, but the report isn’t substantively 969 pages long. Pages of names and addresses, committee minutes, presentations, and filler of various kinds pad it out. If you find it in a format other than PDF, I might be interested in generating a shorter, more useful version.

Bottom line was that cooperation in discovery as it relates to electronically stored information reduces costs and yet maintains standards for representation.

Topic maps can play an important role both in eDiscovery and in relating information together, whatever its original form.

True enough, there are services that perform those functions now, but have you ever taken one of their work products and merged it with another?

By habit or chance, the terms used may be close enough to provide a useful result, but how do you verify the results?

June 23, 2012

Fastcase Introduces e-Books, Beginning with Advance Sheets

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 4:07 pm

Fastcase Introduces e-Books, Beginning with Advance Sheets

From the post:

According to the Fastcase blog post, Fastcase advance sheets will be available “for each state, federal circuit, and U.S. Supreme Court”; will be free of charge and “licensed under [a] Creative Commons BY-SA license“; and will include summaries. Each e-Book Advance Sheet will contain “one month’s judicial opinions (designated as published and unpublished) for specific states or courts.”

According to Sean Doherty’s post, future Fastcase e-Books will include “e-book case reporters with official pagination and links” into the Fastcase database, as well as “topical reporters” on U.S. law, covering fields such as securities law and antitrust law.

According to the Fastcase blog post, Fastcase’s approach to e-Books is inspired in part by CALI‘s Free Law Reporter, which makes case law available as e-Books in EPUB format.

For details, see the links in the post at Legal Informatics.

I mention it because not only could you have “topical reporters” but also information products tied to even narrower areas of case law.

Such as litigation that a firm has pending or very narrow areas of liability (for example) of interest to a particular client. Granted, there are “case watch” resources in every trade zine, but they are hardly detailed enough to do more than “excite the base,” as they say.

With curated content from a topic map application, rather than “exciting the base,” you could be sharpening the legal resources you can whistle up on behalf of your client, increasing their appreciation of and continued interest in representation by you.

June 12, 2012

How much of commonsense and legal reasoning is formalizable? A review of conceptual obstacles

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 2:19 pm

How much of commonsense and legal reasoning is formalizable? A review of conceptual obstacles by James Franklin.

Abstract:

Fifty years of effort in artificial intelligence (AI) and the formalization of legal reasoning have produced both successes and failures. Considerable success in organizing and displaying evidence and its interrelationships has been accompanied by failure to achieve the original ambition of AI as applied to law: fully automated legal decision-making. The obstacles to formalizing legal reasoning have proved to be the same ones that make the formalization of commonsense reasoning so difficult, and are most evident where legal reasoning has to meld with the vast web of ordinary human knowledge of the world. Underlying many of the problems is the mismatch between the discreteness of symbol manipulation and the continuous nature of imprecise natural language, of degrees of similarity and analogy, and of probabilities.

I haven’t (yet) been able to access a copy of this article.

From the abstract,

….mismatch between the discreteness of symbol manipulation and the continuous nature of imprecise natural language, of degrees of similarity and analogy, and of probabilities.

I suspect it will be a useful reminder of the boundaries to formal information systems.

I first saw this at Legal Informatics: Franklin: How Much of Legal Reasoning Is Formalizable?

June 3, 2012

A Resource-Based Method for Named Entity Extraction and Classification

Filed under: Entities,Entity Extraction,Entity Resolution,Law,Named Entity Mining — Patrick Durusau @ 3:37 pm

A Resource-Based Method for Named Entity Extraction and Classification by Pablo Gamallo and Marcos Garcia. (Lecture Notes in Computer Science, vol. 7026, Springer-Verlag, 610-623. ISSN: 0302-9743).

Abstract:

We propose a resource-based Named Entity Classification (NEC) system, which combines named entity extraction with simple language-independent heuristics. Large lists (gazetteers) of named entities are automatically extracted making use of semi-structured information from the Wikipedia, namely infoboxes and category trees. Language independent heuristics are used to disambiguate and classify entities that have been already identified (or recognized) in text. We compare the performance of our resource-based system with that of a supervised NEC module implemented for the FreeLing suite, which was the winner system in CoNLL-2002 competition. Experiments were performed over Portuguese text corpora taking into account several domains and genres.

Of particular interest if you are interested in adding NEC resources to the FreeLing project.

The introduction starts off:

Named Entity Recognition and Classification (NERC) is the process of identifying and classifying proper names of people, organizations, locations, and other Named Entities (NEs) within text.

Curious, what happens if you don’t have a “named” entity? That is, an entity mentioned in the text that doesn’t (yet) have a proper name?

Thinking of legal texts where some provision may apply to all corporations that engage in activity Y and that have a gross annual income in excess of amount X.

I may want to “recognize” that entity so I can then put a name with that entity.
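
A minimal sketch of both cases: classifying an entity that appears in a gazetteer, and flagging a description of an entity that has no proper name yet. The gazetteer and pattern are far cruder than the Wikipedia-derived resources the paper describes.

```python
# Gazetteer lookup for named entities plus a crude pattern for descriptions
# of not-yet-named entities. Illustrative only; far simpler than the
# Wikipedia-derived gazetteers and heuristics described in the paper.
import re

GAZETTEER = {
    "Enron": "organization",
    "Louisiana": "location",
    "Antonin Scalia": "person",
}

UNNAMED_ENTITY = re.compile(
    r"\b(any|all|every)\s+(person|corporation|entity)s?\b[^.]*", re.IGNORECASE
)

def tag(text):
    named = [(name, cls) for name, cls in GAZETTEER.items() if name in text]
    unnamed = [m.group(0) for m in UNNAMED_ENTITY.finditer(text)]
    return named, unnamed

named, unnamed = tag(
    "All corporations that engage in activity Y and exceed amount X must "
    "file annually; Enron did not."
)
print(named)    # [('Enron', 'organization')]
print(unnamed)  # the description of the not-yet-named entity
```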

May 25, 2012

Bruce on Legislative Identifier Granularity

Filed under: Identifiers,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 10:23 am

Bruce on Legislative Identifier Granularity

From the post:

In this post, Tom [Bruce] explores legislative identifier granularity, or the level of specificity at which such an identifier functions. The post discusses related issues such as the incorporation of semantics in identifiers; the use of “pure” (semantics-free) legislative identifiers; and how government agency authority and procedural rules influence the use, “persistence, and uniqueness” of identifiers. The latter discussion leads Tom to conclude that

a “gold standard” system of identifiers, specified and assigned by a relatively independent body, is needed at the core. That gold standard can then be extended via known, stable relationships with existing identifier systems, and designed for extensible use by others outside the immediate legislative community.

Interesting and useful reading.

Even though a “gold standard” of identifiers for something as dynamic as legislation isn’t likely.

Or rather, isn’t going to happen.

There are too many stakeholders in present systems for any proposal to carry the day.

Not to mention decades, if not centuries, of references in other systems.

May 20, 2012

…Commenting on Legislation and Court Decisions

Filed under: Annotation,Law,Legal Informatics — Patrick Durusau @ 6:16 pm

Anderson Releases Prototype System Enabling Citizens to Comment on Legislation and Court Decisions

Legal Informatics brings news that:

Kerry Anderson of the African Legal Information Institute (AfricanLII) has released a prototype of a new software system enabling citizens to comment on legislation, regulations, and court decisions.

There are several initiatives like this one, which is encouraging from the perspective of crowd-sourcing data for annotation.

May 19, 2012

Hands-on examples of legal search

Filed under: e-Discovery,Law,Legal Informatics,Searching — Patrick Durusau @ 7:04 pm

Hands-on examples of legal search by Michael J. Bommarito II.

From the post:

I wanted to share with the group some of my recent work on search in the legal space. I have been developing products and service models, but I thought many of the experiences or guides could be useful to you. I would love to share some of this work to help foster a “hacker” community in which we might collaborate on projects.

The first few posts are based on Amazon’s CloudSearch service. CloudSearch, as the name suggests, is a “cloud-based” search service. Once you decide what and how you would like to search, Amazon handles procuring the underlying infrastructure, scaling to required capacity, stemming, stop-wording, building indices, etc. For those of you who do not have access to “search appliances” or labor to configure products like Solr, this offers an excellent opportunity.

Pointers to several posts by Michael that range from searching U.S. Supreme Court decisions and email archives to statutory law.

From law to eDiscovery, something for everybody!

May 15, 2012

Electronic Discovery Institute

Filed under: Law,Legal Informatics — Patrick Durusau @ 2:03 pm

Electronic Discovery Institute

From the home page:

The Electronic Discovery Institute is a non-profit organization dedicated to resolving electronic discovery challenges by conducting studies of litigation processes that incorporate modern technologies. The explosion in volume of electronically stored information and the complexity of its discovery overwhelms the litigation process and the justice system. Technology and efficient processes can ease the impact of electronic discovery.

The Institute operates under the guidance of an independent Board of Diplomats comprised of judges, lawyers and technical experts. The Institute’s studies will measure the relative merits of new discovery technologies and methods. The results of the Institute’s studies will be shared with the public free of charge. In order to obtain our free publications, you must create a free log-in with a legitimate user profile. We do not sell your information. Please visit our sponsors – as they provide altruistic support to our organization.

I encountered the Electronic Discovery Institute while researching information on electronic discovery. Since law was and still is an interest of mine, I wanted to record it here.

The area of e-discovery is under rapid development, in terms of the rules that govern it, the technology it employs, and its practice in real-world situations with consequences for the players.

I commend this site/organization to anyone interested in e-discovery issues.

May 9, 2012

Crowdsourced Legal Case Annotation

Filed under: Annotation,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 12:38 pm

Crowdsourced Legal Case Annotation

From the post:

This is an academic research study on legal informatics (information processing of the law). The study uses an online, collaborative tool to crowdsource the annotation of legal cases. The task is similar to legal professionals’ annotation of cases. The result will be a public corpus of searchable, richly annotated legal cases that can be further processed, analysed, or queried for conceptual annotations.

Adam and Wim are computer scientists who are interested in language, law, and the Internet.

We are inviting people to participate in this collaborative task. This is a beta version of the exercise, and we welcome comments on how to improve it. Please read through this blog post, look at the video, and get in contact.

Non-trivial annotation of complex source documents.

What you do with the annotations, such as create topic maps, etc. would be a separate step.

The early evidence for the enhancement of our own work, based on the work of others, Picking the Brains of Strangers…, should make this approach even more exciting.

PS: I saw this at Legal Informatics but wanted to point you directly to the source article.

Just musing for a moment, but what if the conclusion on collaboration and access is that by restricting access we impoverish not only others, but ourselves as well?

Bruce on the Functions of Legislative Identifiers

Filed under: Identifiers,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 12:06 pm

Bruce on the Functions of Legislative Identifiers

From Legal Informatics:

In this post, Tom [Bruce] discusses the multiple functions that legislative document identifiers serve. These include “unique naming,” “navigational reference,” “retrieval hook / container label,” “thread tag / associative marker,” “process milestone,” and several more.

A promised second post will examine issues of identifier design.

Enjoy and pass along!

May 8, 2012

@Zotero 4 Law and OpenCongress.org

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 3:39 pm

@Zotero 4 Law and OpenCongress.org

I don’t suppose one more legal resource from Legal Informatics for today will hurt anything. 😉

A post on MLZ (Multilingual Zotero), a legal research and citation processor. Operates as a plugin to Firefox.

Even if you don’t visit the original post, do watch the video on using MLZ. Not slick but you will see the potential that it offers.

It should also give you some ideas about user friendly interfaces and custom topic map applications.

New Version of Code of Federal Regulations Launched by Cornell LII

Filed under: Law,Law - Sources,Linked Data — Patrick Durusau @ 2:40 pm

New Version of Code of Federal Regulations Launched by Cornell LII

From Legal Informatics, news of improved access to the Code of Federal Regulations.

US Government site: Code of Federal Regulations.

Cornell LII site: Code of Federal Regulations

You tell me, which one do you like better?

Note that the Government Printing Office (GPO, originator of the “official” version), Cornell LII and the Cornell Law Library have been collaborating for the last two years to make this possible.

The Legal Informatics post has a summary of the new features. You won’t gain anything from my repeating them.

Cornell LII plans on using Linked Data so you can link into the site.

Being able to link into this rich resource will definitely be a boon to other legal resource sites and topic maps. (Despite the limitations of linked data.)

The complete announcement can be found here.

PS: Donate to support the Cornell LII project.

Mill: US Code Citation Extraction Library in JavaScript, with Node API

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 10:51 am

Mill: US Code Citation Extraction Library in JavaScript, with Node API

Legal Informatics brings news of new scripts by Eric Mill of Sunlight Labs to extract US Code citations in texts.

Legal citations being a popular means of identifying laws, these would be of interest for law-related topic maps.
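
For a sense of the task, here is a bare-bones citation extractor in Python. It is my own simplified regular expression, not Mill’s JavaScript/Node API, and real citation formats are messier than it admits.

```python
# Bare-bones US Code citation extraction with a regular expression.
# A simplified illustration of the task, not Mill's API or its grammar.
import re

USC_CITATION = re.compile(r"\b(\d+)\s+U\.?S\.?C\.?\s+(?:§+\s*)?(\d+[\w.-]*)")

def extract_usc(text):
    return [
        {"title": m.group(1), "section": m.group(2)}
        for m in USC_CITATION.finditer(text)
    ]

print(extract_usc("See 42 U.S.C. § 1983 and 5 USC 552(b)."))
# [{'title': '42', 'section': '1983'}, {'title': '5', 'section': '552'}]
```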

Monique da Silva Moore, et al. v. Publicis Group SA, et al, 11 Civ. 1279

Filed under: e-Discovery,Law,Legal Informatics — Patrick Durusau @ 10:44 am

Monique da Silva Moore, et al. v. Publicis Group SA, et al, 11 Civ. 1279

The foregoing link is something of a novelty. It is a link to the opinion by US Magistrate Andrew Peck, approving the use of predictive coding (computer-assisted review) as part of e-discovery.

It is not a pointer to an article with no link to the opinion. It is not a pointer to an article on the district judge’s opinion, upholding the magistrate’s order but adding nothing of substance on the use of predictive coding. It is not a pointer to a law journal that requires “free” registration.

I think readers have a reasonable expectation that articles contain pointers to primary source materials. Otherwise, why not write for the tabloids?

Sorry, I just get enraged when resources do not point to primary sources. Not only is it poor writing, it is discourteous to readers.

Magistrate Peck’s opinion is said to be the first that approves the use of predictive coding as part of e-discovery.

In very summary form, the plaintiff (the person suing) has requested that the defendant (the person being sued) produce documents, including emails, in its possession that are responsive to a discovery request. A discovery request is where the plaintiff specifies what documents it wants the defendant to produce, usually described as members of a class of documents. For example, all documents with statements about [plaintiff’s name] employment with X, prior to N date.

In this case, there are 3 million emails to be searched and then reviewed by the defense lawyers (for claims of privilege, non-disclosure authorized by law, such as advice of counsel in some cases) prior to production for review by the plaintiff, who may then use one or more of the emails at trial.

The question is: Should the defense lawyers use a few thousand documents to train a computer to search the 3 million documents or should they use other methods, which will result in much higher costs because lawyers have to review more documents?

The law, facts and e-discovery issues weave in and out of Magistrate Peck’s decision but if you ignore the obviously legalese parts you will get the gist of what is being said. (If you have e-discovery issues, please seek professional assistance.)

I think topic maps could be very relevant in this situation because subjects permeate the discovery process, under different names and perspectives, to say nothing of sharing analysis and data with co-counsel.

I am also mindful that analysis of presentations, speeches, written documents, emails, discovery from other cases, could well develop profiles of potential witnesses in business litigation in particular. A topic map could be quite useful in mapping the terminology most likely to be used by a particular defendant.

BTW, it will be a long time coming, in part because it would reduce the fees of the defense bar, but I would say, “OK, here are the 3 million emails. We reserve the right to move to exclude any on the basis of privilege, relevancy, etc.”

That ends all the dancing around about discovery, and if the plaintiff wants to slog through 3 million emails, fine. They still have to disclose what they intend to produce as exhibits at trial.

May 5, 2012

Building a Web-Based Legislative Editor

Filed under: Editor,Law — Patrick Durusau @ 6:56 pm

Building a Web-Based Legislative Editor by Grant Vergottini.

From the post:

I built the legislative drafting tool used by the Office of the Legislative Counsel in California. It was a long and arduous process and took several years to complete. Issues like redlining, page & line numbers, and the complexities of tables really turned an effort that looks quite simple on the surface into a very difficult task. We used XMetaL as the base tool and customized it from there, developing what has to be the most sophisticated implementation of XMetaL out there. We even had to have a special API added to XMetaL to allow us to drive the change tracking mechanism to support the very specialized redlining needs one finds in legislation.

…With HTML5, it is now possible to build a full fledged browser-based legislative editor. For the past few months I have been building a prototype legislative editor in HTML5 that uses Akoma Ntoso as its XML schema. The results have been most gratifying. Certainly, building such an editor is no easy task. Having been working in this subject for 10 years now I have all the issues well internalized and can navigate the difficulties that arise. But I have come a long way towards achieving the holy grail of legislative editors – a web-based, standards-based, browser-neutral solution.

Not even out in beta yet, but a promising report from someone who knows the ins and outs of legislative editors.

Why is that relevant for topic maps?

A web-based editor could, not necessarily will, lead to custom editors that are configured for work flows in the production of topic map work products.

If you think about it, we interact with work flows by recognizing subjects and taking actions based on the subjects we recognize.

Not a big step for software to record which subjects we have recognized, while our machinery silently adds identifiers, updates indexes of associations and performs other tasks.

PS: I originally saw this mentioned at the Legal Informatics blog.

May 3, 2012

Argumentation 2012

Filed under: Conferences,Law,Semantic Diversity — Patrick Durusau @ 6:24 pm

Argumentation 2012: International Conference on Alternative Methods of Argumentation in Law


07-09-2012 Full paper submission deadline

21-09-2012 Notice of acceptance deadline

12-10-2012 Paper camera-ready deadline

26-10-2012 Main event, Masaryk University in Brno, Czech Republic

From the listing of topics for papers, semantic diversity is going to run riot at this conference.

Checking around the website, I was disappointed that the papers from Argumentation 2011 are not online.

April 29, 2012

Legal Entity Identifier – Preparing for the Inevitable

Filed under: Identifiers,Law,Legal Entity Identifier (LEI),Legal Informatics — Patrick Durusau @ 2:04 pm

Legal Entity Identifier – Preparing for the Inevitable by Peter Ku.

From the post:

Most of the buzz around the water cooler for those responsible for enterprise reference data in financial services has been around the recent G20 meeting in Switzerland on the details of the proposed Legal Entity Identifier (LEI). The LEI is designed to help regulators manage and monitor systemic risk in the financial markets by creating a unique ID to recognize legal entities/counterparties shared by the global financial companies and government regulators. Agreement to adoption is expected to be decided at the G20 leaders’ summit coming up in June in Mexico as regulators decide the details as to the administration, implementation and enforcement of the standard. Will the new LEI solve the issues that led to the recent financial crisis?

Looking back at history, this is not the first time the financial industry has attempted to create a unique ID system for legal entities, remember the Data Universal Numbering System (DUNS) identifier as an example? What is different from the past is that the new LEI standard is set at a global vs. regional level which had caused past attempts to fail. Unfortunately, the LEI standard will not replace existing IDs that firms deal with every day. Instead, it creates further challenges requiring companies to map existing IDs to the new LEI, reconciling naming differences, maintain legal hierarchy relationships between parent and subsidiary entities from ongoing corporate actions, and also link it to the securities and loans to the legal entities.

….

While many within the industry are waiting to see what the regulators decide in June, existing issues related to the quality, consistency, and delivery of counterparty reference data and the downstream impact on managing risk needs to be dealt with regardless if LEI is passed. In the same report, I shared the challenges firms will face incorporating the LEI including:

  • Accessing, reconciling, and relating existing counterparty information and IDs to the new LEI
  • Effectively identifying and resolving data quality issues from external and internal systems
  • Accurately identifying legal hierarchy relationships which LEI will not maintain in its first instantiation.
  • Cross referencing legal entities with financial and securities instruments
  • Extending both counterparty and securities instruments to downstream front, mid, and back office systems.

As a topic map person, do any of these issues sound familiar to you?

In particular creating a new identifier to solve problems with resolving multiple “old” ones?

Being mindful that all data systems can produce and/or contain errors, intentional (dishonest) and otherwise.

Presuming perfect records, and perfect data in those records, not only guarantees failure but also opens avenues for abuse.

Peter cites resources you will need to read.
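
To make the mapping problem concrete, here is a minimal sketch of one counterparty carrying identifiers from several systems plus name variants to reconcile. The LEI, DUNS, and internal codes shown are fabricated for illustration.

```python
# One counterparty, several identifier systems, several name variants.
# All names and codes below are fabricated for illustration; real LEI, DUNS,
# and internal codes would come from the respective registries and systems.
from dataclasses import dataclass, field

@dataclass
class Counterparty:
    canonical_name: str
    name_variants: set = field(default_factory=set)
    identifiers: dict = field(default_factory=dict)  # system -> identifier

acme = Counterparty(
    canonical_name="Acme Global Bank plc",           # fabricated entity
    name_variants={"Acme Global Bank", "ACME GLOBAL BANK PLC", "Acme GB"},
    identifiers={"LEI": "5299000EXAMPLE000001",      # fabricated LEI
                 "DUNS": "150483782",                # fabricated DUNS
                 "internal": "CPTY-00042"},          # fabricated internal code
)

def resolve(parties, key):
    """Find a counterparty by any known name variant or identifier."""
    for party in parties:
        if key in party.name_variants or key in party.identifiers.values():
            return party
    return None

print(resolve([acme], "CPTY-00042").canonical_name)
print(resolve([acme], "Acme GB").canonical_name)
```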

April 28, 2012

…such as the eXtensible Business Reporting Language (XBRL).

Filed under: Law,Law - Sources,Taxonomy,XBRL — Patrick Durusau @ 6:06 pm

Now there is a shout-out! Better than Stephen Colbert or Jon Stewart? Possibly, possibly. 😉

Where? The DATA act, recently passed by the House of Representatives (US), reads in part:

EXISTING DATA REPORTING STANDARDS.—In designating reporting standards under this subsection, the Commission shall, to the extent practicable, incorporate existing nonproprietary standards, such as the eXtensible Business Reporting Language (XBRL). [Title 31, Section 3611(b)(3). Doesn’t really roll off the tongue does it?]

No guarantees but what do you think the odds are that XBRL will be used by the commission? (That’s what I thought.)

With that in mind:

XBRL

Homepage for XBRL.org and apparently the starting point for all things XBRL. You will find the specifications, taxonomies, best practices and other materials on XBRL.

Enough reading material to keep you busy while waiting for organizations to adopt or to be required to adopt XBRL.

Topic maps are relevant to this transition for several reasons, among others (a small sketch of the first follows the list):

  1. Some organizations will have legacy accounting systems that require mapping to XBRL.
  2. Even organizations that have transitioned to XBRL will have legacy data that has not.
  3. Transitions to XBRL by different organizations may not reflect the same underlying semantics.
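
A small sketch of the first point, mapping a legacy chart-of-accounts code to an XBRL concept. The legacy codes and element names are placeholders, not entries from an actual XBRL taxonomy.

```python
# Legacy chart-of-accounts codes mapped to XBRL concepts. Both the legacy
# codes and the element names are placeholders, not taxonomy entries.
LEGACY_TO_XBRL = {
    "4000-SALES": "us-gaap:Revenues",
    "5000-COGS": "us-gaap:CostOfGoodsSold",
    "1100-CASH": "us-gaap:CashAndCashEquivalents",
}

def to_xbrl(legacy_code):
    """Return the mapped XBRL concept, or flag the code as unmapped."""
    return LEGACY_TO_XBRL.get(legacy_code, "UNMAPPED:" + legacy_code)

print(to_xbrl("4000-SALES"))   # us-gaap:Revenues
print(to_xbrl("9999-MISC"))    # UNMAPPED:9999-MISC
```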

April 27, 2012

Scout, in Open Beta

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 6:11 pm

Scout, in Open Beta

Eric Mill writes:

Scout is an alert system for the things you care about in state and national government. It covers Congress, regulations across the whole executive branch, and legislation in all 50 states.

You can set up notifications for new things that match keyword searches. Or, if you find a particular bill you want to keep up with, we can notify you whenever anything interesting happens to it — or is about to.

Just to emphasize, this is a beta – it functions well and looks good, but we’re really hoping to hear from the community on how we can make it stronger. You can give us feedback by using the Feedback link at the top of the site, or by writing directly to scout@sunlightfoundation.com.

Legal terminology variation between states plus the feds is going to make keyword searches iffy.

Will vary among areas of law.

Greatest variation in family and criminal law, least among some parts of commercial law.

Anyone know if there is a cross-index of terminology between the legal systems of the states?

April 26, 2012

CodeX: Stanford Center for Legal Informatics

Filed under: Law,Legal Informatics — Patrick Durusau @ 6:30 pm

CodeX: Stanford Center for Legal Informatics

Language and semantics are noticed more often with regard to legal systems than they are elsewhere. Failing to “get” a joke on a television show doesn’t have the same consequences, potentially, as breaking a law.

Within legal systems topic maps are important for capturing and collating complex factual and legal semantics. As the world grows more international, legal systems bump up against each other and topic maps provide a way to map across such systems.

From the website:

CodeX is a multidisciplinary laboratory operated by Stanford University in association with affiliated organizations from industry, government, and academia. The staff of the Center includes a core of full-time employees, together with faculty and students from Stanford and professionals from affiliated organizations.

CodeX’s primary mission is to explore ways in which information technology can be used to enhance the quality and efficiency of our legal system. Our goal is “legal technology” that empowers all parties in our legal system and not solely the legal profession. Such technology should help individuals find, understand, and comply with legal rules that govern their lives; it should help law-making bodies analyze proposed laws for cost, overlap, and inconsistency; and it should help enforcement authorities ensure compliance with the law.

Projects carried out under the CodeX umbrella typically fall into one or more of the following areas:

  • Legal Document Management: concerned with the creation, storage, and retrieval of legal documents of all types, including statutes, case law, patents, regulations, etc. The $50B e-discovery market is heavily dependent on Information Retrieval (IR) technology. By automating information retrieval, cost can be dramatically reduced. Furthermore, it is generally the case that well-tuned automated procedures can outperform manual search in terms of accuracy. CodeX is investigating various innovative legal document management methodologies and helping to facilitate the use of such methods across the legal spectrum.
  • Legal Infrastructure: Some CodeX projects focus on building the systems that allow the stakeholders in the legal system to connect and collaborate more efficiently. Leveraging advances in the field of computer science and building upon national and international standardization efforts, these projects have the potential to provide economic and social benefits by streamlining the interactions of individuals, organizations, legal professionals and government as they acquire and deliver legal services. By combining the development of such platforms with multi-jurisdictional research on relevant regulations issued by governments and bar associations, the Center supports responsible, forward-looking innovation in the legal industry.
  • Computational Law: Computational law is an innovative approach to legal informatics based on the explicit representation of laws and regulations in computable form. Computational Law techniques can be used to “embed” the law in systems used by individuals and automate certain legal decision making processes or in the alternative bring the legal information as close to the human decision making as possible. The Center’s work in this area includes theoretical research on representations of legal information, the creation of technology for processing and utilizing information expressed within these representations, and the development of legal structures for ratifying and exploiting such technology. Initial applications include systems for helping individuals navigate contractual regimes and administrative procedures, within relatively discrete e-commerce and governmental domains.
