Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 28, 2013

…if not incomprehensible to most citizens

Filed under: Law,Law - Sources — Patrick Durusau @ 6:33 pm

A Right to Access Implies A Right to Know: An Open Online Platform for Research on the Readability of Law by Michael Curtotti and Eric McCreath. (Journal of Open Access to Law, Vol. 1, No. 1)

Abstract:

The widespread availability of legal materials online has opened the law to a new and greatly expanded readership. These new readers need the law to be readable by them when they encounter it. However, the available empirical research supports a conclusion that legislation is difficult to read if not incomprehensible to most citizens. We review approaches that have been used to measure the readability of text including readability metrics, cloze testing and application of machine learning. We report the creation and testing of an open online platform for readability research. This platform is made available to researchers interested in undertaking research on the readability of legal materials. To demonstrate the capabilities of the platform, we report its initial application to a corpus of legislation. Linguistic characteristics are extracted using the platform and then used as input features for machine learning using the Weka package. Wide divergences are found between sentences in a corpus of legislation and those in a corpus of graded reading material or in the Brown corpus (a balanced corpus of English written genres). Readability metrics are found to be of little value in classifying sentences by grade reading level (noting that such metrics were not designed to be used with isolated sentences).
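The readability metrics the authors test are simple formulas over surface features of text. As a minimal sketch (my illustration, not the authors' platform), here is the Flesch-Kincaid grade level in Python, with a crude vowel-run syllable counter standing in for the dictionary-based counting real tools use:

import re

# Minimal sketch of the Flesch-Kincaid grade level formula. The syllable
# counter is a crude vowel-group heuristic (an assumption), not the
# dictionary-based counting real readability tools use.
def count_syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

# A single statutory sentence can score far beyond any school grade,
# one reason such metrics mislead when applied to isolated sentences.
print(fk_grade("No officer or employee shall be paid compensation at a "
               "rate in excess of the rate for level I of the Executive "
               "Schedule under section 5312 of title 5."))

Run on a single statutory sentence, the "grade level" lands well past high school, which echoes the authors' caution that readability metrics were not designed for isolated sentences.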

What I found troubling about this paper is its conjuring of a right to have the law (the text of the law) be “reasonably accessible” to individuals:

Leaving aside the theoretical justifications that might be advanced to support this view, the axiomatic position taken by this paper is that all individuals subject to law are entitled to know its content and therefore to have it written in a way which is reasonably accessible to them. (pp. 6-7)

I don’t dispute that the law should be freely available to everyone; it is difficult to obey what isn’t at least potentially available.

But, the authors’ “reasonably accessible” argument fails in two ways.

First, the authors fail to define a level of readability that would count as “reasonably accessible.” How much change would be necessary to achieve it? The authors don’t say.

Second, the amount of necessary change must be known in order to judge the feasibility of any revisions to make the law “reasonably accessible.”

The U.S. Internal Revenue Code (herein IRC) is a complex body of work that is based on prior court decisions, rulings by the I.R.S. and a commonly understood vocabulary among tax experts. And it is legislation that touches many other laws and regulations, at both the federal and state levels. All of which are interwoven with complex meanings established by years of law, regulation and custom.

Even creating a vulgar (plain-language) version of important legislation would depend upon identifying a complex of subjects and relationships that are explicit only to an expert reader. Doable, but it would never have the force of law.

I first saw this at: Curtotti and McCreath: An Open Online Platform for Research on the Readability of Law.

December 26, 2013

Legivoc – connecting laws in a changing world

Filed under: EU,Law,Semantics — Patrick Durusau @ 8:02 pm

Legivoc – connecting laws in a changing world by Hughes-Jehan Vibert, Pierre Jouvelot, Benoît Pin.

Abstract:

On the Internet, legal information is a sum of national laws. Even in a changing world, law is culturally specific (nation-specific most of the time) and legal concepts only become meaningful when put in the context of a particular legal system. Legivoc aims to be a semantic interface between the subject of law of a State and the other spaces of legal information that it will be led to use. This project will consist of setting up a server of multilingual legal vocabularies from the European Union Member States legal systems, which will be freely available, for other uses via an application programming interface (API).

And I thought linking all legal data together was ambitious!

So long as the EU was composed solely of civil law jurisdictions, I would not have bet on the project’s complete success, but it could have some useful results.

Once you add in common law jurisdictions like the United Kingdom, the project may still have some useful results, but there isn’t going to be a mapping across all the languages.

Part of the difficulty will be language, but part of it lies in the most basic assumptions of the two systems.

In civil law, the drafters of legal codes attempt to systematically set out a set of principles that take each other into account and represent a blueprint for an ordered society.

Common law, on the other hand, has at its core court decisions that determine the results between two parties. And those decisions can be relied upon by other parties.

Between civil and common law jurisdictions, some laws/concepts may be more mappable than others. Modern labor law, for example, may be new enough that semantic accretions do not prevent a successful mapping.

Older laws, property and inheritance laws, for example, are usually the most distinctive to any jurisdiction. Those are likely to prove impossible to map or reconcile.

Still, it will be an interesting project, particularly if they disclose the basis for any possible mapping, as opposed to simply declaring a mapping.

Both would be useful, but the former is robust in the face of changing law while the latter is brittle.
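A minimal sketch of that difference, with hypothetical field names (Legivoc’s actual data model may differ):

from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping record: the point is the disclosed basis, not the schema.
@dataclass
class ConceptMapping:
    source: str           # e.g., a French civil law concept
    target: str           # e.g., a UK common law concept
    basis: Optional[str]  # disclosed grounds for the mapping, if any

declared = ConceptMapping("fr:contrat", "uk:contract", basis=None)
grounded = ConceptMapping(
    "fr:contrat", "uk:contract",
    basis="Both require agreement between the parties; "
          "English law additionally requires consideration.",
)

# When either legal system changes, a grounded mapping can be re-checked
# against its stated basis; a bare declaration can only be taken on faith.
for m in (declared, grounded):
    print(m.source, "->", m.target, "| basis:", m.basis or "(none declared)")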

The Case for Linking World Law Data

Filed under: Law,Linked Data — Patrick Durusau @ 4:40 pm

The Case for Linking World Law Data by Sergio Puig and Enric G. Torrents.

Abstract:

The present paper advocates for the creation of a federated, hybrid database in the cloud, integrating law data from all available public sources in one single open access system – adding, in the process, relevant meta-data to the indexed documents, including the identification of social and semantic entities and the relationships between them, using linked open data techniques and standards such as RDF. Examples of potential benefits and applications of this approach are also provided, including, among others, experiences from our previous research, in which data integration, graph databases and social and semantic networks analysis were used to identify power relations, litigation dynamics and cross-reference patterns both intra- and inter-institutionally, covering most of the world’s international economic courts.

From the conclusion:

We invite any individual and organization to join in and participate in this open endeavor, to shape together this project, Neocodex, aspiring to replicate the impact that Justinian’s Corpus Juris Civilis, the original Codex, had in the legal systems of the Early Middle Ages.

Yes, well, I can’t say the authors lack for ambition. 😉

As you know, the Corpus Juris Civilis has heavily influenced the majority of legal jurisdictions today (the civil law tradition).

Do be mindful that the OASIS Legal Citation Markup (LegalCiteM) TC is having its organizational meeting on 12th February 2014, in case you are interested in yet another legal citation effort.

Why anyone thinks we need another legal citation system, one that leaves existing practice on the cutting room floor, is beyond me.

Yes, a new legal citation system might be non-proprietary, royalty-free, web-based, etc., but without picking up current citation practices, it will also be dead on arrival (DOA).
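For what “picking up current citation practices” might look like in linked data terms, here is a minimal rdflib sketch; the URIs and the property name are invented placeholders, not anyone’s published scheme:

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS

# Hypothetical namespace and URIs, for illustration only.
EX = Namespace("http://example.org/law/")
g = Graph()

decision = URIRef("http://example.org/law/us/scotus/gibbons-v-ogden-1824")
g.add((decision, RDFS.label, Literal("Gibbons v. Ogden")))
# Carry the traditional citation as data, so the new system builds on
# current citation practice instead of discarding it.
g.add((decision, EX.reporterCitation, Literal("22 U.S. (9 Wheat.) 1 (1824)")))

print(g.serialize(format="turtle"))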

December 24, 2013

A Salinas Card

Filed under: Government,Law — Patrick Durusau @ 1:51 pm

Salinas v. Texas

The Scotusblog has this summary of Salinas v. Texas:

When petitioner had not yet been placed in custody or received Miranda warnings, and voluntarily responded to some questions by police about a murder, the prosecution’s use of his silence in response to another question as evidence of his guilt at trial did not violate the Fifth Amendment because petitioner failed to expressly invoke his privilege not to incriminate himself in response to the officer’s question.

A lay translation: If the police ask you questions, before you have been arrested or read your rights, your silence can and will be used against you in court.

I could go on for thousands of words about why Salinas v. Texas was wrongly decided, but that won’t help in an interaction with the police.

I have a simpler and perhaps even effective course of action: the Salinas Card.

My name is: (insert your name).

I invoke my right against self-incrimination and refuse to answer any and all questions, verbal, written or otherwise communicated.

I invoke my right to counsel and cannot afford counsel. I request counsel be appointed and to be present for any questioning, lineups or other identification procedures, and/or any legal proceedings.

I do not consent to any searches of my person, my immediate surroundings or any vehicles or structures that I may own, rent or otherwise occupy.

Date: ___________________________
Police Officer (signature): ___________________________

Get a local criminal defense attorney to approve the language for your state (some states have more protections than the U.S. Constitution). Print the card up on standard 3″ x 5″ index card stock.

When approached by the police, read your Salinas Card to them, date it and ask for their signature on it. (Keep the original, give them a copy.)

Personally, I would keep four (4) or five (5) two-card sets on hand at all times.

PS: This is not legal advice but a suggestion that you get legal advice. Show this post to your local public defender and ask them to approve a Salinas card.

December 17, 2013

Cross-categorization of legal concepts…

Filed under: Artificial Intelligence,Law,Legal Informatics,Ontology — Patrick Durusau @ 3:21 pm

Cross-categorization of legal concepts across boundaries of legal systems: in consideration of inferential links by Fumiko Kano Glückstad, Tue Herlau, Mikkel N. Schmidt, Morten Mørup.

Abstract:

This work contrasts Giovanni Sartor’s view of inferential semantics of legal concepts (Sartor in Artif Intell Law 17:217–251, 2009) with a probabilistic model of theory formation (Kemp et al. in Cognition 114:165–196, 2010). The work further explores possibilities of implementing Kemp’s probabilistic model of theory formation in the context of mapping legal concepts between two individual legal systems. For implementing the legal concept mapping, we propose a cross-categorization approach that combines three mathematical models: the Bayesian Model of Generalization (BMG; Tenenbaum and Griffiths in Behav Brain Sci 4:629–640, 2001), the probabilistic model of theory formation, i.e., the Infinite Relational Model (IRM) first introduced by Kemp et al. (The twenty-first national conference on artificial intelligence, 2006, Cognition 114:165–196, 2010) and its extended model, i.e., the normal-IRM (n-IRM) proposed by Herlau et al. (IEEE International Workshop on Machine Learning for Signal Processing, 2012). We apply our cross-categorization approach to datasets where legal concepts related to educational systems are respectively defined by the Japanese- and the Danish authorities according to the International Standard Classification of Education. The main contribution of this work is the proposal of a conceptual framework of the cross-categorization approach that, inspired by Sartor (Artif Intell Law 17:217–251, 2009), attempts to explain reasoner’s inferential mechanisms.

From the introduction:

An ontology is traditionally considered as a means for standardizing knowledge represented by different parties involved in communications (Gruber 1992; Masolo et al. 2003; Declerck et al. 2010). Kemp et al. (2010) also points out that some scholars (Block 1986; Field 1977; Quilian 1968) have argued the importance of knowledge structuring, i.e., ontologies, where concepts are organized into systems of relations and the meaning of a concept partly depends on its relationships to other concepts. However, real human to human communication cannot be absolutely characterized by such standardized representations of knowledge. In Kemp et al. (2010), two challenging issues are raised against such idea of systems of concepts. First, as Fodor and Lepore (1992) originally pointed out, it is beyond comprehension that the meaning of any concept can be defined within a standardized single conceptual system. It is unrealistic that two individuals with different beliefs have common concepts. This issue has also been discussed in semiotics (Peirce 2010; Durst-Andersen 2011) and in cognitive pragmatics (Sperber and Wilson 1986). For example, Sperber and Wilson (1986) discuss how mental representations are constructed diversely under different environmental and cognitive conditions. A second point which Kemp et al. (2010) specifically address in their framework is the concept acquisition problem. According to Kemp et al. (2010; see also: Hempel (1985), Woodfield (1987)):

if the meaning of each concept depends on its role within a system of concepts, it is difficult to see how a learner might break into the system and acquire the concepts that it contains. (Kemp et al. 2010)

Interestingly, the similar issue is also discussed by legal information scientists. Sartor (2009) argues that:

legal concepts are typically encountered in the context of legal norms, and the issue of determining their content cannot be separated from the issue of identifying and interpreting the norms in which they occur, and of using such norms in legal inference. (Sartor 2009)

This argument implies that if two individuals respectively belong to two different societies having different legal systems, they might interpret a legal term differently, since the norms to which the two individuals belong are not identical. The argument also implies that these two individuals must have difficulties in learning a concept contained in the other party’s legal system without interpreting the norms in which the concept occurs.

These arguments motivate us to contrast the theoretical work presented by Sartor (2009) with the probabilistic model of theory formation by Kemp et al. (2010) in the context of mapping legal concepts between two individual legal systems. Although Sartor’s view addresses the inferential mechanisms within a single legal system, we argue that his view is applicable in a situation where a concept learner (reasoner) is, based on the norms belonging to his or her own legal system, going to interpret and adapt a new concept introduced from another legal system. In Sartor (2009), the meaning of a legal term results from the set of inferential links. The inferential links are defined based on the theory of Ross (1957) as:

  1. the links stating what conditions determine the qualification Q (Q-conditioning links), and
  2. the links connecting further properties to possession of the qualification Q (Q-conditioned links.) (Sartor 2009)

These definitions can be seen as causes and effects in Kemp et al. (2010). If a reasoner is learning a new legal concept in his or her own legal system, the reasoner is supposed to seek causes and effects identified in the new concept that are common to the concepts which the reasoner already knows. This way, common-causes and common-effects existing within a concept system, i.e., underlying relationships among domain concepts, are identified by a reasoner. The probabilistic model in Kemp et al. (2010) is supposed to learn these underlying relationships among domain concepts and identify a system of legal concepts from a view where a reasoner acquires new concepts in contrast to the concepts already known by the reasoner.

Pardon the long quote but the paper is pay-per-view.

I haven’t started to run down all the references but this is an interesting piece of work.

I was most impressed by the partial echoing of the topic map paradigm that “meaning of each concept depends on its role within a system of concepts….”

True, a topic map can capture only “surface” facts and relationships between those facts, but that is a comment on a particular topic map instance, not on topic maps in general.

Noting that you also shouldn’t pay for more topic map than you need. If all you need is a flat mapping between DHS and, say, the CIA, then doing no more than mapping terms is sufficient. If you need a maintainable and robust mapping, different techniques would be called for. Both results would be topic maps, but one of them would be far more useful.

One of the principal sources relied upon by the authors is: The Nature of Legal Concepts: Inferential Nodes or Ontological Categories? by Giovanni Sartor.

I don’t see any difficulty with Sartor’s rules of inference, any more than with saying that if a topic has property X (an occurrence, in TMDM speak), then of necessity it must have properties E, F, and G.

Where I would urge caution is with the notion that properties of a legal concept spring from a legal text alone. Or even from a legal ontology. In part because two people in the same legal system can read the same legal text and/or use the same legal ontology and expect to see different properties for a legal concept.

Consider the text of Paradise Lost by John Milton. If Stanley Fish, a noted Milton scholar, were to assign properties to the concepts in Book 1, his list of properties would be quite different from my list of properties. Same words, same text, but very different property lists.

To refine what I said about the topic map paradigm a bit earlier, it should read: “meaning of each concept depends on its role within a system of concepts [and the view of its hearer/reader]….”

The hearer/reader being the paramount consideration. Without a hearer/reader, there is no concept or system of concepts or properties of either one for comparison.

When topics are merged, there is a collecting of properties, some of which you may recognize and some of which I may recognize, as identifying some concept or subject.
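A minimal sketch of that collecting of properties (illustrative only, not the TMDM merging procedure):

# Merge two topics by collecting their properties; each party may recognize
# the subject from a different subset. Purely illustrative.
def merge_topics(a, b):
    merged = {}
    for props in (a, b):
        for key, values in props.items():
            merged.setdefault(key, set()).update(values)
    return merged

yours = {"name": {"interstate commerce clause"},
         "cites": {"Gibbons v. Ogden"}}
mine = {"name": {"commerce among the several States"},
        "source": {"U.S. Const. art. I, sec. 8, cl. 3"}}

print(merge_topics(yours, mine))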

No guarantees but better than repeating your term for a concept over and over again, each time in a louder voice. 😉

eDiscovery

Filed under: e-Discovery,Law,Law - Sources — Patrick Durusau @ 12:25 pm

2013 End-of-Year List of People Who Make a Difference in eDiscovery by Gerard J. Britton.

Gerard has created a list of six (6) people who made a difference in ediscovery in 2013.

If ediscovery is unfamiliar, you have all of the issues of data/big data with an additional layer of legal rules and requirements.

It is typically seen in high-stakes litigation.

A fruitful area for the application of semantic integration technologies, topic maps in particular.

December 14, 2013

Everything is Editorial…

Filed under: Algorithms,Law,Legal Informatics,Search Algorithms,Searching,Semantics — Patrick Durusau @ 7:57 pm

Everything is Editorial: Why Algorithms are Hand-Made, Human, and Not Just For Search Anymore by Aaron Kirschenfeld.

From the post:

Down here in Durham, NC, we have artisanal everything: bread, cheese, pizza, peanut butter, and of course coffee, coffee, and more coffee. It’s great—fantastic food and coffee, that is, and there is no doubt some psychological kick from knowing that it’s been made carefully by skilled craftspeople for my enjoyment. The old ways are better, at least until they’re co-opted by major multinational corporations.

Aside from making you either hungry or jealous, or perhaps both, why am I talking about fancy foodstuffs on a blog about legal information? It’s because I’d like to argue that algorithms are not computerized, unknowable, mysterious things—they are produced by people, often painstakingly, with a great deal of care. Food metaphors abound, helpfully I think. Algorithms are the “special sauce” of many online research services. They are sets of instructions to be followed and completed, leading to a final product, just like a recipe. Above all, they are the stuff of life for the research systems of the near future.

Human Mediation Never Went Away

When we talk about algorithms in the research community, we are generally talking about search or information retrieval (IR) algorithms. A recent and fascinating VoxPopuLII post by Qiang Lu and Jack Conrad, “Next Generation Legal Search – It’s Already Here,” discusses how these algorithms have become more complicated by considering factors beyond document-based, topical relevance. But I’d like to step back for a moment and head into the past for a bit to talk about the beginnings of search, and the framework that we have viewed it within for the past half-century.

Many early information-retrieval systems worked like this: a researcher would come to you, the information professional, with an information need, that vague and negotiable idea which you would try to reduce to a single question or set of questions. With your understanding of Boolean search techniques and your knowledge of how the document corpus you were searching was indexed, you would then craft a search for the computer to run. Several hours later, when the search was finished, you would be presented with a list of results, sometimes ranked in order of relevance and limited in size because of a lack of computing power. Presumably you would then share these results with the researcher, or perhaps just turn over the relevant documents and send him on his way. In the academic literature, this was called “delegated search,” and it formed the background for the most influential information retrieval studies and research projects for many years—the Cranfield Experiments. See also “On the History of Evaluation in IR” by Stephen Robertson (2008).

In this system, literally everything—the document corpus, the index, the query, and the results—were mediated. There was a medium, a middle-man. The dream was to some day dis-intermediate, which does not mean to exhume the body of the dead news industry. (I feel entitled to this terrible joke as a former journalist… please forgive me.) When the World Wide Web and its ever-expanding document corpus came on the scene, many thought that search engines—huge algorithms, basically—would remove any barrier between the searcher and the information she sought. This is “end-user” search, and as algorithms improved, so too would the system, without requiring the searcher to possess any special skills. The searcher would plug a query, any query, into the search box, and the algorithm would present a ranked list of results, high on both recall and precision. Now, the lack of human attention, evidenced by the fact that few people ever look below result 3 on the list, became the limiting factor, instead of the lack of computing power.

[Image: delegated search]

The only problem with this is that search engines did not remove the middle-man—they became the middle-man. Why? Because everything, whether we like it or not, is editorial, especially in reference or information retrieval. Everything, every decision, every step in the algorithm, everything everywhere, involves choice. Search engines, then, are never neutral. They embody the priorities of the people who created them and, as search logs are analyzed and incorporated, of the people who use them. It is in these senses that algorithms are inherently human.

A delightful piece on search algorithms that touches at the heart of successful search and/or data integration.

Its first three words capture the issue: Everything is Editorial….

Despite the pretensions of scholars, sages and rogues, everything is editorial; there are no universal semantic primitives.

For convenience in data processing we may choose to treat some tokens as semantic primitives, but that is always a choice that we make.

Once you make that leap, it comes as no surprise that owl:sameAs wasn’t used the same way by everyone who used it.

See: When owl:sameAs isn’t the Same: An Analysis of Identity Links on the Semantic Web by Harry Halpin, Ivan Herman, and Patrick J. Hayes, for one take on the confusion around owl:sameAs.
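The mechanics of the confusion are easy to demonstrate: owl:sameAs is transitive, so a single loose assertion fuses whole identity clusters. A minimal union-find sketch, with invented links:

from collections import defaultdict

# Union-find over sameAs pairs: everything reachable ends up in one cluster.
def clusters(same_as_pairs):
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in same_as_pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return list(groups.values())

pairs = [("dbpedia:Paris", "geonames:2988507"),       # city = city: plausible
         ("dbpedia:Paris", "dbpedia:Paris_Commune")]  # loose: fuses concepts
print(clusters(pairs))  # one cluster: the city now "is" a historical government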

If you are interested in moving beyond opaque keyword searching, consider Aaron’s post carefully.

December 3, 2013

Scout [NLP, Move up from Twitter Feeds to Court Opinions]

Filed under: Government,Government Data,Law,Law - Sources — Patrick Durusau @ 5:01 pm

Scout

From the about page:

Scout is a free service that provides daily insight to how our laws and regulations are shaped in Washington, DC and our state capitols.

These days, you can receive electronic alerts to know when a company is in the news, when a TV show is scheduled to air or when a sports team wins. Now, you can also be alerted when our elected officials take action on an issue you care about.

Scout allows anyone to subscribe to customized email or text alerts on what Congress is doing around an issue or a specific bill, as well as bills in the state legislature and federal regulations. You can also add external RSS feeds to complement a Scout subscription, such as press releases from a member of Congress or an issue-based blog.

Anyone can create a collection of Scout alerts around a topic, for personal organization or to make it easy for others to easily follow a whole topic at once.

Researchers can use Scout to see when Congress talks about an issue over time. Members of the media can use Scout to track when legislation important to their beat moves ahead in Congress or in state houses. Non-profits can use Scout as a tool to keep tabs on how federal and state lawmakers are making policy around a specific issue.

Early testing of Scout during its open beta phase alerted Sunlight and allies in time to successfully stop an overly broad exemption to the Freedom of Information Act from being applied to legislation that was moving quickly in Congress. Read more about that here.

Thank you to the Stanton Foundation, who contributed generous support to Scout’s development.

What kind of alerts?

If your manager suggests a Twitter feed for testing NLP, classification, sentiment, or similar code, ask to use a U.S. federal court opinion feed instead.

Not all data is written in one hundred and forty (140) character chunks. 😉
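A minimal sketch of that workflow using the feedparser library; the feed URL is a placeholder for whatever opinion feed you subscribe to:

import feedparser

# Placeholder URL: substitute the feed of your Scout (or similar) alert.
FEED_URL = "https://example.org/alerts/federal-court-opinions.rss"
feed = feedparser.parse(FEED_URL)

for entry in feed.entries[:5]:
    text = entry.get("summary", "")
    # Opinions run to thousands of words, a long way from 140 characters.
    print(entry.get("title", "(untitled)"), "-", len(text.split()), "words")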

PS: Be sure to support/promote the Sunlight Foundation for making this data available.

October 29, 2013

the /unitedstates project

Filed under: Government,Government Data,Law — Patrick Durusau @ 6:59 pm

the /unitedstates project

From the webpage:

/unitedstates is a shared commons of data and tools for the United States. Made by the public, used by the public.

There you will find:

bill-nicknames: Tiny spreadsheet of common nicknames for bills and laws.

citation: Stand-alone legal citation detector. Text in, citations out.

congress-legislators: Detailed data on members of Congress, past and present.

congress: Scrapers and parsers for the work of Congress, all day, every day.

glossary: A public domain glossary for the United States.

licensing: Policy guidelines for the licensing of US government information.

uscode: Parser for the US Code.

wish-list: Post ideas for new projects.

Can you guess what the #1 wish on the project list is?

Campaign finance donor de-duplicator
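Why is donor de-duplication a wish rather than a weekend project? A naive sketch (all names invented) shows how quickly exact-match keys run out:

import re
from collections import defaultdict

# Naive first pass: lowercase, strip punctuation, sort name parts so that
# "Smith, John" and "John Smith" collide. Real donor matching also needs
# addresses, employers and fuzzy comparison.
def normalize(name):
    name = re.sub(r"[^a-z ]", "", name.lower())
    return " ".join(sorted(name.split()))

donors = ["Smith, John", "John Smith", "JOHN  SMITH", "Jon Smith"]
groups = defaultdict(list)
for d in donors:
    groups[normalize(d)].append(d)

print(dict(groups))  # "Jon Smith" stays distinct: exact keys miss variants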

October 21, 2013

Semantics and Delivery of Useful Information [Bills Before the U.S. House]

Filed under: Government,Government Data,Law,Semantics — Patrick Durusau @ 2:23 pm

Lars Marius Garshol pointed out in Semantic Web adoption and the users that the question “What do semantic technologies do better than non-semantic technologies?” has yet to be answered.

Tim O’Reilly tweeted about Madison Federal today, a resource that raises the semantic versus non-semantic technology question.

In a nutshell, Madison Federal has all the bills pending before the U.S. House of Representatives online.

If you login with Facebook, you can:

  • Add a bill edit / comment
  • Enter a community suggestion
  • Enter a community comment
  • Subscribe to future edits/comments on a bill

So far, so good.

You can pick any bill but the one I chose as an example is: Postal Executive Accountability Act.

I will quote just a few lines of the bill:

2. Limits on executive pay

    (a) Limitation on compensation Section 1003 of title 39, United States Code, 
         is amended:

         (1) in subsection (a), by striking the last sentence; and
         (2) by adding at the end the following:

             (e)
                  (1) Subject to paragraph (2), an officer or employee of the Postal 
                      Service may not be paid at a rate of basic pay that exceeds 
                      the rate of basic pay for level II of the Executive Schedule 
                      under section 5312 of title 5.

What would be the first thing you want to know?

Hmmm, what about subsection (a) of section 1003 of title 39, United States Code, since we are striking its last sentence?

39 USC § 1003 – Employment policy [Legal Information Institute], which reads:

(a) Except as provided under chapters 2 and 12 of this title, section 8G of the Inspector General Act of 1978, or other provision of law, the Postal Service shall classify and fix the compensation and benefits of all officers and employees in the Postal Service. It shall be the policy of the Postal Service to maintain compensation and benefits for all officers and employees on a standard of comparability to the compensation and benefits paid for comparable levels of work in the private sector of the economy. No officer or employee shall be paid compensation at a rate in excess of the rate for level I of the Executive Schedule under section 5312 of title 5.

OK, so now we know that (1) is striking:

No officer or employee shall be paid compensation at a rate in excess of the rate for level I of the Executive Schedule under section 5312 of title 5.

Semantics? No, just a hyperlink.

For the added text, we want to know what is meant by:

… rate of basic pay that exceeds the rate of basic pay for level II of the Executive Schedule under section 5312 of title 5.

The Legal Information Institute is already ahead of Congress because their system provides the hyperlink we need: 5312 of title 5.

If you notice something amiss when you follow that link, congratulations! You have discovered your first congressional typo and/or error.

Section 5312 of title 5 defines level I of the Executive Schedule, which includes the Secretary of State, the Secretary of the Treasury, the Secretary of Defense, the Attorney General and others. The base rate for Executive Schedule level I is $199,700.

Section 5313 of title 5, on the other hand, defines level II of the Executive Schedule, which includes the Deputy Secretary of Agriculture; the Deputy Secretary of Defense, the Secretary of the Army, the Secretary of the Navy, the Secretary of the Air Force, and the Under Secretary of Defense for Acquisition, Technology and Logistics; the Deputy Secretary of Education; the Deputy Secretary of Energy; and others. The base rate for Executive Schedule level II is $178,700.

Assuming someone catches the error and 5312 becomes 5313, top earners at the Postal Service may be about to take a $21,000.00 pay cut (from the level I cap of $199,700 to the level II cap of $178,700).

We got all that from mechanical hyperlinks, no semantic technology required.

Where you might need semantic technology is when reading 39 USC § 1003 – Employment policy [Legal Information Institute] where it says (in part):

…It shall be the policy of the Postal Service to maintain compensation and benefits for all officers and employees on a standard of comparability to the compensation and benefits paid for comparable levels of work in the private sector of the economy….

Some questions:

Question: What are “comparable levels of work in the private sector of the economy?”

Question: On what basis is work for the Postal Service compared to work in the private economy?

Question: Examples of comparable jobs in the private economy and their compensation?

Question: What policy or guideline documents have been developed by the Postal Service for evaluation of Postal Service vs. work in the private economy?

Question: What studies have been done, by whom, using what practices, on comparing compensation for Postal Service work to work in the private economy?

That would be a considerable amount of information with what I suspect would be a large amount of duplication as reports or studies are cited by numerous sources.

Semantic technology would be necessary for the purpose of deduping and navigating such a body of information effectively.
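Even the first step, recognizing that “5 U.S.C. § 5312” and “section 5312 of title 5” name the same provision, takes a canonicalization layer. A minimal sketch, with patterns limited to the forms seen above:

import re

# Two surface patterns for the same (title, section) pair; real citators
# need many more, plus subsections, ranges and popular names.
PATTERNS = [
    re.compile(r"(?P<title>\d+)\s+U\.?S\.?C\.?\s*§?\s*(?P<section>\d+)"),
    re.compile(r"section\s+(?P<section>\d+)\s+of\s+title\s+(?P<title>\d+)",
               re.IGNORECASE),
]

def canonical(cite):
    for pat in PATTERNS:
        m = pat.search(cite)
        if m:
            return (int(m.group("title")), int(m.group("section")))
    return None

variants = ["5 U.S.C. § 5312", "section 5312 of title 5", "5 USC 5312"]
print({v: canonical(v) for v in variants})  # all collapse to (5, 5312)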

Pick a bill. Where would you put the divide between mechanical hyperlinks and semantic technologies?

PS: You may remember that the House of Representatives had their own “post office” which they ran as a slush fund. The thought of the House holding someone “accountable” is too bizarre for words.

October 18, 2013

A Case Study on Legal Case Annotation

Filed under: Annotation,Law,Law - Sources — Patrick Durusau @ 3:48 pm

A Case Study on Legal Case Annotation by Adam Wyner, Wim Peters, and Daniel Katz.

Abstract:

The paper reports the outcomes of a study with law school students to annotate a corpus of legal cases for a variety of annotation types, e.g. citation indices, legal facts, rationale, judgement, cause of action, and others. An online tool is used by a group of annotators that results in an annotated corpus. Differences amongst the annotations are curated, producing a gold standard corpus of annotated texts. The annotations can be extracted with semantic searches of complex queries. There would be many such uses for the development and analysis of such a corpus for both legal education and legal research.

Bibtex
@INPROCEEDINGS{WynerPetersKatzJURIX2013,
author = {Adam Wyner and Wim Peters and Daniel Katz},
title = {A Case Study on Legal Case Annotation},
booktitle = {Proceedings of 26th International Conference on Legal Knowledge and Information Systems (JURIX 2013)},
year = {2013},
pages = {??-??},
address = {Amsterdam},
publisher = {IOS Press}
}

The methodology and results of this study will be released as open source resources.

A gold standard for annotation of legal texts will create the potential for automated tools to assist lawyers, judges and possibly even lay people.
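Curating differences amongst annotators normally begins with measuring their agreement. A minimal sketch of Cohen’s kappa (my illustration, with invented labels; the paper does not specify its agreement measure):

from collections import Counter

# Cohen's kappa: agreement between two annotators, corrected for chance.
def cohens_kappa(a, b):
    assert len(a) == len(b) and a
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

ann1 = ["fact", "rationale", "fact", "judgement", "fact"]
ann2 = ["fact", "rationale", "rationale", "judgement", "fact"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.688 on this toy data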

Deeply interested to see where this project goes next.

October 9, 2013

Explore the world’s constitutions with a new online tool

Filed under: Law,Searching — Patrick Durusau @ 7:52 pm

Explore the world’s constitutions with a new online tool

From the post:

Constitutions are as unique as the people they govern, and have been around in one form or another for millennia. But did you know that every year approximately five new constitutions are written, and 20-30 are amended or revised? Or that Africa has the youngest set of constitutions, with 19 out of the 39 constitutions written globally since 2000 from the region?

With this in mind, Google Ideas supported the Comparative Constitutions Project to build Constitute, a new site that digitizes and makes searchable the world’s constitutions. Constitute enables people to browse and search constitutions via curated and tagged topics, as well as by country and year. The Comparative Constitutions Project cataloged and tagged nearly 350 themes, so people can easily find and compare specific constitutional material. This ranges from the fairly general, such as “Citizenship” and “Foreign Policy,” to the very specific, such as “Suffrage and turnouts” and “Judicial Autonomy and Power.”

I applaud the effort but wonder about the claim that people can “easily find and compare specific constitutional material.”

Legal systems are highly contextual.

See the Constitution Annotated (U.S.) if you want to see interpretations of words that would not occur to you. Promise.

October 2, 2013

No Free Speech for Tech Firms?

Filed under: Law,NSA,Privacy,Security — Patrick Durusau @ 4:16 pm

I stumbled across Tech firms’ release of PRISM data will harm security — new U.S. and FBI court filings by Jeff John Roberts today.

From the post:

The Department of Justice, in long-awaited court filings that have just been released, urged America’s secret spy court to reject a plea by five major tech companies to disclose data about how often the government asks for user information under a controversial surveillance program aimed at foreign suspects.

The filings, which appeared on Wednesday, claimed that the tech companies – Google, Microsoft, Facebook, LinkedIn and Yahoo — do not have a First Amendment right to disclose how many Foreign Intelligence Surveillance Act requests they receive.

“Adversaries may alter their behavior by switching to service that the Government is not intercepting,” said the filings, which are heavily blacked out and cite Edward Snowden, a former NSA contractor. Snowden has caused an ongoing stir by leaking documents about a U.S. government program known as PRISM that vacuums up meta-data from the technology firms.

I thought we had settled the First Amendment for corporations back in Citizens United v. FEC.

Justice Kennedy, writing for the majority said:

The censorship we now confront is vast in its reach. The Government has “muffle[d] the voices that best represent the most significant segments of the economy.” Mc Connell, supra, at 257–258 (opinion of Scalia, J.). And “the electorate [has been] deprived of information, knowledge and opinion vital to its function.” CIO, 335 U. S., at 144 (Rutledge, J., concurring in result). By suppressing the speech of manifold corporations, both for-profit and nonprofit, the Government prevents their voices and viewpoints from reaching the public and advising voters on which persons or entities are hostile to their interests. Factions will necessarily form in our Republic, but the remedy of “destroying the liberty” of some factions is “worse than the disease.” The Federalist No. 10, p. 130 (B. Wright ed. 1961) (J. Madison). Factions should be checked by permitting them all to speak, see ibid., and by entrusting the people to judge what is true and what is false.

The purpose and effect of this law is to prevent corporations, including small and nonprofit corporations, from presenting both facts and opinions to the public. This makes Austin’s antidistortion rationale all the more an aberration. “[T]he First Amendment protects the right of corporations to petition legislative and administrative bodies.” Bellotti, 435 U. S., at 792, n. 31 (citing California Motor Transport Co. v. Trucking Unlimited, 404 U. S. 508, 510–511 (1972); Eastern Railroad Presidents Conference v. Noerr Motor Freight, Inc., 365 U. S. 127, 137–138 (1961)). Corporate executives and employees counsel Members of Congress and Presidential administrations on many issues, as a matter of routine and often in private. An amici brief filed on behalf of Montana and 25 other States notes that lobbying and corporate communications with elected officials occur on a regular basis. Brief for State of Montana et al. as Amici Curiae 19. When that phenomenon is coupled with §441b, the result is that smaller or nonprofit corporations cannot raise a voice to object when other corporations, including those with vast wealth, are cooperating with the Government. That cooperation may sometimes be voluntary, or it may be at the demand of a Government official who uses his or her authority, influence, and power to threaten corporations to support the Government’s policies. Those kinds of interactions are often unknown and unseen. The speech that §441b forbids, though, is public, and all can judge its content and purpose. References to massive corporate treasuries should not mask the real operation of this law. Rhetoric ought not obscure reality.

I admit that Citizens United v. FEC was about corporations buying elections but Justice Department censorship in this case is even worse.

Censorship in this case strikes at trust in products and services from: Google, Microsoft, Facebook, LinkedIn, Yahoo and Dropbox.

And it prevents consumers from making their own choices about who or what to trust.

Google, Microsoft, Facebook, LinkedIn, Yahoo and Dropbox should publish all the details of FISA requests.

Trust your customers/citizens to make the right choice.

PS: If Fortune 50 companies don’t have free speech, what do you think you have?

September 24, 2013

…Link and Reference Rot in Legal Citations

Filed under: Citation Analysis,Citation Practices,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 1:58 pm

Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations by Jonathan Zittrain, Kendra Albert, Lawrence Lessig.

Abstract:

We document a serious problem of reference rot: more than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs found within U.S. Supreme Court opinions do not link to the originally cited information.

Given that, we propose a solution for authors and editors of new scholarship that involves libraries undertaking the distributed, long-term preservation of link contents.

Imagine trying to use a phone book where 70% of the addresses were wrong.

Or you are looking for your property deed and learn that only 50% of the references are correct.

Do those sound like acceptable situations?

Considering the Harvard Law Review and the U.S. Supreme Court put a good deal of effort into correct citations, the fate of the rest of the web must be far worse.

The about page for Perma reports:

Any author can go to the Perma.cc website and input a URL. Perma.cc downloads the material at that URL and gives back a new URL (a “Perma.cc link”) that can then be inserted in a paper.

After the paper has been submitted to a journal, the journal staff checks that the provided Perma.cc link actually represents the cited material. If it does, the staff “vests” the link and it is forever preserved. Links that are not “vested” will be preserved for two years, at which point the author will have the option to renew the link for another two years.

Readers who encounter Perma.cc links can click on them like ordinary URLs. This takes them to the Perma.cc site where they are presented with a page that has links both to the original web source (along with some information, including the date of the Perma.cc link’s creation) and to the archived version stored by Perma.cc.

I would caution that “forever” is a very long time.

What happens to the binding between an identifier and a URL when URLs are replaced by another network protocol?

After all the change over the history of the Internet, you don’t believe the current protocols will last “forever,” do you?

A more robust solution would divorce identifiers/citations from any particular network protocol, whether you think it will last forever or not.

That separation of identifier from network protocol preserves the possibility of an online database such as Perma.cc but also databases that have local caches of the citations and associated content, databases that point to multiple locations for associated content, and databases that support currently unknown protocols to access content associated with an identifier.

Just as a database of citations from Codex Justinianus could point to the latest printed Latin text, online versions or future versions.

Citations can become permanent identifiers if they don’t rely on a particular network addressing system.
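A minimal sketch of such a protocol-agnostic citation record (field names and values are invented for illustration):

# The citation is the key; locations are dated claims about where the
# content can be found today, under whatever protocols exist.
citation_record = {
    "identifier": "Codex Justinianus 8.4.1",
    "locations": [
        {"protocol": "https", "address": "https://example.org/cj/8/4/1",
         "verified": "2013-09-24"},
        {"protocol": "print", "address": "Latin critical edition, library shelf copy",
         "verified": "2013-09-24"},
        # A future protocol slots in without touching the identifier:
        # {"protocol": "ipfs", "address": "...", "verified": "..."},
    ],
}

def resolve(record, speaks=("https", "print")):
    # Return the first location whose protocol the client still speaks.
    for proto in speaks:
        for loc in record["locations"]:
            if loc["protocol"] == proto:
                return loc["address"]
    return None

print(resolve(citation_record))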

Court Listener

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 9:29 am

Court Listener

From the about page:

Started as a part-time hobby in 2010, CourtListener is now a core project of the Free Law Project, a California Non-Profit corporation. The goal of the site is to provide powerful free legal tools for everybody while giving away all our data in bulk downloads.

We collect legal opinions from court websites and from data donations, and are aiming to have the best, most complete data on the open Web within the next couple years. We are slowly expanding to provide search and awareness tools for as many state courts as possible, and we already have tools for all of the Federal Appeals Courts. For more details on which jurisdictions we support, see our coverage page. If you’re able to help us acquire more cases, please get in touch.

This rather remarkable site has collected 905,842 court opinions as of September 24, 2013.

The default listing of cases is newest first but you can choose oldest first, most/least cited first and keyword relevance. Changing the listing order becomes interesting once you perform a keyword search (top search bar). The refinement (left-hand side) works quite well, except that I could not filter search results by a judge’s name. On case names, separate the parties with “v.” as “vs” doesn’t work.

It is also possible to discover examples of changing legal terminology that impact your search results.

For example, try searching for the keyword phrase “interstate commerce.” Now choose “Oldest first.” You will see Price v. Ralston (1790) and the next case is Crandall v. State of Nevada (1868). Hmmm, what happened to the early interstate commerce cases under John Marshall?

OK, so try “commerce.” Now set to “Oldest first.” Hmmm, a lot more cases. Yes? Under case name, type in “Gibbons” and press return. Now the top case is Gibbons v. Ogden (1824). The case name is a hyperlink so follow that now.

It is a long opinion by Chief Justice Marshall but at paragraph 5 he announces:

The power to regulate commerce extends to every species of commercial intercourse between the United States and foreign nations, and among the several States. It does not stop at the external boundary of a State.

The phrase “among the several States,” occurs 21 times in Gibbons v. Ogden, with no mention of the modern “interstate commerce.”

What we now call the “interstate commerce clause” played a major role in the New Deal legislation that ended the 1930’s depression in the United States. See Commerce Clause. Following the cases cited under “New Deal” will give you an interesting view of the conflicting sides. A conflict that still rages today.

The terminology problem, “among the several states” vs. “interstate commerce,” is one that makes me doubt the efficacy of public access to law programs. Short of knowing the “right” search words, it is unlikely you would have found Gibbons v. Ogden. Well, short of reading through the entire corpus of Supreme Court decisions. 😉

Public access to law would be enhanced by mappings such as “interstate commerce” to “among the several states,” by noting that “due process” didn’t always mean what it means today, and by further mappings to colloquial search expressions.

A topic map could capture those nuances and many more.
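A minimal sketch of such a mapping as plain query expansion (a toy table; a topic map would also carry scope, e.g. which era and court, and the basis for each equivalence):

# Toy table mapping modern search terms to historical phrasings.
HISTORICAL_TERMS = {
    "interstate commerce": ["commerce among the several states"],
    "due process": ["law of the land"],
}

def expand(query):
    terms = [query] + HISTORICAL_TERMS.get(query.lower(), [])
    return " OR ".join('"%s"' % t for t in terms)

print(expand("interstate commerce"))
# "interstate commerce" OR "commerce among the several states"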

I guess the question is whether people should be free to search for the law or should they be freed by finding the law?

September 23, 2013

Broadening Google Patents [Patent Troll Indigestion]

Filed under: Law,Patents,Searching — Patrick Durusau @ 12:42 pm

Broadening Google Patents by Jon Orwant.

From the post:

Last year, we launched two improvements to Google Patents: the Prior Art Finder and European Patent Office (EPO) patents. Today we’re happy to announce the addition of documents from four new patent agencies: China, Germany, Canada, and the World Intellectual Property Organization (WIPO). Many of these documents may provide prior art for future patent applications, and we hope their increased discoverability will improve the quality of patents in the U.S. and worldwide.

The broadening of Google Patents is welcome news!

Especially following the broadening of “prior art” under the America Invents Act (AIA).

On the expansion of prior art, such as publication before the date of filing the patent (the old rule was before the date of invention), a good summary can be found at: The Changing Boundaries of Prior Art under the AIA: What Your Company Needs to Know.

The information you find needs to remain found, intertwined with other information you find.

Regular search engines won’t help you there. May I suggest topic maps?

August 21, 2013

Measuring the Complexity of the Law: The United States Code

Filed under: Complexity,Law — Patrick Durusau @ 3:12 pm

Measuring the Complexity of the Law: The United States Code by Daniel Martin Katz and Michael James Bommarito II.

Abstract:

Einstein’s razor, a corollary of Ockham’s razor, is often paraphrased as follows: make everything as simple as possible, but not simpler. This rule of thumb describes the challenge that designers of a legal system face — to craft simple laws that produce desired ends, but not to pursue simplicity so far as to undermine those ends. Complexity, simplicity’s inverse, taxes cognition and increases the likelihood of suboptimal decisions. In addition, unnecessary legal complexity can drive a misallocation of human capital toward comprehending and complying with legal rules and away from other productive ends.

While many scholars have offered descriptive accounts or theoretical models of legal complexity, empirical research to date has been limited to simple measures of size, such as the number of pages in a bill. No extant research rigorously applies a meaningful model to real data. As a consequence, we have no reliable means to determine whether a new bill, regulation, order, or precedent substantially affects legal complexity.

In this paper, we address this need by developing a proposed empirical framework for measuring relative legal complexity. This framework is based on “knowledge acquisition,” an approach at the intersection of psychology and computer science, which can take into account the structure, language, and interdependence of law. We then demonstrate the descriptive value of this framework by applying it to the U.S. Code’s Titles, scoring and ranking them by their relative complexity. Our framework is flexible, intuitive, and transparent, and we offer this approach as a first step in developing a practical methodology for assessing legal complexity.

I’m curious what you make of this article’s treatment of the complexity of the language of laws.

The authors compute the number of words and the average length of words in each title of the United States Code. In addition, the Shannon entropy of each title is calculated. Those results figure in the authors’ determination of the complexity of each title.
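A minimal sketch of those computations (word-level entropy here; the paper’s exact unit of analysis may differ):

import math
from collections import Counter

# Word count, average word length and Shannon entropy of a text.
def surface_measures(text):
    words = text.lower().split()
    n = len(words)
    counts = Counter(words)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    avg_len = sum(len(w) for w in words) / n
    return {"words": n, "avg_word_length": round(avg_len, 2),
            "entropy_bits": round(entropy, 2)}

print(surface_measures("the power to regulate commerce extends to every "
                       "species of commercial intercourse"))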

To be sure, those are all measurable aspects of each title and so in that sense the results and the process to reach them can be duplicated and verified by others.

The authors are using a “knowledge acquisition” model, that is, measuring the difficulty a reader would experience in reading and acquiring knowledge from any part of the United States Code.

But reading the bare words of the U.S. Code is not a reliable way to acquire legal knowledge. Words in the U.S. Code and their meanings have been debated and decided (sometimes differently) by various courts. A reader doesn’t understand the U.S. Code without knowledge of court decisions on the language of the text.

Let me give you a short example:

42 U.S.C. §1983 reads:

Every person who, under color of any statute, ordinance, regulation, custom, or usage, of any State or Territory or the District of Columbia, subjects, or causes to be subjected, any citizen of the United States or other person within the jurisdiction thereof to the deprivation of any rights, privileges, or immunities secured by the Constitution and laws, shall be liable to the party injured in an action at law, suit in equity, or other proper proceeding for redress, except that in any action brought against a judicial officer for an act or omission taken in such officer’s judicial capacity, injunctive relief shall not be granted unless a declaratory decree was violated or declaratory relief was unavailable. For the purposes of this section, any Act of Congress applicable exclusively to the District of Columbia shall be considered to be a statute of the District of Columbia. (emphasis added)

Before reading the rest of this post, answer this question: Is a municipality a person for purposes of 42 U.S.C. §1983?

That is if city employees violate your civil rights, can you sue them and the city they work for?

That seems like a straightforward question. Yes?

In Monroe v. Pape, 365 US 167 (1961), the Supreme Court found the answer was no. Municipalities were not “persons” for purposes of 42 U.S.C. §1983.

But a reader who only remembers that decision would be wrong if trying to understand that statute today.

In Monell v. New York City Dept. of Social Services, 436 U.S. 658 (1978), the Supreme Court found that it was mistaken in Monroe v. Pape and found the answer was yes. Municipalities could be “persons” for purposes of 42 U.S.C. §1983, in some cases.

The language in 42 U.S.C. §1983 did not change between 1961 and 1978. Nor did the circumstances under which section 1983 was passed (Civil War reconstruction) change.

But the meaning of that one word changed significantly.

Many other words in the U.S. Code have had a similar experience.

If you need assistance with 42 U.S.C. §1983 or any other part of the U.S. Code or other laws, seek legal counsel.

August 16, 2013

Finding Parties Named in U.S. Law…

Filed under: Law,Natural Language Processing,NLTK,Python — Patrick Durusau @ 4:59 pm

Finding Parties Named in U.S. Law using Python and NLTK by Gary Sieling.

From the post:

U.S. Law periodically names specific institutions; historically it is possible for Congress to write a law naming an individual, although I think that has become less common. I expect the most common entities named in Federal Law to be groups like Congress. It turns out this is true, but the other most common entities are the law itself and bureaucratic functions like archivists.

To get at this information, we need to read the Code XML, and use a natural language processing library to get at the named groups.

NLTK is such an NLP library. It provides interesting features like sentence parsing, part of speech tagging, and named entity recognition. (If interested in the subject see my review of “Natural Language Processing with Python“, a book which covers this library in detail)

I would rather know who paid for particular laws but that requires information external to the Code XML data set. 😉

A very good exercise to become familiar with both NLTK and the Code XML data set.
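A minimal sketch along the lines Gary describes, assuming the standard NLTK models (punkt, the POS tagger, and the named entity chunker) have been downloaded:

import nltk

# A sentence in the style of the U.S. Code; the entities are organizations.
text = ("The Archivist of the United States shall transmit the report "
        "to the Congress and to the Social Security Administration.")

tokens = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokens)
tree = nltk.ne_chunk(tagged)

# Pull out ORGANIZATION chunks, the entity type most relevant here.
for subtree in tree.subtrees(lambda t: t.label() == "ORGANIZATION"):
    print(" ".join(word for word, tag in subtree.leaves()))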

August 9, 2013

Counting Citations in U.S. Law

Filed under: Graphics,Law,Law - Sources,Visualization — Patrick Durusau @ 3:17 pm

Counting Citations in U.S. Law by Gary Sieling.

From the post:

The U.S. Congress recently released a series of XML documents containing U.S. Laws. The structure of these documents allow us to find which sections of the law are most commonly cited. Examining which citations occur most frequently allows us to see what Congress has spent the most time thinking about.

Citations occur for many reasons: a justification for addition or omission in subsequent laws, clarifications, or amendments, or repeals. As we might expect, the most commonly cited sections involve the IRS (Income Taxes, specifically), Social Security, and Military Procurement.

To arrive at this result, we must first see how the U.S. Code is laid out. The laws are divided into a hierarchy of units, which allows anything from an entire title to an individual sentence to be cited. These sections have an ID and an identifier – “identifier” is used as a citation reference within the XML documents, and has a different form from the citations used by the legal community, which come in a form like “25 USC Chapter 21 § 1901”.

If you are interested in some moderate XML data processing, this is the project for you!

Gary has posted the code for developing a citation index to the U.S. Laws in XML.
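Gary’s code is the reference; for the general shape, here is a sketch with placeholder element and attribute names (check the actual US Code XML schema for the real ones):

import xml.etree.ElementTree as ET
from collections import Counter

# Stream a title of the Code and count citation targets. "ref" and "href"
# are placeholders for whatever the schema actually uses.
def count_citations(path):
    counts = Counter()
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag.endswith("ref"):
            target = elem.get("href")
            if target:
                counts[target] += 1
        elem.clear()  # keep memory flat on large titles
    return counts

# counts = count_citations("usc26.xml")  # hypothetical file name
# print(counts.most_common(10))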

If you want to skip to one great result of this effort, see: Visualizing Citations in U.S. Law, also by Gary, which is based on d3.js and Uber Data visualization.

In the “Visualizing” post Gary enables the reader to see what laws (by title) cite other titles in U.S. law.

More interesting than you would think.

Take Title 26, Internal Revenue Code (IRC).

Among others, the IRC does not cite:

Title 30 – MINERAL LANDS AND MINING
Title 31 – MONEY AND FINANCE
Title 32 – NATIONAL GUARD

I can understand not citing the NATIONAL GUARD but MONEY AND FINANCE?

Looking forward to more ways to explore the U.S. Laws.

Tying the legislative history of laws to, say, New York Times stories on their subject matter could prove to be very interesting.

I started to suggest tracking donations to particular sponsors and then to legislation that benefits the donors.

But that level of detail is just a distraction. Most elected officials have no shame about selling their offices. Documenting their behavior may regularize the pricing of senators and representatives but would not have much other impact.

I suggest you find a button other than truth to influence their actions.

July 31, 2013

U.S. Code Available in Bulk XML

Filed under: Government,Law,Law - Sources — Patrick Durusau @ 4:24 pm

House of Representatives Makes U.S. Code Available in Bulk XML.

From the press release:

As part of an ongoing effort to make Congress more open and transparent, House Speaker John Boehner (R-OH) and Majority Leader Eric Cantor (R-VA) today announced that the House of Representatives is making the United States Code available for download in XML format.

The data is compiled, updated, and published by the Office of Law Revision Counsel (OLRC). You can download individual titles – or the full code in bulk – and read supporting documentation here.

“Providing free and open access to the U.S. Code in XML is another win for open government,” said Speaker Boehner and Leader Cantor. “And we want to thank the Office of Law Revision Counsel for all of their work to make this project a reality. Whether it’s our ‘read the bill’ reforms, streaming debates and committee hearings live online, or providing unprecedented access to legislative data, we’re keeping our pledge to make Congress more transparent and accountable to the people we serve.”

In 2011, Speaker Boehner and Leader Cantor called for the adoption of new electronic data standards to make legislative information more open and accessible. With those standards in place, the House created the Legislative Branch Bulk Data Task Force in 2012 to expedite the process of providing bulk access to legislative information and to increase transparency for the American people.

Since then, the Government Printing Office (GPO) has begun providing bulk access to House legislation in XML. The Office of the Clerk makes full sessions of House floor summaries available in bulk as well.

The XML version of the U.S. Code will be updated quickly, on an ongoing basis, as new laws are enacted.

You can see a full list of open government projects underway in the House at speaker.gov/open.
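
To grab one of the files, a minimal sketch (the URL is a placeholder; copy the real link for a title or the bulk file from the OLRC download page):

import urllib.request
import zipfile

# Placeholder URL: substitute the actual link from the OLRC page.
url = "http://uscode.house.gov/download/EXAMPLE-title26.zip"
urllib.request.urlretrieve(url, "title26.zip")

with zipfile.ZipFile("title26.zip") as z:
    print(z.namelist())  # the XML file(s) for the title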

While applauding Congress, don’t forget that the Legal Information Institute at Cornell University Law School has been working on free access to public law for the past twenty-one (21) years.

I first saw this at: U.S. House of Representatives Makes U.S. Code Available in Bulk XML.

July 13, 2013

ggplot2 Choropleth of Supreme Court Decisions: A Tutorial

Filed under: Ggplot2,Law,Law - Sources — Patrick Durusau @ 1:34 pm

ggplot2 Choropleth of Supreme Court Decisions: A Tutorial

From the post:

I don't do much GIS but I like to. It's rather enjoyable and involves a tremendous skill set. Often you will find yourself grabbing data sets from some site, scraping, data cleaning and reshaping, and graphing. On the ride home from work yesterday I heard an NPR talk about the Supreme Court decisions being very close with this court. This got me wondering if there is a database with this information and the journey began. This tutorial is purely exploratory but you will learn to:

  1. Grab .zip files from a data base and read into R
  2. Clean data
  3. Reshape data with reshape2
  4. Merge data sets
  5. Plot a choropleth map in ggplot2
  6. Arrange several grid plots with gridExtra

I'm lazy and like a good challenge. I challenged myself to not manually open a file so I downloaded Biobase from bioconductor to open the pdf files for the codebook. Also I used my own package qdap because it had some functions I like and I'm used to using them. This blog post was created in the dev. version of the reports package using the wordpress_rmd template.

Good R practice and an interesting view of Supreme Court cases.
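
The tutorial itself is in R. For comparison only, here is a rough Python analogue of steps 1 through 4; the URL, file and column names are made up, and the real Supreme Court Database has its own codebook:

import io
import urllib.request
import zipfile
import pandas as pd

# Step 1: grab a .zip from a database site and read it (placeholder URL).
raw = urllib.request.urlopen("http://example.org/scdb_cases.zip").read()
with zipfile.ZipFile(io.BytesIO(raw)) as z:
    cases = pd.read_csv(z.open(z.namelist()[0]), low_memory=False)

# Step 2: clean, keeping hypothetical columns and dropping missing rows.
cases = cases[["caseId", "state", "decisionDirection"]].dropna()

# Step 3: reshape, counting decisions per state and direction.
by_state = cases.pivot_table(index="state", columns="decisionDirection",
                             values="caseId", aggfunc="count", fill_value=0)

# Step 4: merge with another data set, e.g. state population (placeholder).
pop = pd.read_csv("state_population.csv")  # hypothetical file
print(by_state.reset_index().merge(pop, on="state", how="left").head())

The choropleth and grid arrangement (steps 5 and 6) are where ggplot2 and gridExtra earn their keep; there is no drop-in equivalent here.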

June 26, 2013

Developing an Ontology of Legal Research

Filed under: Law,Law - Sources — Patrick Durusau @ 3:03 pm

Developing an Ontology of Legal Research by Amy Taylor.

From the post:

This session will describe my efforts to develop a legal ontology for teaching legal research. There are currently more than twenty legal ontologies worldwide that encompass legal knowledge, legal problem solving, legal drafting and information retrieval, and subjects such as IP, but no ontology of legal research. A legal research ontology could be useful because the transition from print to digital sources has shifted the way research is conducted and taught. Legal print sources have much of the structure of legal knowledge built into them (see the attached slide comparing screen shots from Westlaw and WestlawNext), so teaching students how to research in print also helps them learn the subject they are researching. With the shift to digital sources, this structure is now only implicit, and researchers must rely more upon a solid foundation in the structure of legal knowledge. The session will also describe my choice of OWL as the language that best meets the needs in building this ontology. The session will also explore the possibilities of representing this legal ontology in a more compact visual form to make it easier to incorporate into legal research instruction.

Plus slides and:

Leaving aside Amy’s choice of an ontology, OWL, etc., I would like to focus on her statement:

(…)
Legal print sources have much of the structure of legal knowledge built into them (see the attached slide comparing screen shots from Westlaw and WestlawNext), so teaching students how to research in print also helps them learn the subject they are researching. With the shift to digital sources, this structure is now only implicit, and researchers must rely more upon a solid foundation in the structure of legal knowledge.
(…)

First, Amy is comparing “Westlaw Classic” and “WestlawNext,” both digital editions.

Second, the “structure” in question appeared in the “digests” published by West, for example:

[image: West digest entry]

And in case head notes as:

[image: case head notes]

That is, the tradition of reporting the structure in the digest, with only isolated topics in case reports, did not start with electronic versions.

That has been the organization of West materials since its beginning in the 19th century.

Third, an “ontology” of the law is quite a different undertaking from the “taxonomy” used by the West system.

The West American Digest System organized law reports to enable researchers to get “close enough” to relevant authorities.

That is, the “last semantic mile” was up to the researcher, not the West system.

Even at that degree of coarseness, the West system was still an ongoing labor of decades by thousands of editors, and it remains so today.

The amount of effort expended to obtain a coarse but useful taxonomy of the law should be a fair warning to anyone attempting an “ontology” of the same.

June 1, 2013

6 Goals for Public Access to Case Law

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 3:30 pm

6 Goals for Public Access to Case Law by Daniel Lewis and Nik Reed.

From the post:

In March, Mike Lissner wrote for this blog about the troubling state of access to case law – noting with dismay that most of the US corpus is not publicly available. While a few states make official cases available, most still do not, and neither does the federal government. At Ravel Law we’re building a new legal research platform and, like Mike, we’ve spent substantial time troubleshooting access to law issues. Here, we will provide some more detail about how official case law is created and share our recommendations for making it more available and usable. We focus in particular on FDsys – the federal judiciary’s effort in this space – but the ideas apply broadly.

(…)

Goals and metrics:

  1. Comprehensive Access to Opinions – Does every federal court release every published and unpublished opinion? Are the electronic records comprehensive in their historic reach?
  2. Opinions that can be Cited in Court – Are the official versions of cases provided, not just the slip opinions? And/or, can the version released by FDsys be cited in court?
  3. Vendor-Neutral Citations – Are the opinions provided with a vendor-neutral citation (using, e.g., paragraph numbers)?
  4. Opinions in File Formats that Enable Innovation – Are opinions provided in both human and machine-readable formats?
  5. Opinions Marked with Meta-Data – Is a machine-readable language such as XML used to tag information like case date, title, citation, etc.? Is additional markup of information such as sectional breaks, concurrences, etc. provided?
  6. Bulk Access to Opinions – Are cases accessible via bulk access methods such as FTP or an API?

OK, but with the exception of bulk access, all of these issues have been solved (past tense) by commercial vendors.

Even bulk access is probably available if you are willing to pay the vendors enough.

But public access does not mean meaningful access.

For example, the goals mentioned above would not enable the average citizen to answer questions like:

Which experts appear on behalf of which parties, consistently?

Which attorneys appear before particular judges?

What is a judge’s history with particular types of lawsuits?

What are the judge’s past connections with parties or attorneys?

To say nothing of identifying the laws, facts, issues and other matters in a case, all of which are subject to varying identifications.
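
To make the point concrete: answering even the first question requires per-case fields (experts, parties, attorneys, judges) that none of the six goals would put in the released data. With such fields the analysis is trivial; the rows below are made up:

import pandas as pd

# Made-up appearance records; no goal above mandates these fields.
appearances = pd.DataFrame([
    {"case": "A v. B", "expert": "Dr. Smith", "party": "plaintiff"},
    {"case": "C v. D", "expert": "Dr. Smith", "party": "plaintiff"},
    {"case": "E v. F", "expert": "Dr. Jones", "party": "defendant"},
])

# Which experts appear on behalf of which parties, consistently?
print(appearances.groupby(["expert", "party"]).size())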

Public access to case law is a good goal, but not if it only eases the financial burden for existing legal publishers and still fails to provide the public with meaningful access to case law.

May 22, 2013

Integrating the US’ Documents

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 4:42 pm

Integrating the US’ Documents by Eric Mill.

From the post:

A few weeks ago, we integrated the full text of federal bills and regulations into our alert system, Scout. Now, if you visit CISPA or a fascinating cotton rule, you’ll see the original document – nicely formatted, but also well-integrated into Scout’s layout. There are a lot of good reasons to integrate the text this way: we want you to see why we alerted you to a document without having to jump off-site, and without clunky iframes.

As importantly, we wanted to do this in a way that would be easily reusable by other projects and people. So we built a tool called us-documents that makes it possible for anyone to do this with federal bills and regulations. It’s available as a Ruby gem, and comes with a command line tool so that you can use it with Python, Node, or any other language. It lives inside the unitedstates project at unitedstates/documents, and is entirely public domain.

This could prove to be really interesting, both as a matter of content and as a technique to replicate elsewhere.

I first saw this at: Mill: Integrating the US’s Documents.

May 21, 2013

Searching on BillTrack50

Filed under: Government,Law,Transparency — Patrick Durusau @ 6:58 am

How to find what you are looking for – constructing a search on BillTrack50 by Karen Suhaka.

From the post:

Building a search on BillTrack50 is fairly straightforward, however it isn’t exactly like doing a Google search. So there’s a few things you need to keep in mind, which I’ll explain in this post. There’s also a few tips and tricks advanced users might find useful. Any bills that are introduced later and meet your search terms will be automatically added to your bill sheet (if you made a bill sheet).

Tracking “thumb on the scale” (TOTS) at the state level? BillTrack50 is a great starting point.

BillTrack50 provides surface facts, to which you can add vote trading, influence peddling and other routine legislative activities.

May 20, 2013

FuzzyLaw [FuzzyDBA, FuzzyRDF, FuzzySW?]

Filed under: Law,Legal Informatics,Semantic Diversity,Semantic Inconsistency,Users — Patrick Durusau @ 2:03 pm

FuzzyLaw

From the webpage:

(…)

FuzzyLaw has gathered explanations of legal terms from members of the public in order to get a sense of what the ‘person on the street’ has in mind when they think of a legal term. By making lay-people’s explanations of legal terms available to interpreters, police and other legal professionals, we hope to stimulate debate and learning about word meaning, public understanding of law and the nature of explanation.

The explanations gathered in FuzzyLaw are unusual in that they are provided by members of the public. These people, all aged over 18, regard themselves as ‘native speakers’, ‘first language speakers’ and ‘mother tongue’ speakers of English and have lived in England and/or Wales for 10 years or more. We might therefore expect that they will understand English legal terminology as well as any member of the public might. No one who has contributed has ever worked in the criminal law system or as an interpreter or translator. They therefore bring no special expertise to the task of explanation, beyond whatever their daily life has provided.

We have gathered explanations for 37 words in total. You can see a sample of these explanations on FuzzyLaw. The sample of explanations is regularly updated. You can also read responses to the terms and the explanations from mainly interpreters, police officers and academics. You are warmly invited to add your own responses and join in the discussion of each and every word. Check back regularly to see how discussions develop and consider bookmarking the site for future visits. The site also contains commentaries on interesting phenomena which have emerged through the site. You can respond to the commentaries too on that page, contributing to the developing research project.

(…)

Have you ever wondered what the ‘person on the street’ thinks about relational databases, RDF or the Semantic Web?

Those are the folks who are being pushed content based on interpretations not of their own making.

Here’s a work experiment for you:

  1. Take ten search terms from your local query log.
  2. At each department staff meeting, distribute sheets with the words, requesting everyone to define the terms in their own words. No wrong answers.
  3. Tally up the definitions per department and across the company (see the sketch after this list).
  4. Comments anyone?
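
For step 3, a minimal tallying sketch with made-up responses:

import pandas as pd

# Made-up survey rows: one per (department, term, definition).
responses = pd.DataFrame([
    {"department": "sales", "term": "RDF", "definition": "a data format"},
    {"department": "sales", "term": "RDF", "definition": "web metadata"},
    {"department": "it", "term": "RDF", "definition": "triples for linked data"},
])

# Distinct definitions per term, by department and company-wide.
print(responses.groupby(["term", "department"])["definition"].nunique())
print(responses.groupby("term")["definition"].nunique())

The spread of distinct definitions is your in-house measure of semantic diversity.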

I first saw this at: FuzzyLaw: Collection of lay citizens’ understandings of legal terminology.

May 1, 2013

Taxonomies Make the Law. Will Folksonomies Change It?

Filed under: Folksonomy,Law,Taxonomy — Patrick Durusau @ 7:33 pm

Taxonomies Make the Law. Will Folksonomies Change It? by Serena Manzoli.

From the post:

Take a look at your bundle of tags on Delicious. Would you ever believe you’re going to change the law with a handful of them?

You’re going to change the way you research the law. The way you apply it. The way you teach it and, in doing so, shape the minds of future lawyers.

Do you think I’m going too far? Maybe.

But don’t overlook the way taxonomies have changed the law and shaped lawyers’ minds so far. Taxonomies? Yeah, taxonomies.

We, the lawyers, have used extensively taxonomies through the years; Civil lawyers in particular have shown to be particularly prone to them. We’ve used taxonomies for three reasons: to help legal research, to help memorization and teaching, and to apply the law.

Serena omits one reason lawyers use taxonomies: Knowledge of a taxonomy, particularly a complex one, confers power.

Legal taxonomies also exclude the vast majority of the population from meaningful engagement in public debates, much less decision making.

To be fair, some areas of the law are very complex; securities and tax law come to mind. Even without the taxonomy barrier, mastery is a difficult thing.

Serena’s example of navigable waters reminded me of one of my law professors who, in separate cases, lost both sides of the question of the navigability of a particular waterway. 😉

I am hopeful that Serena is correct about the coming impact of folksonomies on the law.

But I am also mindful that legal “reform” rarely emerges from the gauntlet of privilege unscathed.

I first saw this at: Manzoli on Legal Taxonomies and Legal Folksonomies.

April 27, 2013

Bulk Access to Law-Related Linked Data:…

Filed under: Law,Legal Informatics — Patrick Durusau @ 4:23 pm

Bulk Access to Law-Related Linked Data: LC & VIAF Name Authority Records and LC Subject Authority Records

From the post:

Linked Data versions of Library of Congress name authority records and subject authority records are now available for bulk download from the Library of Congress Linked Data Service, according to Kevin Ford at Library of Congress.

In addition, VIAF, the Virtual International Authority File, now provides bulk access to Linked Data versions of name authority records for organizations, including government entities and business organizations, from more than 30 national or research libraries. VIAF data are also searchable through the VIAF Web user interface.

Always good to have more data but I would use caution with the Library of Congress authority records.

See for example, TFM (To Find Me) Mark Twain.

Authority record means just that, a record issued by an authority.

The state of being a “correct” record is something else entirely.
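
If you do pull the bulk files, here is a minimal sketch for poking at them with rdflib; the file name is a placeholder, and I am assuming the labels you want are exposed as skos:prefLabel (the LC data also uses the MADS vocabulary):

from rdflib import Graph, Namespace

SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")

g = Graph()
# Placeholder: a small slice of the LC names bulk download (N-Triples).
g.parse("lcnaf-sample.nt", format="nt")

# Print every authority whose preferred label mentions Twain.
for subj, _, label in g.triples((None, SKOS.prefLabel, None)):
    if "Twain" in str(label):
        print(subj, label)

Which makes the Mark Twain example above easy to test for yourself.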

April 26, 2013

Once Under Wraps, Supreme Court Audio Trove Now Online

Filed under: Data,History,Law,Law - Sources — Patrick Durusau @ 3:09 pm

Once Under Wraps, Supreme Court Audio Trove Now Online

From the post:

On Wednesday, the U.S. Supreme Court heard oral arguments in the final cases of the term, which began last October and is expected to end in late June after high-profile rulings on gay marriage, affirmative action and the Voting Rights Act.

Audio from Wednesday’s arguments will be available at week’s end at the court’s website, but that’s a relatively new development at an institution that has historically been somewhat shuttered from public view.

The court has been releasing audio during the same week as arguments only since 2010. Before that, audio from one term generally wasn’t available until the beginning of the next term. But the court has been recording its arguments for nearly 60 years, at first only for the use of the justices and their law clerks, and eventually also for researchers at the National Archives, who could hear — but couldn’t duplicate — the tapes. As a result, until the 1990s, few in the public had ever heard recordings of the justices at work.

But as of just a few weeks ago, all of the archived historical audio — which dates back to 1955 — has been digitized, and almost all of those cases can now be heard and explored at an online archive called the Oyez Project.

A truly incredible resource for U.S. history in general and legal history in particular.

The transcripts and tapes are synchronized so your task, if you are interested, is to map these resources to other historical accounts and resources. 😉

The only disappointment is that the recordings begin with the October term of 1955. One of the best known cases of the 20th century, Brown v. Board of Education, was argued in 1952 and re-argued in 1953. Hearing Thurgood Marshall argue that case would be a real treat.

I first saw this at: NPR: oyez.org finishes Supreme Court oral arguments project.

April 13, 2013

Linked Data and Law

Filed under: Law,Linked Data — Patrick Durusau @ 4:48 am

Linked Data and Law

A listing of linked data and law resources maintained by the Legal Informatics Blog.

Most recently updated to reflect the availability of the Library of Congress classification K – Class Law Classification as linked data.
