Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 16, 2012

Data Curation in the Networked Humanities [Semantic Curation?]

Filed under: Curation,Humanities,Literature — Patrick Durusau @ 4:29 am

Data Curation in the Networked Humanities by Michael Ullyot.

From the post:

These talks are the first phase of Encoding Shakespeare: my SSHRC-funded project for the next three years. Between now and 2015, I’m working to improve the automated encoding of early modern English texts, to enable text analysis.

This post’s three parts are brought to you by the letter p. First I outline the potential of algorithmic text analysis; then the problem of messy data; and finally the protocols for a networked-humanities data curation system.

This third part is the most tentative, as of this writing; Fall 2012 is about defining my protocols and identifying which tags the most text-analysis engines require for the best results — whatever that entails. (So I welcome your comments and resource links.)

A project that promises to touch on many of the issues in modern digital humanities. Do review and contribute if possible.

I have a lingering uneasiness with the notion of “data curation.” With the “data” part, not the “curation” part.

To say “data curation” implies we can identify the “data” that merits curation.

I don’t doubt we can identify some data that needs curation. The question is whether it is the only data that merits curation.

We know from the early textual history of the Bible that the text was curated and that, in that process, variant traditions and entire works were lost.

Just my take on it, but rather than “data curation,” with its implication of a “correct” text, we need semantic curation.

Semantic curation attempts to preserve the semantics we see in a text, without attempting to find the correct semantics.
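As a rough illustration of that idea, here is a minimal sketch of a structure that keeps every reading of a passage alongside its source and never marks one as the winner. The class names, the locus label, the witness labels and the sample readings are all hypothetical, invented for this example; nothing here comes from the Encoding Shakespeare project.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Reading:
    """One reading of a passage as a single witness records it."""
    witness: str  # hypothetical source label (manuscript, edition, editor)
    text: str


class Passage:
    """Keeps every reading of a passage; none is flagged as 'correct'."""

    def __init__(self, locus: str) -> None:
        self.locus = locus
        self._readings: List[Reading] = []

    def add(self, witness: str, text: str) -> None:
        # Readings are only ever appended, never overwritten or ranked.
        self._readings.append(Reading(witness, text))

    def variants(self) -> Dict[str, List[str]]:
        """Group witnesses by the text they attest, preserving all variants."""
        grouped: Dict[str, List[str]] = defaultdict(list)
        for reading in self._readings:
            grouped[reading.text].append(reading.witness)
        return dict(grouped)


# Made-up sample data, for illustration only.
passage = Passage("example-locus-1")
passage.add("witness-A", "sallied")
passage.add("witness-B", "sallied")
passage.add("witness-C", "solid")

for text, witnesses in passage.variants().items():
    print(text, "<-", ", ".join(witnesses))
```

The design choice worth noticing is the absence of any "preferred" field: choosing among the variants is left to the reader, not to the curator.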

July 25, 2012

The Case for Curation: The Relevance of Digest and Citator Results in Westlaw and Lexis

Filed under: Aggregation,Curation,Legal Informatics,LexisNexis,Westlaw — Patrick Durusau @ 6:51 pm

The Case for Curation: The Relevance of Digest and Citator Results in Westlaw and Lexis by Susan Nevelow Mart and Jeffrey Luftig.

Abstract:

Humans and machines are both involved in the creation of legal research resources. For legal information retrieval systems, the human-curated finding aid is being overtaken by the computer algorithm. But human-curated finding aids still exist. One of them is the West Key Number system. The Key Number system’s headnote classification of case law, started back in the nineteenth century, was and is the creation of humans. The retrospective headnote classification of the cases in Lexis’s case databases, started in 1999, was created primarily although not exclusively with computer algorithms. So how do these two very different systems deal with a similar headnote from the same case, when they link the headnote to the digesting and citator functions in their respective databases? This paper continues an investigation into this question, looking at the relevance of results from digest and citator search run on matching headnotes in ninety important federal and state cases, to see how each performs. For digests, where the results are curated – where a human has made a judgment about the meaning of a case and placed it in a classification system – humans still have an advantage. For citators, where algorithm is battling algorithm to find relevant results, it is a matter of the better algorithm winning. But no one algorithm is doing a very good job of finding all the relevant results; the overlap between the two citator systems is not that large. The lesson for researchers: know how your legal research system was created, what involvement, if any, humans had in the curation of the system, and what a researcher can and cannot expect from the system you are using.

A must read for library students and legal researchers.

For legal research, the authors conclude:

The intervention of humans as curators in online environments is being recognized as a way to add value to an algorithm’s results, in legal research tools as well as web-based applications in other areas. Humans still have an edge in predicting which cases are relevant. And the intersection of human curation and algorithmically-generated data sets is already well underway. More curation will improve the quality of results in legal research tools, and most particularly can be used to address the algorithmic deficit that still seems to exist where analogic reasoning is needed. So for legal research, there is a case for curation. [footnotes omitted]

The distinction between curation (human gathering of relevant material) and aggregation (machine gathering of potentially relevant material) looks quite useful.
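The abstract's remark that the overlap between the two citator systems "is not that large" can be made concrete with a small sketch. The case identifiers below are invented, and the Jaccard measure is my own choice for illustration; the paper may well measure overlap and relevance differently.

```python
def overlap(results_a: set, results_b: set) -> float:
    """Jaccard overlap: shared results divided by all distinct results."""
    union = results_a | results_b
    if not union:
        return 0.0  # two empty result sets share nothing measurable
    return len(results_a & results_b) / len(union)


# Hypothetical result sets from two citators for the same headnote.
citator_a = {"case-1", "case-2", "case-3", "case-4"}
citator_b = {"case-3", "case-4", "case-5", "case-6"}

shared = citator_a & citator_b   # found by both systems
only_a = citator_a - citator_b   # found only by the first
only_b = citator_b - citator_a   # found only by the second
score = overlap(citator_a, citator_b)

print(sorted(shared), sorted(only_a), sorted(only_b), round(score, 2))
```

A low score means a researcher who relies on one system alone never sees the cases that only the other one found.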

Curation anyone?

I first saw this at Legal Informatics.

June 13, 2012

Network of data visualization references

Filed under: BigData,Curation,Graphics,Visualization — Patrick Durusau @ 9:57 am

Network of data visualization references by Nathan Yau.

From the post:

Developer Santiago Ortiz places Delicious tags for visualization references in a discovery context. There are two views. The first is a network view with tags and resources as nodes. A fisheye effect is used to zoom in on nodes and make them more readable. Mouse over a tag, and the labels for related resources get bigger, and likewise, mouse over a resource, and the related tags get bigger.

The second view lets you compare resources. In the network view, select two or more resources, and then click on the bottom button to compare the selected.

On the left hand side, top, you will see:

  • blogs
  • studios
  • people
  • tools
  • books

I had to select one of those before getting the option to switch to the second view.

The graph view seems to move too quickly but that may just be me.

I am sure there is a “big data” view of visualization but I find this more limited view quite useful.

As a matter of fact, I suspect finding sub-communities that share semantics is going to be more of a growth area than “big data.” To be sure, you may start with “big data” but you will quickly boil it down to “small data” that is both useful and relevant to your user community.

Small enough for machine-assisted curation, no doubt, where the curation is the value-add that results in a product.

May 9, 2012

GATE Teamware: Collaborative Annotation Factories (HOT!)

GATE Teamware: Collaborative Annotation Factories

From the webpage:

Teamware is a web-based management platform for collaborative annotation & curation. It is a cost-effective environment for annotation and curation projects, enabling you to harness a broadly distributed workforce and monitor progress & results remotely in real time.

It’s also very easy to use. A new project can be up and running in less than five minutes. (As far as we know, there is nothing else like it in this field.)

GATE Teamware delivers a multi-function user interface over the Internet for viewing, adding and editing text annotations. The web-based management interface allows for project set-up, tracking, and management:

  • Loading document collections (a “corpus” or “corpora”)
  • Creating re-usable project templates
  • Initiating projects based on templates
  • Assigning project roles to specific users
  • Monitoring progress and various project statistics in real time
  • Reporting of project status, annotator activity and statistics
  • Applying GATE-based processing routines (automatic annotations or post-annotation processing)

I have known about the GATE project in general for years and came to this site after reading: Crowdsourced Legal Case Annotation.

Could be the basis for annotations that are converted into a topic map, but… I have been a sysadmin before: maintaining servers, websites, software, etc. Great work, interesting work, but not what I want to be doing now.

Then I read:

Where to get it? The easiest way to get started is to buy a ready-to-run Teamware virtual server from GATECloud.net.

Not saying it will or won’t meet your particular needs, but it is certainly worth a “look see.”

Let me know if you take the plunge!

April 13, 2011

7th International Digital Curation Conference

Filed under: Conferences,Curation — Patrick Durusau @ 1:23 pm

7th International Digital Curation Conference

Call for papers where topics include:

  • Lessons learned from the inter-disciplinary use of open data: examples of enablers, barriers and success stories
  • Curation of mixed data collections, with open and sensitive or private content
  • Gathering evidence for benefits of data sharing
  • Building capacity for the effective management, sharing and reuse of open data
  • Scale issues in the management of sensitive data
  • Tensions between maintaining quality and openness
  • Linked data, open data, closed data and provenance
  • Technical and organisational solutions for data security
  • Developing new metrics for open data
  • Ethical issues and personal data
  • Legislation and open data

Submission deadline: 25 July 2011

Conference:

5 – 7 December 2011
Marriott Royal Hotel, Bristol, UK
