Archive for the ‘Relation Extraction’ Category

50,000 Lessons on How to Read:…

Friday, April 12th, 2013

50,000 Lessons on How to Read: a Relation Extraction Corpus by Dave Orr, Product Manager, Google Research.

From the post:

One of the most difficult tasks in NLP is called relation extraction. It’s an example of information extraction, one of the goals of natural language understanding. A relation is a semantic connection between (at least) two entities. For instance, you could say that Jim Henson was in a spouse relation with Jane Henson (and in a creator relation with many beloved characters and shows).

The goal of relation extraction is to learn relations from unstructured natural language text. The relations can be used to answer questions (“Who created Kermit?”), learn which proteins interact in the biomedical literature, or to build a database of hundreds of millions of entities and billions of relations to try and help people explore the world’s information.

To help researchers investigate relation extraction, we’re releasing a human-judged dataset of two relations about public figures on Wikipedia: nearly 10,000 examples of “place of birth”, and over 40,000 examples of “attended or graduated from an institution”. Each of these was judged by at least 5 raters, and can be used to train or evaluate relation extraction systems. We also plan to release more relations of new types in the coming months.

Another step in the “right” direction.

This is a human-curated set of relation semantics.

Rather than trying to apply this as a universal “standard,” what if you were to create a similar data set for your domain/enterprise?

Using human curators to create and maintain a set of relation semantics?

Being a topic mappish sort of person, I suggest the basis for their identification of the relationship be explicit, for robust re-use.

But you can repeat the same analysis over and over again if you prefer.
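To make the suggestion concrete, here is a minimal sketch of what a human-judged relation record with an explicit basis for identification might look like. It is not the format of the released corpus: the class `RelationExample`, its field names, the `majority_judgment` helper, and the strict-majority rule are all assumptions made for illustration. Only the "at least 5 raters" threshold and the Jim Henson example come from the quoted post.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RelationExample:
    """One human-judged relation instance (hypothetical layout)."""
    subject: str
    relation: str
    obj: str
    evidence: str  # the text snippet the raters looked at
    basis: str     # why the curators identify this relation (kept explicit for re-use)
    judgments: List[str] = field(default_factory=list)  # one label per rater


def majority_judgment(example: RelationExample, min_raters: int = 5) -> Optional[str]:
    """Return the strict-majority label, or None with too few raters or no majority."""
    if len(example.judgments) < min_raters:
        return None
    label, count = Counter(example.judgments).most_common(1)[0]
    return label if count * 2 > len(example.judgments) else None


example = RelationExample(
    subject="Jim Henson",
    relation="spouse",
    obj="Jane Henson",
    evidence="(illustrative snippet, not taken from the corpus)",
    basis="The snippet states the marriage directly.",
    judgments=["yes", "yes", "yes", "no", "yes"],
)
print(majority_judgment(example))  # yes
```

Keeping `basis` alongside the judgments is the point: someone re-using the record can see why the relation was identified instead of repeating the analysis.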

Graph Databases: Information Silo Busters

Wednesday, February 29th, 2012

In a post about InfiniteGraph 2.1 I found the following:

Other big data solutions all lack one thing, Clark contends. There is no easy way to represent the connection information, the relationships across the different silos of data or different data stores, he says. “That is where Objectivity can provide the enhanced storage for actually helping extract and persist those relationships so you can then ask queries about how things are connected.”

(Brian Clark, vice president, Data Management, Objectivity)

It was the last line of the post but I would have sharpened it and made it the lead slug.

Think about what Clark is saying: not only can we persist relationship information within a datastore, we can also generate and persist relationship information between datastores, with no restriction on the nature of those datastores.

Try doing that with a relational database and SQL.

What I find particularly attractive is that persisting relationships across datastores means that we can jump the hurdle of making everyone use a common data model. It can be as common (in the graph) as it needs to be and no more.

Of course, I think of this as particularly suited to topic maps, since we can document why we have mapped components of diverse data models to particular points in the graph. But what did you expect?

Used robustly, graph databases will let you integrate across whatever datastores are available to you, whatever data models they use, mapped to whatever data model you like. Others, in turn, may map your graph database to the models they prefer.

I think the need for documenting those mappings is one that needs attention sooner rather than later.
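As a rough illustration of persisting relationships between datastores while documenting each mapping, consider the sketch below. It uses plain Python rather than any graph database API, and it is not how InfiniteGraph works. The names `CrossStoreGraph`, `MappedEdge`, and `NodeRef`, and the example store names and keys, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A node is addressed by (datastore name, local key); nothing is assumed
# about what kind of store sits behind the name.
NodeRef = Tuple[str, str]


@dataclass(frozen=True)
class MappedEdge:
    source: NodeRef
    target: NodeRef
    relation: str
    rationale: str  # the documented reason for this mapping


class CrossStoreGraph:
    """In-memory edge list linking records that live in different datastores."""

    def __init__(self) -> None:
        self._edges: Dict[NodeRef, List[MappedEdge]] = {}

    def connect(self, source: NodeRef, target: NodeRef,
                relation: str, rationale: str) -> MappedEdge:
        # Refuse undocumented mappings so the "why" is never lost.
        if not rationale.strip():
            raise ValueError("every mapping needs a documented rationale")
        edge = MappedEdge(source, target, relation, rationale)
        self._edges.setdefault(source, []).append(edge)
        return edge

    def neighbors(self, source: NodeRef) -> List[MappedEdge]:
        return list(self._edges.get(source, []))


graph = CrossStoreGraph()
graph.connect(
    ("crm_sql", "customer:42"),
    ("support_docs", "ticket:9001"),
    relation="filed",
    rationale="Ticket e-mail address matches the CRM contact record.",
)
for edge in graph.neighbors(("crm_sql", "customer:42")):
    print(edge.relation, edge.target, "because:", edge.rationale)
```

Neither store has to adopt the other's data model; the only shared structure is the edge itself and the stated reason for it.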

BTW, feel free to use the phrase “Graph Databases: Information Silo Busters.” (with or without attribution – I want information silos to fall more than I want personal recognition.)

Entities, Relationships, and Semantics: the State of Structured Search

Saturday, November 12th, 2011

Entities, Relationships, and Semantics: the State of Structured Search

Jeff Dalton’s notes on a panel discussion moderated by Daniel Tunkelang. The panel consisted of Andrew Hogue (Google NY), Breck Baldwin (alias-i), Evan Sandhaus (NY Times), and Wlodek Zadrozny (IBM Watson).

Read the notes, watch the discussion.

BTW, Sandhaus (New York Times) points out that librarians have been working with structured data for a very long time.

So, libraries want to be more like web search engines and the folks building search engines want to be more like libraries.

Sounds to me like both communities need to spend more time reading each other’s blogs, cross-attending conferences, etc.

T-Rex Information Extraction

Friday, October 15th, 2010

T-Rex (Trainable Relation Extraction).

Tools for document classification, entity and relation (read association) extraction.

Topic maps of any size are going to be constructed from mining of “data” and in a lot of cases that will mean “documents” (to the extent that is a meaningful distinction).

An interesting toolkit for that purpose, but apparently no longer maintained: it is parked at SourceForge after having been funded by the EU.

Does anyone have a status update on this project?