50,000 Lessons on How to Read: a Relation Extraction Corpus by Dave Orr, Product Manager, Google Research.
From the post:
One of the most difficult tasks in NLP is called relation extraction. It’s an example of information extraction, one of the goals of natural language understanding. A relation is a semantic connection between (at least) two entities. For instance, you could say that Jim Henson was in a spouse relation with Jane Henson (and in a creator relation with many beloved characters and shows).
The goal of relation extraction is to learn relations from unstructured natural language text. The relations can be used to answer questions (“Who created Kermit?”), learn which proteins interact in the biomedical literature, or to build a database of hundreds of millions of entities and billions of relations to try and help people explore the world’s information.
To help researchers investigate relation extraction, we’re releasing a human-judged dataset of two relations about public figures on Wikipedia: nearly 10,000 examples of “place of birth”, and over 40,000 examples of “attended or graduated from an institution”. Each of these was judged by at least 5 raters, and can be used to train or evaluate relation extraction systems. We also plan to release more relations of new types in the coming months.
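Since each example carries judgments from at least 5 raters, a common first step is collapsing those judgments into a single gold label. A minimal sketch in Python (the record shape and field names here are assumptions for illustration, not confirmed by the post):

```python
import json
from collections import Counter

# A hypothetical record in the style described: one relation,
# judged independently by five raters. Field names are assumed.
record = json.loads("""
{
  "pred": "place_of_birth",
  "sub": "Jim Henson",
  "obj": "Greenville, Mississippi",
  "judgments": [
    {"rater": "r1", "judgment": "yes"},
    {"rater": "r2", "judgment": "yes"},
    {"rater": "r3", "judgment": "no"},
    {"rater": "r4", "judgment": "yes"},
    {"rater": "r5", "judgment": "yes"}
  ]
}
""")

def majority_label(record):
    """Collapse per-rater judgments into a single gold label by majority vote."""
    counts = Counter(j["judgment"] for j in record["judgments"])
    label, _ = counts.most_common(1)[0]
    return label

print(majority_label(record))  # -> yes
```

Keeping the raw judgments around (rather than only the collapsed label) lets you weight training examples by rater agreement, or filter to unanimous cases for a cleaner evaluation set.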
Another step in the “right” direction.
This is a human-curated set of relation semantics.
Rather than trying to apply this as a universal “standard,” what if you were to create a similar data set for your domain/enterprise?
That is, using human curators to create and maintain a set of relation semantics of your own?
Being a topic mappish sort of person, I suggest that the basis for identifying each relationship be made explicit, for robust re-use.
But you can repeat the same analysis over and over again if you prefer.
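One way to make the basis of identification explicit is to attach subject identifiers and provenance to each relation assertion, topic-map style, so that later users can see who asserted it and on what evidence. A minimal sketch, where the field names and identifiers are illustrative rather than any standard:

```python
# A relation assertion whose participants and provenance are recorded
# explicitly, so the identification can be re-used (or disputed) later.
relation = {
    "type": "spouse",
    "roles": {
        "spouse1": {
            "name": "Jim Henson",
            # Subject identifier: a URI that says *which* Jim Henson.
            "subject_identifier": "http://en.wikipedia.org/wiki/Jim_Henson",
        },
        "spouse2": {
            "name": "Jane Henson",
            "subject_identifier": "http://en.wikipedia.org/wiki/Jane_Henson",
        },
    },
    # The basis for the assertion: evidence shown and rater agreement.
    "basis": {
        "evidence": ["snippet or URL the human raters judged"],
        "raters": 5,
        "agreement": "4/5 yes",
    },
}

# Anyone merging this with another dataset can match on the
# subject identifiers instead of the ambiguous display names.
print(relation["roles"]["spouse1"]["subject_identifier"])
```

With the identifiers explicit, two curated sets that mention "Jim Henson" can be merged on the URI rather than on the string, which is the re-use the analysis-once approach buys you.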