From the “about” page:
The Groningen Meaning Bank consists of public domain English texts with corresponding syntactic and semantic representations.
Key features
The GMB supports deep semantics, opening the way to theoretically grounded, data-driven approaches to computational semantics. It integrates phenomena instead of covering single phenomena in isolation. This provides a better handle on explaining dependencies between various ambiguous linguistic phenomena, including word senses, thematic roles, quantifier scrope, tense and aspect, anaphora, presupposition, and rhetorical relations. In the GMB texts are annotated rather than
isolated sentences, which provides a means to deal with ambiguities on the sentence level that require discourse context for resolving them.Method
The GMB is being built using a bootstrapping approach. We employ state-of-the-art NLP tools (notably the C&C tools and Boxer) to produce a reasonable approximation to gold-standard annotations. From release to release, the annotations are corrected and refined using human annotations coming from two main sources: experts who directly edit the annotations in the GMB via the Explorer, and non-experts who play a game with a purpose called Wordrobe.
Theoretical background
The theoretical backbone for the semantic annotations in the GMB is established by Discourse Representation Theory (DRT), a formal theory of meaning developed by the philosopher of language Hans Kamp (Kamp, 1981; Kamp and Reyle, 1993). Extensions of the theory bridge the gap between theory and practice. In particular, we use VerbNet for thematic roles, a variation on ACE‘s named entity classification, WordNet for word senses and Segmented DRT for rhetorical relations (Asher and Lascarides, 2003). Thanks to the DRT backbone, all these linguistic phenomena can be expressed in a first-order language, enabling the practical use of first-order theorem provers and model builders.
Step back towards the source of semantics (that would be us).
One practical question is how to capture semantics for a particular domain or enterprise?
Another is what to capture to enable the mapping of those semantics to those of other domains or enterprises?