Archive for the ‘Word Meaning’ Category

Infinite Dimensional Word Embeddings [Variable Representation, Death to Triples]

Thursday, November 19th, 2015

Infinite Dimensional Word Embeddings by Eric Nalisnick and Sachin Ravi.


We describe a method for learning word embeddings with stochastic dimensionality. Our Infinite Skip-Gram (iSG) model specifies an energy-based joint distribution over a word vector, a context vector, and their dimensionality, which can be defined over a countably infinite domain by employing the same techniques used to make the Infinite Restricted Boltzmann Machine (Cote & Larochelle, 2015) tractable. We find that the distribution over embedding dimensionality for a given word is highly interpretable and leads to an elegant probabilistic mechanism for word sense induction. We show qualitatively and quantitatively that the iSG produces parameter-efficient representations that are robust to language’s inherent ambiguity.

Even better from the introduction:

To better capture the semantic variability of words, we propose a novel embedding method that produces vectors with stochastic dimensionality. By employing the same mathematical tools that allow the definition of an Infinite Restricted Boltzmann Machine (Côté & Larochelle, 2015), we describe a log-bilinear energy-based model–called the Infinite Skip-Gram (iSG) model–that defines a joint distribution over a word vector, a context vector, and their dimensionality, which has a countably infinite domain. During training, the iSGM allows word representations to grow naturally based on how well they can predict their context. This behavior enables the vectors of specific words to use few dimensions and the vectors of vague words to elongate as needed. Manual and experimental analysis reveals this dynamic representation elegantly captures specificity, polysemy, and homonymy without explicit definition of such concepts within the model. As far as we are aware, this is the first word embedding method that allows representation dimensionality to be variable and exhibit data-dependent growth.

Imagine a topic map model that “allow[ed] representation dimensionality to be variable and exhibit data-dependent growth.”

Simple subjects, say the sort you find at, can have simple representations.

More complex subjects, say the notion of “person” in U.S. statutory law (no, I won’t attempt to list them here), can extend their dimensional representations as far as is necessary.

Of course in this case, the dimensions are learned from a corpus but I don’t see any barrier to the intentional creation of dimensions for subjects and/or a combined automatic/directed creation of dimensions.

Or as I put it in the title, Death to All Triples.

More precisely, not just triples but any pre-determined limit on representation.

Looking forward to taking a slow read on this article and those it cites. Very promising.

Underspecifying Meaning

Sunday, July 27th, 2014

Word Meanings Evolve to Selectively Preserve Distinctions on Salient Dimensions by Catriona Silvey, Simon Kirby, and Kenny Smith.


Words refer to objects in the world, but this correspondence is not one-to-one: Each word has a range of referents that share features on some dimensions but differ on others. This property of language is called underspecification. Parts of the lexicon have characteristic patterns of underspecification; for example, artifact nouns tend to specify shape, but not color, whereas substance nouns specify material but not shape. These regularities in the lexicon enable learners to generalize new words appropriately. How does the lexicon come to have these helpful regularities? We test the hypothesis that systematic backgrounding of some dimensions during learning and use causes language to gradually change, over repeated episodes of transmission, to produce a lexicon with strong patterns of underspecification across these less salient dimensions. This offers a cultural evolutionary mechanism linking individual word learning and generalization to the origin of regularities in the lexicon that help learners generalize words appropriately.

I can’t seem to access the article today but the premise is intriguing.

Perhaps people can have different “…less salient dimensions…” and therefore are generalizing words “inappropriately” from the standpoint of another person.

Curious if a test can be devised to identify those “…less salient dimensions…” in some target population? It might lead to faster identification of terms likely to be misunderstood.

Words as Tags?

Saturday, March 15th, 2014

Wordcounts are amazing. by Ted Underwood.

From the post:

People new to text mining are often disillusioned when they figure out how it’s actually done — which is still, in large part, by counting words. They’re willing to believe that computers have developed some clever strategy for finding patterns in language — but think “surely it’s something better than that?“

Uneasiness with mere word-counting remains strong even in researchers familiar with statistical methods, and makes us search restlessly for something better than “words” on which to apply them. Maybe if we stemmed words to make them more like concepts? Or parsed sentences? In my case, this impulse made me spend a lot of time mining two- and three-word phrases. Nothing wrong with any of that. These are all good ideas, but they may not be quite as essential as we imagine.

Working with text is like working with a video where every element of every frame has already been tagged, not only with nouns but with attributes and actions. If we actually had those tags on an actual video collection, I think we’d recognize it as an enormously valuable archive. The opportunities for statistical analysis are obvious! We have trouble recognizing the same opportunities when they present themselves in text, because we take the strengths of text for granted and only notice what gets lost in the analysis. So we ignore all those free tags on every page and ask ourselves, “How will we know which tags are connected? And how will we know which clauses are subjunctive?”

What a delightful insight!

When we say text is “unstructured,” what we really mean is that something as dumb as a computer sees no structure in the text.

A human reader, even a 5- or 6-year-old, sees lots of structure in a text, and meaning too.

Rather than trying to “teach” computers to read, perhaps we should use computers to facilitate reading by those who already can.


I first saw this in a tweet by Matthew Brook O’Donnell.

TSDW:… [Enterprise Disambiguation]

Monday, April 22nd, 2013

TSDW: Two-stage word sense disambiguation using Wikipedia by Chenliang Li, Aixin Sun, Anwitaman Datta. (Li, C., Sun, A. and Datta, A. (2013), TSDW: Two-stage word sense disambiguation using Wikipedia. J. Am. Soc. Inf. Sci. doi: 10.1002/asi.22829)


The semantic knowledge of Wikipedia has proved to be useful for many tasks, for example, named entity disambiguation. Among these applications, the task of identifying the word sense based on Wikipedia is a crucial component because the output of this component is often used in subsequent tasks. In this article, we present a two-stage framework (called TSDW) for word sense disambiguation using knowledge latent in Wikipedia. The disambiguation of a given phrase is applied through a two-stage disambiguation process: (a) The first-stage disambiguation explores the contextual semantic information, where the noisy information is pruned for better effectiveness and efficiency; and (b) the second-stage disambiguation explores the disambiguated phrases of high confidence from the first stage to achieve better redisambiguation decisions for the phrases that are difficult to disambiguate in the first stage. Moreover, existing studies have addressed the disambiguation problem for English text only. Considering the popular usage of Wikipedia in different languages, we study the performance of TSDW and the existing state-of-the-art approaches over both English and Traditional Chinese articles. The experimental results show that TSDW generalizes well to different semantic relatedness measures and text in different languages. More important, TSDW significantly outperforms the state-of-the-art approaches with both better effectiveness and efficiency.

TSDW works because Wikipedia is a source of unambiguous phrases, which can also be used to disambiguate phrases that on a first pass are not unambiguous.
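The two-stage idea can be sketched in a few lines of Python. This is a toy illustration, not the authors' implementation: the sense inventory, the overlap score, and the `min_margin` confidence threshold are all assumptions made up for the example, and the context pruning described in the abstract is omitted.

```python
def rank(senses, evidence):
    """Return (best_sense, margin over the runner-up) by term overlap.

    `senses` maps a sense name to a set of related terms (a hypothetical
    stand-in for knowledge mined from Wikipedia).
    """
    scored = sorted(
        ((len(terms & evidence), name) for name, terms in senses.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    best_score, best_name = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0
    return best_name, best_score - runner_up


def two_stage_disambiguate(phrases, inventory, context, min_margin=1):
    """Toy two-stage disambiguation (assumed scoring, not TSDW's measures)."""
    evidence = set(context)
    resolved, pending = {}, []

    # Stage 1: use only the surrounding context; keep confident decisions.
    for phrase in phrases:
        best, margin = rank(inventory[phrase], evidence)
        if margin >= min_margin:
            resolved[phrase] = best
        else:
            pending.append(phrase)

    # Stage 2: add the terms of confidently chosen senses to the evidence,
    # then decide the phrases that were too close to call in stage 1.
    enriched = set(evidence)
    for phrase, sense in resolved.items():
        enriched |= inventory[phrase][sense]
    for phrase in pending:
        resolved[phrase], _ = rank(inventory[phrase], enriched)

    return resolved
```

The design point is that stage 2 never sees more raw text than stage 1 did; it only benefits from the decisions stage 1 was already sure about.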

But Wikipedia did not always exist and was built out of the collaboration of thousands of users over time.

Does that offer a clue as to building better search tools for enterprise data?

What if statistically improbable phrases are mined from new enterprise documents and links created to definitions for those phrases?
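A minimal sketch of what such mining might look like, assuming “statistically improbable” means a two-word phrase that is far more frequent in a new document than in a background collection. The function names, the bigram restriction, and the add-one smoothing are illustrative assumptions, not a reference to any particular product or library.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Lowercase the text and return its adjacent word pairs."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return list(zip(words, words[1:]))


def improbable_phrases(new_doc, background_docs, top_n=5, smoothing=1.0):
    """Rank bigrams in new_doc by how surprising they are given the background.

    Score = count * log(p_doc / p_background), with additive smoothing so
    phrases never seen in the background still get a finite score.
    """
    doc_counts = Counter(bigrams(new_doc))
    bg_counts = Counter()
    for doc in background_docs:
        bg_counts.update(bigrams(doc))

    doc_total = sum(doc_counts.values())
    bg_total = sum(bg_counts.values())
    vocab = len(set(doc_counts) | set(bg_counts))

    scored = []
    for phrase, count in doc_counts.items():
        p_doc = count / doc_total
        p_bg = (bg_counts[phrase] + smoothing) / (bg_total + smoothing * vocab)
        scored.append((count * math.log(p_doc / p_bg), " ".join(phrase)))

    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [phrase for _, phrase in scored[:top_n]]
```

The phrases returned would be candidates only; someone with domain expertise still has to decide which deserve a definition and a link.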

I am thinking that picking a current starting point avoids a “…boil the ocean…” scenario before benefits can be shown.

Current content is also more likely to be a search target.

Domain expertise and literacy required.

Expertise in logic or ontologies not.

Kwong – … Word Sense Disambiguation

Tuesday, January 29th, 2013

New Perspectives on Computational and Cognitive Strategies for Word Sense Disambiguation by Oi Yee Kwong.

From the description:

Cognitive and Computational Strategies for Word Sense Disambiguation examines cognitive strategies by humans and computational strategies by machines, for WSD in parallel.

Focusing on a psychologically valid property of words and senses, author Oi Yee Kwong discusses their concreteness or abstractness and draws on psycholinguistic data to examine the extent to which existing lexical resources resemble the mental lexicon as far as the concreteness distinction is concerned. The text also investigates the contribution of different knowledge sources to WSD in relation to this very intrinsic nature of words and senses.

I wasn’t aware that the “mental lexicon” of words had been fully described.

Shows what you can learn from reading marketing summaries of research.

If you are in Kolkata/Pune, India…a request.

Tuesday, July 17th, 2012

No emails are given for the authors of: Identify Web-page Content meaning using Knowledge based System for Dual Meaning Words but their locations were listed as Kolkata and Pune, India. I would appreciate your pointing the authors to this blog as one source of information on topic maps.

The authors have re-invented a small part of topic maps to deal with synonymy using XSD syntax. Quite doable but I think they would be better served by either using topic maps or engaging in improving topic maps.

Reinvention is rarely a step forward.


Meaning of Web-page content plays a big role while produced a search result from a search engine. Most of the cases Web-page meaning stored in title or meta-tag area but those meanings do not always match with Web-page content. To overcome this situation we need to go through the Web-page content to identify the Web-page meaning. In such cases, where Webpage content holds dual meaning words that time it is really difficult to identify the meaning of the Web-page. In this paper, we are introducing a new design and development mechanism of identifying the Web-page content meaning which holds dual meaning words in their Web-page content.

From Words to Concepts and Back: Dictionaries for Linking Text, Entities and Ideas

Friday, May 18th, 2012

From Words to Concepts and Back: Dictionaries for Linking Text, Entities and Ideas by Valentin Spitkovsky and Peter Norvig (Google Research Team).

From the post:

Human language is both rich and ambiguous. When we hear or read words, we resolve meanings to mental representations, for example recognizing and linking names to the intended persons, locations or organizations. Bridging words and meaning — from turning search queries into relevant results to suggesting targeted keywords for advertisers — is also Google’s core competency, and important for many other tasks in information retrieval and natural language processing. We are happy to release a resource, spanning 7,560,141 concepts and 175,100,788 unique text strings, that we hope will help everyone working in these areas.

How do we represent concepts? Our approach piggybacks on the unique titles of entries from an encyclopedia, which are mostly proper and common noun phrases. We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia’s groupings of articles into hierarchical categories.

(examples omitted)

The database that we are providing was designed for recall. It is large and noisy, incorporating 297,073,139 distinct string-concept pairs, aggregated over 3,152,091,432 individual links, many of them referencing non-existent articles. For technical details, see our paper (to be presented at LREC 2012) and the README file accompanying the data. (emphasis added)

Did you catch those numbers?

Now there is a truly remarkable resource.

What will you make out of it?
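One small possibility, sketched under assumptions: if the string-concept pairs are read as (anchor text, concept) link observations, they can be aggregated into an estimate of how likely each concept is for a given string. The in-memory tuple format below is hypothetical; the actual file layout is described in the README that accompanies the data, which this sketch does not attempt to parse.

```python
from collections import Counter, defaultdict


def build_dictionary(links):
    """Aggregate (anchor_text, concept) observations into per-string counts.

    `links` is an iterable of pairs; this input format is an assumption
    made for illustration only.
    """
    counts = defaultdict(Counter)
    for anchor, concept in links:
        counts[anchor.lower()][concept] += 1
    return counts


def concept_probabilities(counts, anchor):
    """Return [(concept, P(concept | anchor))], most likely first."""
    by_concept = counts.get(anchor.lower())
    if not by_concept:
        return []
    total = sum(by_concept.values())
    return sorted(
        ((concept, n / total) for concept, n in by_concept.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
```

Because the resource was designed for recall, any real use would want a cutoff on these probabilities to filter the noise.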

Representing word meaning and order information in a composite holographic lexicon

Saturday, November 19th, 2011

Representing word meaning and order information in a composite holographic lexicon by Michael N. Jones and Douglas J. K. Mewhort.


The authors present a computational model that builds a holographic lexicon representing both word meaning and word order from unsupervised experience with natural language. The model uses simple convolution and superposition mechanisms (cf. B. B. Murdock, 1982) to learn distributed holographic representations for words. The structure of the resulting lexicon can account for empirical data from classic experiments studying semantic typicality, categorization, priming, and semantic constraint in sentence completions. Furthermore, order information can be retrieved from the holographic representations, allowing the model to account for limited word transitions without the need for built-in transition rules. The model demonstrates that a broad range of psychological data can be accounted for directly from the structure of lexical representations learned in this way, without the need for complexity to be built into either the processing mechanisms or the representations. The holographic representations are an appropriate knowledge representation to be used by higher order models of language comprehension, relieving the complexity required at the higher level.

More reading along the lines of higher-dimensional representation techniques. Almost six (6) pages of references to run backwards and forwards so this is going to take a while.
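As a first step into that reading, here is a rough sketch of the two mechanisms the abstract names, convolution and superposition, in plain Python. It is a simplified stand-in and not the authors' model: the vector size, the random initialization, the `placeholder` vector, and decoding by circular correlation are assumptions chosen to keep the example small.

```python
import math
import random


def random_vector(n, rng):
    """Random vector with expected length near 1 (elements ~ N(0, 1/n))."""
    sd = 1.0 / math.sqrt(n)
    return [rng.gauss(0.0, sd) for _ in range(n)]


def circular_convolution(a, b):
    """Bind two vectors into one of the same size (order information)."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def circular_correlation(a, c):
    """Approximately recover b from c = circular_convolution(a, b)."""
    n = len(a)
    return [sum(a[j] * c[(i + j) % n] for j in range(n)) for i in range(n)]


def superpose(*vectors):
    """Add vectors element-wise (accumulating a composite memory trace)."""
    return [sum(values) for values in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    n = 256
    placeholder, bites, barks, cat = (random_vector(n, rng) for _ in range(4))

    # Memory for "dog": context (it co-occurs with "barks") superposed with
    # order (the word following it is "bites", bound to a placeholder).
    dog = superpose(barks, circular_convolution(placeholder, bites))

    # Probing with the placeholder should yield something closer to "bites"
    # than to an unrelated word.
    decoded = circular_correlation(placeholder, dog)
    print(cosine(decoded, bites), cosine(decoded, cat))
```

The point of the exercise is that both kinds of information end up in a single fixed-size vector, and the order information can still be probed out of it, noisily, afterwards.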