Archive for the ‘Distributional Semantics’ Category

Distributional Semantics

Tuesday, August 16th, 2011

Distributional Semantics.

Trevor Cohen, co-author with Roger Schvaneveldt and Dominic Widdows of Reflective Random Indexing and indirect inference…, has a page on distributional semantics that starts with:

Empirical Distributional Semantics is an emerging discipline that is primarily concerned with the derivation of semantics (or meaning) from the distribution of terms in natural language text. My research in DS is concerned primarily with spatial models of meaning, in which terms are projected into high-dimensional semantic space, and an estimate of their semantic relatedness is derived from the distance between them in this space.

The relations derived by these models have many useful applications in biomedicine and beyond. A particularly interesting property of distributional semantics models is their capacity to recognize connections between terms that do not occur together in the same document, as this has implications for knowledge discovery. In many instances it is possible also to reveal a plausible pathway linking these terms by using the distances estimated by distributional semantic models to generate a network representation, and using Pathfinder networks (PFNETS) to reveal the most significant links in this network, as shown in the example below [figure on the original page, not reproduced here].
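To make the spatial model in the quoted description concrete, here is a minimal sketch, with an invented toy corpus, that derives a term vector from document co-occurrence counts and estimates semantic relatedness with cosine similarity:

```python
from collections import defaultdict
from itertools import combinations

# Toy corpus; each document is a bag of terms (invented for illustration).
docs = [
    ["aspirin", "headache", "pain"],
    ["ibuprofen", "headache", "pain"],
    ["guitar", "music", "chord"],
]

# A term-by-term co-occurrence matrix stands in for the semantic space.
cooc = defaultdict(lambda: defaultdict(int))
for doc in docs:
    for a, b in combinations(set(doc), 2):
        cooc[a][b] += 1
        cooc[b][a] += 1

vocab = sorted({t for d in docs for t in d})

def vector(term):
    """The term's row in the co-occurrence matrix."""
    return [cooc[term][other] for other in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: sum(x * x for x in w) ** 0.5
    return dot / (norm(u) * norm(v)) if norm(u) and norm(v) else 0.0

# "aspirin" and "ibuprofen" never co-occur, yet their contexts overlap,
# so they land close together in the space; "guitar" does not.
print(cosine(vector("aspirin"), vector("ibuprofen")))  # close to 1.0
print(cosine(vector("aspirin"), vector("guitar")))     # 0.0
```

Note how the distance already captures an indirect connection: the two drug terms are related only through their shared contexts, which is exactly the property the quoted passage highlights for knowledge discovery.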

The page has links to projects, software and other cool stuff! I will be making a separate post on one of his software libraries.
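The Pathfinder network (PFNET) pruning mentioned in the quoted passage can also be sketched. For the common parameter choice r = ∞, q = n − 1, a direct link survives only if no other path connects its endpoints with a smaller maximum edge weight (the minimax distance). The four terms and distances below are invented for illustration:

```python
# Minimal PFNET sketch with r = infinity, q = n - 1.
terms = ["A", "B", "C", "D"]
# Symmetric distance matrix; in practice these would be semantic distances.
dist = [
    [0, 1, 4, 2],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [2, 5, 1, 0],
]

n = len(terms)
# Floyd-Warshall variant: minimax[i][j] is the smallest achievable
# "worst edge" over all paths from i to j.
minimax = [row[:] for row in dist]
for k in range(n):
    for i in range(n):
        for j in range(n):
            via_k = max(minimax[i][k], minimax[k][j])
            if via_k < minimax[i][j]:
                minimax[i][j] = via_k

# Keep only links whose direct distance equals the minimax path distance.
edges = [(terms[i], terms[j]) for i in range(n) for j in range(i + 1, n)
         if dist[i][j] == minimax[i][j]]
print(edges)  # → [('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'D')]
```

The weak A–C and B–D links are pruned because cheaper indirect pathways exist, leaving only the most significant links, which is how PFNETs expose a plausible pathway between indirectly related terms.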

Reflective Random Indexing and indirect inference…

Tuesday, August 16th, 2011

Reflective Random Indexing and indirect inference: A scalable method for discovery of implicit connections by Trevor Cohen, Roger Schvaneveldt, Dominic Widdows.


The discovery of implicit connections between terms that do not occur together in any scientific document underlies the model of literature-based knowledge discovery first proposed by Swanson. Corpus-derived statistical models of semantic distance such as Latent Semantic Analysis (LSA) have been evaluated previously as methods for the discovery of such implicit connections. However, LSA in particular is dependent on a computationally demanding method of dimension reduction as a means to obtain meaningful indirect inference, limiting its ability to scale to large text corpora. In this paper, we evaluate the ability of Random Indexing (RI), a scalable distributional model of word associations, to draw meaningful implicit relationships between terms in general and biomedical language. Proponents of this method have achieved comparable performance to LSA on several cognitive tasks while using a simpler and less computationally demanding method of dimension reduction than LSA employs. In this paper, we demonstrate that the original implementation of RI is ineffective at inferring meaningful indirect connections, and evaluate Reflective Random Indexing (RRI), an iterative variant of the method that is better able to perform indirect inference. RRI is shown to lead to more clearly related indirect connections and to outperform existing RI implementations in the prediction of future direct co-occurrence in the MEDLINE corpus.

The term “direct inference” is used for relatedness derived from terms that co-occur in the same document. “Indirect inference,” the focus of this paper, establishes a relationship between terms that never co-occur in any document but share a third “bridging” term that occurs with each of them.
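To make the reflective iteration concrete, here is a minimal sketch (not the authors' implementation; corpus, terms and dimensions are invented and toy-sized) of document-based random indexing followed by one reflective retraining pass, on a two-document corpus where two terms are linked only through a bridging term:

```python
import random

random.seed(0)
DIM = 20      # toy dimensionality; real systems use 1000+ dimensions
NONZERO = 4   # number of +1/-1 entries per sparse index vector

# "migraine" and "calcium_blocker" never co-occur,
# but both occur with the bridging term "magnesium".
docs = [
    ["migraine", "magnesium"],
    ["magnesium", "calcium_blocker"],
]
terms = sorted({t for d in docs for t in d})

def index_vector():
    """Sparse ternary random vector: a few +1/-1 entries, rest zeros."""
    v = [0.0] * DIM
    for i in random.sample(range(DIM), NONZERO):
        v[i] = random.choice([1.0, -1.0])
    return v

def add_into(acc, v):
    for i in range(DIM):
        acc[i] += v[i]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

# Pass 1 (standard document-based RI): each document gets a random index
# vector; a term vector is the sum of the index vectors of the documents
# that contain it.
doc_index = [index_vector() for _ in docs]
term_vec = {t: [0.0] * DIM for t in terms}
for doc, ivec in zip(docs, doc_index):
    for t in doc:
        add_into(term_vec[t], ivec)

# Reflective pass: rebuild document vectors from the learned term vectors,
# then retrain the term vectors from those document vectors. Terms linked
# only through a bridging term now acquire overlapping components.
doc_vec = [[0.0] * DIM for _ in docs]
for dvec, doc in zip(doc_vec, docs):
    for t in doc:
        add_into(dvec, term_vec[t])
term_vec2 = {t: [0.0] * DIM for t in terms}
for doc, dvec in zip(docs, doc_vec):
    for t in doc:
        add_into(term_vec2[t], dvec)

sim_direct = cosine(term_vec["migraine"], term_vec["calcium_blocker"])
sim_reflective = cosine(term_vec2["migraine"], term_vec2["calcium_blocker"])
print(sim_direct, sim_reflective)
```

After pass 1 the two non-co-occurring terms are built from unrelated random vectors, so their similarity hovers near zero; after the reflective pass their vectors share the bridging term's contribution and the similarity becomes clearly positive, which is the indirect inference the paper evaluates at MEDLINE scale.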

BTW, if you don’t have access to the Journal of Biomedical Informatics version, try the draft: Reflective Random Indexing and indirect inference: A scalable method for discovery of implicit connections