King – Man + Woman = Queen: The Marvelous Mathematics of Computational Linguistics.

From the post:

Computational linguistics has dramatically changed the way researchers study and understand language. The ability to number-crunch huge amounts of words for the first time has led to entirely new ways of thinking about words and their relationship to one another.

This number-crunching shows exactly how often a word appears close to other words, an important factor in how they are used. So the word Olympics might appear close to words like running, jumping, and throwing but less often next to words like electron or stegosaurus. This set of relationships can be thought of as a multidimensional vector that describes how the word Olympics is used within a language, which itself can be thought of as a vector space.

And therein lies this massive change. This new approach allows languages to be treated like vector spaces with precise mathematical properties. Now the study of language is becoming a problem of vector space mathematics.

Today, Timothy Baldwin at the University of Melbourne in Australia and a few pals explore one of the curious mathematical properties of this vector space: that adding and subtracting vectors produces another vector in the same space.

The question they address is this: what do these composite vectors mean? And in exploring this question they find that the difference between vectors is a powerful tool for studying language and the relationship between words.

…

A great lay introduction to:

Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vector Differences for Lexical Relation Learning by Ekaterina Vylomova, Laura Rimell, Trevor Cohn, Timothy Baldwin.

Abstract:

Recent work on word embeddings has shown that simple vector subtraction over pre-trained embeddings is surprisingly effective at capturing different lexical relations, despite lacking explicit supervision. Prior work has evaluated this intriguing result using a word analogy prediction formulation and hand-selected relations, but the generality of the finding over a broader range of lexical relation types and different learning settings has not been evaluated. In this paper, we carry out such an evaluation in two learning settings: (1) spectral clustering to induce word relations, and (2) supervised learning to classify vector differences into relation types. We find that word embeddings capture a surprising amount of information, and that, under suitable supervised training, vector subtraction generalises well to a broad range of relations, including over unseen lexical items.

The authors readily admit, much to their credit, this isn’t a one size fits all solution.

But, a line of research that merits your attention.