Automating Family/Party Feud

Semantic Analysis of the Reddit Hivemind

From the webpage:

Our neural network read every comment posted to Reddit in 2015, and built a semantic map using word2vec and spaCy.

Try searching for a phrase that’s more than the sum of its parts to see what the model thinks it means. Try your favourite band, slang words, technical things, or something totally random.

Lynn Cherny suggested in a tweet trying “actually.”

If you are interested in the background on this tool, see: Sense2vec with spaCy and Gensim by Matthew Honnibal.

From the post:

If you were doing text analytics in 2015, you were probably using word2vec. Sense2vec (Trask et al., 2015) is a new twist on word2vec that lets you learn more interesting, detailed and context-sensitive word vectors. This post motivates the idea, explains our implementation, and comes with an interactive demo that we’ve found surprisingly addictive.

Polysemy: the problem with word2vec

When humans write dictionaries and thesauruses, we define concepts in relation to other concepts. For automatic natural language processing, it’s often more effective to use dictionaries that define concepts in terms of their usage statistics. The word2vec family of models are the most popular way of creating these dictionaries. Given a large sample of text, word2vec gives you a dictionary where each definition is just a row of, say, 300 floating-point numbers. To find out whether two entries in the dictionary are similar, you ask how similar their definitions are – a well-defined mathematical operation.
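Comparing two "definitions" is commonly done with cosine similarity, and sense2vec's twist is that each sense of a word gets its own row instead of sharing one. Below is a minimal sketch of both ideas, assuming toy 3-dimensional vectors (real rows have around 300 numbers) whose values are invented for illustration. It also uses a `word|TAG` key style to separate senses, which is an illustrative convention here, not a claim about the demo's internals. The helper names `cosine` and `most_similar` are hypothetical.

```python
import math

# Invented toy vectors; real word2vec/sense2vec rows have ~300 floats.
# Keys use an illustrative "word|TAG" style so each sense has its own row.
VECTORS = {
    "duck|NOUN": [0.9, 0.1, 0.0],
    "goose|NOUN": [0.8, 0.2, 0.1],
    "duck|VERB": [0.0, 0.9, 0.3],
    "dodge|VERB": [0.1, 0.8, 0.4],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # guard against zero vectors
    return dot / (norm_u * norm_v)


def most_similar(key, vectors, topn=3):
    """Rank every other entry by cosine similarity to the entry for `key`."""
    query = vectors[key]
    scores = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != key]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


if __name__ == "__main__":
    # Each sense of "duck" finds a different nearest neighbour.
    for sense in ("duck|NOUN", "duck|VERB"):
        print(sense, "->", most_similar(sense, VECTORS, topn=1))
```

With these made-up numbers, the noun sense of "duck" lands nearest "goose|NOUN" and the verb sense nearest "dodge|VERB". A single shared "duck" row could not make that split, which is the polysemy problem the post describes.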

Certain to be a hit at technical conferences and parties.

SGML wasn’t mentioned even once in Reddit comments during 2015.

Try some of your favorite words and phrases.

Enjoy!
