Archive for the ‘Word Cloud’ Category

Building a Better Word Cloud

Friday, December 12th, 2014

Building a Better Word Cloud by Drew Conway.

From the post:

A few weeks ago I attended the NYC Data Visualization and Infographics meetup, which included a talk by Junk Charts blogger Kaiser Fung. Given the topic of his blog, I was a bit shocked that the central theme of his talk was comparing good and bad word clouds. He even stated that the word cloud was one of the best data visualizations of the last several years. I do not think there is such a thing as a good word cloud, and after the meetup I left unconvinced; as evidenced by the above tweet.

This tweet precipitated a brief Twitter debate about the value of word clouds, but from that straw poll it seemed the Nays had the majority. My primary gripe is that space is meaningless in word clouds. They are meant to summarize a single statistic, word frequency, yet they use a two-dimensional space to express that. This is frustrating, since it is very easy to abuse the flexibility of these dimensions and conflate the position of a word with its frequency to convey dubious significance.

This came up on Twitter today even though Drew’s post dates from 2011. Great post though as Drew tries to improve upon the standard word cloud.

Not Drew’s fault, but after reading his post I am where he was at the beginning on word clouds: I don’t see their utility. Perhaps your experience will be different.

Visualizing Philosophers And Scientists

Tuesday, July 1st, 2014

Visualizing Philosophers And Scientists By The Words They Used With Python and d3.js by Sahand Saba.

From the post:

This is a rather short post on a little fun project I did a couple of weekends ago. The purpose was mostly to demonstrate how easy it is to process and visualize large amounts of data using Python and d3.js.

With the goal of visualizing the words that were most associated with a given scientist or philosopher, I downloaded a variety of science and philosophy books that are in the public domain (Project Gutenberg, more specifically), and processed them using Python (scikit-learn and nltk), then used d3.js and d3.js cloud by Jason Davies to visualize the words most frequently used by the authors. To make it more interesting, only words that are somewhat unique to the author are displayed (i.e. if a word is used frequently by all authors then it is likely not that interesting and is dropped from the results). This can be easily achieved using the max_df parameter of the CountVectorizer class.
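To make the max_df idea concrete, here is a minimal sketch in plain Python. It is not Sahand's code and it does not call scikit-learn. It only imitates the cutoff that CountVectorizer's max_df applies: a word is dropped when the fraction of documents containing it is strictly higher than the threshold. The function names and the three tiny "books" are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def distinctive_counts(docs, max_df=0.5):
    """Count words per author, dropping words shared by too many authors.

    docs maps an author name to that author's text. A word is kept only if
    the fraction of documents containing it is <= max_df, which mirrors the
    document-frequency cutoff described in the post.
    """
    tokens = {author: tokenize(text) for author, text in docs.items()}

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    n_docs = len(docs)
    keep = {w for w, c in doc_freq.items() if c / n_docs <= max_df}

    return {
        author: Counter(w for w in words if w in keep)
        for author, words in tokens.items()
    }


# Made-up sample texts, one per (hypothetical) author document.
docs = {
    "newton": "the force of gravity and the motion of bodies",
    "hume": "the idea of cause and the impression of habit",
    "kant": "the categorical imperative and the idea of duty",
}

result = distinctive_counts(docs, max_df=0.5)
for author, counts in result.items():
    print(author, sorted(counts))
```

With max_df=0.5, words such as "the", "of" and "and" (used by all three) and "idea" (used by two of three) are dropped, leaving only words found in a single author's text.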

I pass by Copleston’s A History of Philosophy several times a day. It is a paperback edition from many years ago that I keep meaning to re-read.

At least for philosophers with enough surviving texts in machine readable format, perhaps Sahand’s post will provide the incentive to return to reading Copleston. A word cloud is one way to explore a text. Commentary, such as Copleston’s, is another.

What other tools would you use with philosophers and a commentary like Copleston?

I first saw this in a tweet by Christophe Viau.

Word Storms:…

Monday, February 24th, 2014

Word Storms: Multiples of Word Clouds for Visual Comparison of Documents by Quim Castellà and Charles Sutton.

Abstract:


Word clouds are popular for visualizing documents, but are not as useful for comparing documents, because identical words are not presented consistently across different clouds. We introduce the concept of word storms, a visualization tool for analyzing corpora of documents. A word storm is a group of word clouds, in which each cloud represents a single document, juxtaposed to allow the viewer to compare and contrast the documents. We present a novel algorithm that creates a coordinated word storm, in which words that appear in multiple documents are placed in the same location, using the same color and orientation, across clouds. This ensures that similar documents are represented by similar-looking word clouds, making them easier to compare and contrast visually. We evaluate the algorithm using an automatic evaluation based on document classification, and a user study. The results confirm that a coordinated word storm allows for better visual comparison of documents.
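The coordination idea in the abstract (same word, same location, color and orientation in every cloud) can be illustrated with a toy sketch. This is not the authors' algorithm, which also has to produce a good layout. The sketch just derives a word's position, color and angle from a hash of the word, so the same word gets the same attributes in every cloud, and it makes no attempt to avoid overlaps. The palette, canvas size, font-size formula and sample frequencies are all assumptions made up for illustration.

```python
import hashlib

# Hypothetical palette; any set of easily distinguished colors would do.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def word_style(word, width=800, height=600):
    """Derive a fixed position, color and orientation from the word itself.

    hashlib is used instead of the built-in hash() because string hashes
    from hash() change between Python processes.
    """
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return {
        "x": int.from_bytes(digest[0:2], "big") % width,
        "y": int.from_bytes(digest[2:4], "big") % height,
        "color": PALETTE[digest[4] % len(PALETTE)],
        "angle": 90 if digest[5] % 2 else 0,
    }


def storm(docs):
    """Build one cloud per document from word frequencies.

    docs maps a document name to a {word: frequency} dict. Only the font
    size depends on the document; everything else depends on the word.
    """
    return {
        name: {w: dict(word_style(w), size=10 + 4 * f) for w, f in freqs.items()}
        for name, freqs in docs.items()
    }


# Made-up frequencies for two documents that share the word "cloud".
docs = {
    "doc_a": {"cloud": 5, "storm": 2},
    "doc_b": {"cloud": 2, "topic": 4},
}
clouds = storm(docs)

a, b = clouds["doc_a"]["cloud"], clouds["doc_b"]["cloud"]
print(a)
print(b)
```

The two printed entries differ only in "size", which is the property that makes the clouds comparable at a glance.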

I never have cared for word clouds all that much but word storms as presented by the authors look quite useful.

The paper examines the use of word storms at a corpus, document and single document level.

You will find Word Storms: Multiples of Word Clouds for Visual Comparison of Documents (website) of particular interest, including its link to GitHub for the source code used in this project.

Of particular interest to topic mappers is the observation:

similar documents should be represented by visually similar clouds (emphasis in original)

Now imagine for a moment visualizing topics and associations with “similar” appearances. Even if limited to colors that are easy to distinguish, that could be a very powerful display/discovery tool for topic maps.

Not the paper’s use case but one that comes to mind with regard to display/discovery in a heterogeneous data set (such as a corpus of documents).

A Wordcloud in Python

Saturday, November 17th, 2012

A Wordcloud in Python by Andreas Mueller.

From the post:

Last week I was at PyCon DE, the German Python conference. After hacking on scikit-learn a lot last week, I decided to do something different on my way back, that I had planned for quite a while: doing a Wordle-like word cloud.

I know, word clouds are a bit out of style but I kind of like them anyway. My motivation to think about word clouds was that I thought these could be combined with topic-models to give somewhat more interesting visualizations.

So I looked around to find a nice open-source implementation of word-clouds … only to find none. (This has been a while, maybe it has changed since).

“Andy” walks through the construction of a word cloud in Python.

Looking at his renderings, I think I know why I don’t appreciate word clouds as much as they deserve.

I am trying to “read” the words as text, not observing them in unknown relationships to each other.

Word clouds may work for you or your users and if they do, use them.

But be aware there are users who find them nearly useless.

Twitter Words Association Analysis

Saturday, July 28th, 2012

Twitter Words Association Analysis by Gunjan Amit.

From the post:

Recently I came across the Twitter Spectrum tool from Jeff Clark. This tool is a modified version of the News Spectrum tool.

Here you can enter two topics and then analyse the associated words based on Twitter data. Blue and red represent the words associated with those two topics, whereas purple represents the common words.

You can click on any word to see the related tweets. The visualization is really awesome and you can easily analyze the data.

For example, I have taken “icici” and “hdfc” as two topics. Below is the twitter spectrum based on these two topics:

Looks interesting as a “rough cut” or exploratory tool.
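The coloring rule described in the post is simple enough to sketch in a few lines. This is only an illustration of the rule, not the tool's code. Which topic gets blue and which gets red is an assumption here (first topic blue, second red), and the word lists are invented rather than taken from Twitter.

```python
def spectrum_colors(words_a, words_b):
    """Color each word by which topic(s) it is associated with.

    Assumption: words tied only to the first topic are blue, words tied
    only to the second are red, and words tied to both are purple.
    """
    a, b = set(words_a), set(words_b)
    colors = {}
    for word in a | b:
        if word in a and word in b:
            colors[word] = "purple"
        elif word in a:
            colors[word] = "blue"
        else:
            colors[word] = "red"
    return colors


# Invented word lists for the two topics used in the post.
icici_words = ["bank", "loan", "card", "branch"]
hdfc_words = ["bank", "loan", "mortgage", "fund"]

colors = spectrum_colors(icici_words, hdfc_words)
for word in sorted(colors):
    print(word, colors[word])
```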

Author Wordle™

Thursday, November 17th, 2011

Author Wordle™

From the SciVerse description:

The Author Wordle™ application lets you create a Wordle word cloud out of the titles of the last 100 papers from any author in Scopus.

Wordle is a toy for generating word clouds from text. The clouds give greater prominence to words that appear more frequently in the source text. Clouds can be tweaked with different fonts, layouts, and color schemes. The images created with Wordle are yours to use however you like. You can print them out, or save them to the Wordle gallery to share with your friends. Authors can use these Wordle word clouds on their own website as a representation of their research, or just for fun.

Can’t say I am a big fan of word clouds but a lot of people find them quite useful. See how it works for you in evaluating recent work by a particular author.
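For a sense of what "greater prominence to words that appear more frequently" means in practice, here is a rough sketch that counts words in a list of paper titles and maps the counts to font sizes. The linear scaling, the size range, the stopword list and the sample titles are all assumptions for illustration. Neither Wordle's nor the Author Wordle™ application's actual sizing rule is given in the description.

```python
import re
from collections import Counter

# A tiny, made-up stopword list; real tools use much longer ones.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "on", "to", "with"}


def title_word_sizes(titles, min_size=12, max_size=48):
    """Map each non-stopword in the titles to a font size.

    Sizes grow linearly with frequency: the rarest words get min_size
    and the most frequent word gets max_size.
    """
    words = [
        w
        for title in titles
        for w in re.findall(r"[a-z]+", title.lower())
        if w not in STOPWORDS
    ]
    counts = Counter(words)
    if not counts:
        return {}

    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts match
    return {
        w: min_size + (c - lo) * (max_size - min_size) / span
        for w, c in counts.items()
    }


# Invented titles standing in for an author's recent papers.
titles = [
    "Topic maps for digital libraries",
    "Merging topic maps",
    "A survey of topic models",
]

sizes = title_word_sizes(titles)
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.0f}pt")
```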

Word Cloud in R

Friday, July 29th, 2011

Word Cloud in R

From the post:

A word cloud (or tag cloud) can be a handy tool when you need to highlight the most commonly cited words in a text using a quick visualization. Of course, you can use one of the several on-line services, such as wordle or tagxedo, very feature rich and with a nice GUI. Being an R enthusiast, I always wanted to produce this kind of image within R and now, thanks to the recently released Ian Fellows’ wordcloud package, finally I can!

In order to test the package I retrieved the titles of the XKCD web comics included in my RXKCD package and produced a word cloud based on the titles’ word frequencies calculated using the powerful tm package for text mining (I know, it is like killing a fly with a bazooka!).

I don’t care for word clouds but some people find them very useful. They certainly are an option to consider when offering your users views into texts.

Follow the pointers in this article to some of the on-line services or tweak your own in R.