The Simpsons by the Data [South Park as well]

The Simpsons by the Data by Todd Schneider.

From the post:

The Simpsons needs no introduction. At 27 seasons and counting, it’s the longest-running scripted series in the history of American primetime television.

The show’s longevity, and the fact that it’s animated, provides a vast and relatively unchanging universe of characters to study. It’s easier for an animated show to scale to hundreds of recurring characters; without live-action actors to grow old or move on to other projects, the denizens of Springfield remain mostly unchanged from year to year.

As a fan of the show, I present a few short analyses about Springfield, from the show’s dialogue to its TV ratings. All code used for this post is available on GitHub.

Alert! You must run Flash in order to access Simpsons World, the source of Todd’s data.

Advice: Treat Flash as malware and run in a VM.

Todd covers the number of words spoken per character, gender imbalance, focus on characters, viewership, and episode summaries (tf-idf).

Other analysis awaits your imagination and interest.

BTW, if you want comedy data a bit closer to the edge, try Text Mining South Park by Kaylin Walker. Kaylin uses R for her analysis as well.

Other TV programs with R-powered analysis?

Comments are closed.