Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 12, 2011

Be careful with dictionary-based text analysis

Filed under: Dictionary,Semantics,Sentiment Analysis — Patrick Durusau @ 4:39 pm

Brendan O’Connor writes:

OK, everyone loves to run dictionary methods for sentiment and other text analysis — counting words from a predefined lexicon in a big corpus, in order to explore or test hypotheses about the corpus. In particular, this is often done for sentiment analysis: count positive and negative words (according to a sentiment polarity lexicon, which was derived from human raters or previous researchers’ intuitions), and then proclaim the output yields sentiment levels of the documents. More and more papers come out every day that do this. I’ve done this myself. It’s interesting and fun, but it’s easy to get a bunch of meaningless numbers if you don’t carefully validate what’s going on. There are certainly good studies in this area that do further validation and analysis, but it’s hard to trust a study that just presents a graph with a few overly strong speculative claims as to its meaning. This happens more than it ought to.
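To make the technique concrete, here is a minimal sketch of the kind of dictionary counting O'Connor describes. It is not his code: the tiny lexicon, the function names, and the example sentences are all invented for illustration, and a real study would use a published polarity lexicon and validate its output against the corpus.

```python
import re
from collections import Counter

# Toy lexicon, invented for illustration only (an assumption, not a
# published resource).
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lexicon_score(text):
    """Return (positive count, negative count, net score per token)."""
    counts = Counter(tokenize(text))
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    total = sum(counts.values())
    net = (pos - neg) / total if total else 0.0
    return pos, neg, net

docs = [
    "I love this great phone",
    "This is not good at all",  # negation: the count still sees "good"
]
for doc in docs:
    print(doc, "->", lexicon_score(doc))
```

The second sentence is scored as positive because the counter sees "good" and ignores "not". That is one small instance of the meaningless numbers the quote warns about when the output is not validated.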

How does “measurement” of sentiment in a document differ from “measurement” of the semantics of terms in that document?

Have we traded validated collections for “access” to large numbers of documents (think about the usual Internet search engine)? By validated collections I mean the discipline-based indexes where the user did not have to weed out completely irrelevant results.
