Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 24, 2013

Improved Part-of-Speech Tagging… [Boiling the Ocean?]

Filed under: Cybersecurity,Security — Patrick Durusau @ 4:18 pm

Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters by Olutobi Owoputi, Brendan O’Connor, Chris Dyer, Kevin Gimpel, Nathan Schneider and Noah A. Smith.

Abstract:

We consider the problem of part-of-speech tagging for informal, online conversational text. We systematically evaluate the use of large-scale unsupervised word clustering and new lexical features to improve tagging accuracy. With these features, our system achieves state-of-the-art tagging results on both Twitter and IRC POS tagging tasks; Twitter tagging is improved from 90% to 93% accuracy (more than 3% absolute). Qualitative analysis of these word clusters yields insights about NLP and linguistic phenomena in this genre. Additionally, we contribute the first POS annotation guidelines for such text and release a new dataset of English language tweets annotated using these guidelines. Tagging software, annotation guidelines, and large-scale word clusters are available at: http://www.ark.cs.cmu.edu/TweetNLP This paper describes release 0.3 of the “CMU Twitter Part-of-Speech Tagger” and annotated data.

This is great work, but if I am interested in tweets from a particular set of users who share a common vocabulary, isn’t this like boiling the ocean?

That is, if I have a defined source of data, I no longer have to guess or model what might have been meant.

TweetNLP would be very useful in such a case but not as a direct means of analysis.

TweetNLP could derive the norms or patterns found in tweets so that a constructed language for communicating via tweets would fit within those norms.

That is another aspect of hiding in a data stream.

It remains a “boiling the ocean” exercise, but one for those who want to distinguish ordinary tweets from those that only look like ordinary tweets.
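To make "deriving the norms" a little more concrete, here is a minimal sketch in Python. It assumes the tweets have already been run through a POS tagger and reduced to lists of tags; the single-character tag labels, the function names, and the tiny reference corpus are all illustrative assumptions of mine, not part of TweetNLP. The scoring is a plain tag-bigram model with add-one smoothing, chosen for simplicity, not the method used in the paper.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"  # sentinel tags marking tweet boundaries


def tag_bigrams(tags):
    """Return adjacent tag pairs, padded with start/end sentinels."""
    padded = [START, *tags, END]
    return list(zip(padded, padded[1:]))


def build_profile(tagged_tweets):
    """Count tag bigrams and their left-hand contexts over a reference corpus."""
    bigrams, contexts, tagset = Counter(), Counter(), {END}
    for tags in tagged_tweets:
        tagset.update(tags)
        for prev, cur in tag_bigrams(tags):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, tagset


def avg_log_prob(tags, profile):
    """Average add-one-smoothed log probability of a tag sequence.

    Values closer to zero mean the sequence looks more like the corpus.
    """
    bigrams, contexts, tagset = profile
    vocab = len(tagset) + 1  # one extra slot for tags never seen in the corpus
    pairs = tag_bigrams(tags)
    total = sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab))
        for prev, cur in pairs
    )
    return total / len(pairs)


# Hypothetical tag sequences standing in for tagger output on ordinary tweets.
reference = [
    ["O", "V", "D", "N"],
    ["O", "V", "P", "D", "N"],
    ["@", "O", "V", "A"],
    ["O", "V", "D", "A", "N", "#"],
]
profile = build_profile(reference)

typical = ["O", "V", "D", "N"]  # follows patterns seen in the corpus
unusual = ["N", "D", "V", "O"]  # same tags, order never seen in the corpus

print(avg_log_prob(typical, profile))
print(avg_log_prob(unusual, profile))
```

The same score serves both sides of the argument: someone constructing a language for tweets would want messages that score like the reference corpus, while someone hunting for such messages would need far richer models than tag bigrams to tell them apart.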

I first saw this in a tweet by Brendan O’Connor.
