Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 15, 2011

Statistical machine learning for text classification

Filed under: Natural Language Processing, NLTK, Python | Patrick Durusau @ 7:51 pm

Statistical machine learning for text classification with scikit-learn and NLTK by Olivier Grisel (PyCon 2011).

The goal of this talk is to give a state-of-the-art overview of machine learning algorithms applied to text classification tasks ranging from language and topic detection in tweets and web pages to sentiment analysis in consumer products reviews.

The first third of the talk reviews basic NLP, followed by the basic functions of scikit-learn and then those of NLTK. It also briefly covers the Google Prediction API.
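The talk itself works with scikit-learn and NLTK. For a rough picture of what a basic bag-of-words classifier does under the hood, here is a from-scratch multinomial naive Bayes sketch in plain Python (standard library only). It is not the API of either library; in practice you would reach for scikit-learn's CountVectorizer with MultinomialNB, or NLTK's NaiveBayesClassifier. The tokenizer, the add-one smoothing and the four toy "reviews" are assumptions made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes: a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over bag-of-words counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # label -> number of training documents
        self.word_counts = defaultdict(Counter)    # label -> word -> occurrences
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def log_score(self, tokens, label):
        """log P(label) + sum of smoothed log P(word | label); unseen words are ignored."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, doc):
        tokens = tokenize(doc)
        return max(self.class_counts, key=lambda label: self.log_score(tokens, label))


# Four made-up "movie reviews", purely for illustration.
train_docs = [
    "a wonderful, moving film with great acting",
    "great story and a beautiful score",
    "boring plot and terrible acting",
    "a dull, terrible waste of time",
]
train_labels = ["pos", "pos", "neg", "neg"]

clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("great acting and a beautiful story"))
print(clf.predict("terrible, boring acting"))
```

Working in log space avoids underflow when many small per-word probabilities are multiplied, and the add-one smoothing keeps a word seen only in the other class from zeroing out a label's score.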

It then compares all three on the movie review dataset, and discusses topic classification of newsgroup posts and language identification of web pages.
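One common approach to language identification is to work with character n-grams rather than whole words, since a short snippet may share few complete words with any training text but will still share many letter sequences. The sketch below, again standard-library Python rather than anything taken from the talk, builds a character trigram count profile per language and picks the profile with the highest cosine similarity. The sample sentences, the trigram length and the cosine scoring are illustrative assumptions; a real system would train on far larger corpora.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, padding with spaces so word boundaries count too."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


# Toy one-sentence "training" samples per language (illustrative only).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the fox runs away",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et puis le renard s'en va",
    "de": "der schnelle braune fuchs springt über den faulen hund und dann läuft der fuchs weg",
}
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def identify_language(text):
    """Return the language whose trigram profile is most similar to the input text."""
    grams = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(grams, PROFILES[lang]))


print(identify_language("le chien et le renard"))
```

Because cosine similarity normalizes away vector magnitude, a short tweet can still be compared fairly against a longer training profile.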

I would describe it less as “state-of-the-art” than as “an intro to text classification and its potential.”
