Statistical machine learning for text classification with scikit-learn and NLTK by Olivier Grisel (PyCon 2011)
The goal of this talk is to give a state-of-the-art overview of machine learning algorithms applied to text classification tasks, ranging from language and topic detection in tweets and web pages to sentiment analysis in consumer product reviews.
The first third of the talk reviews basic NLP, followed by the basic functions of scikit-learn and then of NLTK. It also briefly covers the Google Prediction API.
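For readers new to the scikit-learn side, the basic workflow is to turn raw text into a sparse bag-of-words vector and then fit a classifier on those vectors. The sketch below shows that idea for sentiment. It uses the current `sklearn` API (which may differ from what was shown in 2011), and the six toy reviews are invented for illustration. They are not the movie review data used in the talk.

```python
# Minimal sketch: bag-of-words sentiment classification with scikit-learn.
# Assumes a recent scikit-learn; the toy reviews are hypothetical examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data: three positive and three negative reviews.
train_texts = [
    "a wonderful, moving film with great acting",
    "brilliant script and a great cast",
    "I loved every minute of it",
    "a dull, boring mess of a movie",
    "terrible acting and a boring plot",
    "I hated every minute of it",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# TfidfVectorizer maps each document to TF-IDF-weighted word counts;
# MultinomialNB then learns per-class word probabilities from those weights.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Classify two unseen snippets; tolist() converts the NumPy array to plain strings.
predictions = model.predict(["great acting, I loved it", "a boring, terrible plot"])
print(predictions.tolist())  # expected: ['pos', 'neg']
```

Naive Bayes is used here only because it is a common baseline for text. A linear model such as `sklearn.svm.LinearSVC` can be dropped into the same pipeline unchanged.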
It compares all three on the movie review database, and discusses topic classification of newsgroup posts and language identification of web pages.
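Language identification is usually handled with character n-grams rather than whole words, because short character sequences ("the", "ch", "sch") are strong language markers even in very little text. The following is a minimal sketch of that approach, assuming a recent scikit-learn. The nine training sentences and three test sentences are hypothetical toy data, not web pages from the talk.

```python
# Minimal sketch: language identification from character n-grams.
# Assumes a recent scikit-learn; all sentences below are hypothetical toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "the cat is sleeping on the sofa",
    "where is the nearest train station",
    "this is a very good book",
    "le chat dort sur le canapé",
    "où est la gare la plus proche",
    "c'est un très bon livre",
    "die Katze schläft auf dem Sofa",
    "wo ist der nächste Bahnhof",
    "das ist ein sehr gutes Buch",
]
train_labels = ["en"] * 3 + ["fr"] * 3 + ["de"] * 3

# analyzer="char_wb" builds character n-grams (here 1 to 3 characters) inside
# word boundaries, padding each word with spaces so prefixes/suffixes are captured.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LinearSVC(),
)
model.fit(train_texts, train_labels)

predictions = model.predict([
    "the book is on the sofa",
    "le livre est sur le canapé",
    "das Buch ist auf dem Sofa",
])
print(predictions.tolist())
```

With a real corpus the same pipeline applies unchanged. For the newsgroup topic task, swapping in a word-level `TfidfVectorizer()` and loading data with `sklearn.datasets.fetch_20newsgroups` (which downloads the corpus on first use) gives a comparable setup.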
I would describe it less as “state-of-the-art” than as “an intro to text classification and its potential.”