Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 10, 2012

Explore Python, machine learning, and the NLTK library

Filed under: Machine Learning,NLTK,Python — Patrick Durusau @ 4:18 pm

Explore Python, machine learning, and the NLTK library by Chris Joakim (cjoakim@bellsouth.net), Senior Software Engineer, Primedia Inc.

From the post:

The challenge: Use machine learning to categorize RSS feeds

I was recently given the assignment to create an RSS feed categorization subsystem for a client. The goal was to read dozens or even hundreds of RSS feeds and automatically categorize their many articles into one of dozens of predefined subject areas. The content, navigation, and search functionality of the client website would be driven by the results of this daily automated feed retrieval and categorization.

The client suggested using machine learning, perhaps with Apache Mahout and Hadoop, as she had recently read articles about those technologies. Her development team and ours, however, are fluent in Ruby rather than Java™ technology. This article describes the technical journey, learning process, and ultimate implementation of a solution.

If a wholly automated publication process leaves you feeling uneasy, imagine the same system feeding its categorized content to subject matter experts for further processing instead.

Think of it as processing raw ore on the way to finding diamonds and then deciding which ones get polished.
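
To make the "ore to subject matter experts" idea concrete, here is a minimal sketch of my own, not code from the article. It assumes a small naive Bayes text classifier written in plain Python (the article itself works with NLTK), a whitespace tokenizer, and a hypothetical `route` helper with an arbitrary 0.8 confidence threshold. Confident predictions are filed automatically; everything else goes to a human.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude assumption)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing. Illustrative only."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training docs seen
        self.vocab = set()                       # every word seen in training

    def train(self, text, category):
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def probabilities(self, text):
        """Return {category: posterior probability} for the given text."""
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        log_scores = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / denominator)
            log_scores[category] = score
        # Convert log scores to probabilities, shifting by the max for stability.
        top = max(log_scores.values())
        scaled = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(scaled.values())
        return {c: v / norm for c, v in scaled.items()}


def route(classifier, text, threshold=0.8):
    """Hypothetical helper: file confident items, queue the rest for an expert."""
    probs = classifier.probabilities(text)
    best = max(probs, key=probs.get)
    decision = "auto" if probs[best] >= threshold else "review"
    return decision, best
```

Train it with a handful of labeled headlines via `train(text, category)`, then call `route(classifier, headline)`. The threshold is the dial: raise it and the experts see more ore, lower it and more goes straight to publication.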
