Getting started with text analytics by Chris DuBois.
At GraphLab, we are helping data scientists go from inspiration to production. As part of that goal, we made sure that GraphLab Create is useful for manipulating text data, plugging the results into a machine learning model, and deploying a predictive service.
Text data is useful in a wide variety of applications:
- Finding key phrases in online reviews that describe an attribute or aspect of a restaurant, product for sale, etc.
- Detecting sentiment in social media, such as tweets and news article comments.
- Predicting influential documents in large corpora, such as PubMed abstracts and arXiv articles.
So how do data scientists get started with text data? Regardless of the ultimate goal, the first step in text processing is typically feature engineering. GraphLab Create makes this work easy. Examples of features include:
…
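To make the idea of feature engineering concrete, here is a minimal sketch of one common text feature, a bag-of-words count. It uses only the Python standard library and is not GraphLab Create's API; the function name `bag_of_words` and the sample reviews are hypothetical, chosen purely for illustration.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token.

    Hypothetical helper for illustration only; not part of GraphLab Create.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


# Made-up sample documents standing in for online reviews.
reviews = [
    "The pasta was great, and the service was great too.",
    "Slow service, but the pasta was worth it.",
]

# One Counter per document: each maps a word to its number of occurrences.
features = [bag_of_words(review) for review in reviews]

for review_features in features:
    print(review_features.most_common(3))
```

Each document becomes a mapping from word to count, which is the kind of sparse representation a machine learning model can consume after further processing.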
Just in case you get tired of watching conference presentations this weekend, I found this post from early December 2014 that I have been meaning to mention. Take a break from the videos and enjoy working through this post.
Chris promises more posts on data science skills, so stay tuned!