Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 6, 2018

Basic Text [Leaked Email] Processing in R

Filed under: R,Text Mining — Patrick Durusau @ 10:08 am

Basic Text Processing in R by Taylor Arnold and Lauren Tilton.

From Learning Goals:

A substantial amount of historical data is now available in the form of raw, digitized text. Common examples include letters, newspaper articles, personal notes, diary entries, legal documents and transcribed speeches. While some stand-alone software applications provide tools for analyzing text data, a programming language offers increased flexibility to analyze a corpus of text documents. In this tutorial we guide users through the basics of text analysis within the R programming language. The approach we take involves only using a tokenizer that parses text into elements such as words, phrases and sentences. By the end of the lesson users will be able to:

  • employ exploratory analyses to check for errors and detect high-level patterns;
  • apply basic stylometric methods over time and across authors;
  • approach document summarization to provide a high-level description of the
    elements in a corpus.
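
To make the tokenizer-only approach concrete, here is a minimal sketch in base R. It is not the tutorial's code: the sample text and the variable names (text, sentences, words, freq) are made up for illustration, and the splitting rules are deliberately crude stand-ins for a real tokenizer.

```r
# Hypothetical sample text; not taken from the tutorial's dataset.
text <- "The state of the union is strong. The union endures, and the work continues."

# Sentence tokens: mark each sentence boundary (end punctuation followed by
# whitespace) with a newline, then split on that newline.
marked <- gsub("([.!?])\\s+", "\\1\n", text)
sentences <- unlist(strsplit(marked, "\n", fixed = TRUE))

# Word tokens: lowercase, strip punctuation, split on runs of whitespace.
clean <- gsub("[[:punct:]]", "", tolower(text))
words <- unlist(strsplit(clean, "\\s+"))

# Word frequencies, most frequent first: a first exploratory look at a text.
freq <- sort(table(words), decreasing = TRUE)

print(sentences)
print(head(freq, 3))
```

Once a text is reduced to tokens like these, the counts are what feed the exploratory checks and summaries described in the learning goals. A real corpus would need more careful handling of abbreviations, numbers and hyphenation than this sketch attempts.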

The tutorial uses United States Presidential State of the Union Addresses, yawn, as its dataset.

Great tutorial, but aren’t there more interesting datasets to use as examples?

Granted, I haven’t prepared such a dataset or matched it to a tutorial such as this one.

Question: What would make a more interesting dataset than United States Presidential State of the Union Addresses?

“Anything” is not a helpful answer.

Suggestions?
