Archive for the ‘MALLET’ Category

Lincoln Logarithms: Finding Meaning in Sermons

Thursday, February 28th, 2013

From the webpage:

Just after his death, Abraham Lincoln was hailed as a luminary, martyr, and divine messenger. We wondered if using digital tools to analyze a digitized collection of elegiac sermons might uncover patterns or new insights about his memorialization.

We explored the power and possibility of four digital tools—MALLET, Voyant, Paper Machines, and Viewshare. MALLET, Paper Machines, and Voyant all examine text. They show how words are arranged in texts, their frequency, and their proximity. Voyant and Paper Machines also allow users to make visualizations of word patterns. Viewshare allows users to create timelines, maps, and charts of bodies of material. In this project, we wanted to experiment with understanding what these tools, which are in part created to reveal, could and could not show us in a small, but rich corpus. What we have produced is an exploration of the possibilities and the constraints of these tools as applied to this collection.

The resulting digital collection: The Martyred President: Sermons Given on the Assassination of President Lincoln.

Let’s say this is not an “ahistorical” view. ;-)

Good example of exploring “unstructured” data.

A first step before authoring a topic map.
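
To make "frequency" and "proximity" a little more concrete, here is a minimal sketch in plain Python. It is not how MALLET, Voyant, or Paper Machines work internally; the function names and the sample sentence are my own inventions for illustration, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def cooccurrences(tokens, target, window=2):
    """Count the words found within `window` tokens of each `target` occurrence."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Invented sample text, not taken from the sermon collection.
sample = ("The nation mourns. The nation remembers a fallen leader, "
          "and the nation prays.")
tokens = tokenize(sample)
freqs = word_frequencies(tokens)
near_nation = cooccurrences(tokens, "nation", window=1)

print(freqs.most_common(3))
print(near_nation.most_common(3))
```

Counts like these are roughly the raw material that the visualizations are built on; the tools add stopword handling, stemming, and much more on top.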

Topic Discovery With Apache Pig and Mallet

Friday, February 1st, 2013

One of only two posts from this blog in 2012, but it is a useful one.

From the post:

A common desire when working with natural language is topic discovery. That is, given a set of documents (eg. tweets, blog posts, emails) you would like to discover the topics inherent in those documents. Often this method is used to summarize a large corpus of text so it can be quickly understood what that text is ‘about’. You can go further and use topic discovery as a way to classify new documents or to group and organize the documents you’ve done topic discovery on.

Walks through the use of Pig and Mallet on a newsgroup data set.

I have been thinking about getting one of those unlimited download newsgroup accounts.

Maybe I need to go ahead and start building some newsgroup data sets.
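
If "topic discovery" sounds abstract, the following toy sketch may help. It is a bare-bones collapsed Gibbs sampler for LDA in plain Python, written only to show the idea on a handful of invented documents. It is not the code from the post, it does not use Pig or Mallet, and the function name, hyperparameter values, and tiny corpus are all my own assumptions.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over already-tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]       # tokens per (doc, topic)
    topic_word = [[0] * V for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                     # tokens per topic
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    # The three most frequent words assigned to each topic.
    top_words = [
        [vocab[i] for i in sorted(range(V), key=lambda i: -topic_word[t][i])[:3]]
        for t in range(num_topics)
    ]
    return theta, top_words


# Invented four-document corpus, purely for illustration.
docs = [
    "pig hadoop cluster script".split(),
    "hadoop cluster pig job".split(),
    "sermon lincoln eulogy mourning".split(),
    "lincoln sermon mourning nation".split(),
]
theta, top_words = lda_gibbs(docs, num_topics=2)
for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])
for t, words in enumerate(top_words):
    print(t, words)
```

The per-document proportions are what you would use to group or classify documents, and the top words per topic are the quick summary of what the corpus is "about". On a real corpus you would want a real toolkit rather than this sketch.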

Getting Started with MALLET and Topic Modeling

Thursday, September 1st, 2011

If you don’t remember MALLET, take a look at “MALLET: MAchine Learning for LanguagE Toolkit Topic Map Competition (TMC) Contender?”

Shawn is very interested in applying topic modeling to a variety of historical texts.

His blog, Electric Archaeology: Digital Media for Learning and Research, looks very interesting. It covers “Agent based modeling, games, virtual worlds, and online education for archaeology and history.”

This is the sort of person who might be interested in topic maps and related technologies.

As far as I know, there is still a real lack of example-driven texts that would introduce most humanists to modern software.
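
In that spirit, here is a small sketch of driving MALLET's command-line topic modeling from Python. The `import-dir` and `train-topics` subcommands and the flags shown are from MALLET's own documentation as I understand it; the helper names, the `bin/mallet` location, and the `texts/` and `out/` paths are hypothetical placeholders, so adjust them to your own installation.

```python
import subprocess
from pathlib import Path


def mallet_commands(mallet_bin, corpus_dir, work_dir, num_topics=20):
    """Build the two MALLET command lines: import a folder, then train topics."""
    work = Path(work_dir)
    corpus_file = work / "corpus.mallet"
    import_cmd = [
        str(mallet_bin), "import-dir",
        "--input", str(corpus_dir),
        "--output", str(corpus_file),
        "--keep-sequence",      # topic modeling needs word order preserved
        "--remove-stopwords",   # drop MALLET's default English stopwords
    ]
    train_cmd = [
        str(mallet_bin), "train-topics",
        "--input", str(corpus_file),
        "--num-topics", str(num_topics),
        "--output-topic-keys", str(work / "topic_keys.txt"),
        "--output-doc-topics", str(work / "doc_topics.txt"),
    ]
    return import_cmd, train_cmd


def run_mallet(mallet_bin, corpus_dir, work_dir, num_topics=20):
    """Run both steps in order; raises CalledProcessError if either one fails."""
    Path(work_dir).mkdir(parents=True, exist_ok=True)
    for cmd in mallet_commands(mallet_bin, corpus_dir, work_dir, num_topics):
        subprocess.run(cmd, check=True)


# Hypothetical paths: only prints the commands, nothing is executed here.
import_cmd, train_cmd = mallet_commands("bin/mallet", "texts", "out", num_topics=10)
print(" ".join(import_cmd))
print(" ".join(train_cmd))
```

Calling `run_mallet(...)` with real paths would execute the two steps. The topic keys file lists the top words for each topic, and the doc topics file gives the topic proportions for each document.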

PyBrain: The Python Machine Learning Library

Thursday, February 3rd, 2011

From the website:

PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms.

PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. In fact, we came up with the name first and later reverse-engineered this quite descriptive “Backronym”.

How is PyBrain different?

While there are a few machine learning libraries out there, PyBrain aims to be a very easy-to-use modular library that can be used by entry-level students but still offers the flexibility and algorithms for state-of-the-art research. We are constantly working on more and faster algorithms, developing new environments and improving usability.

What PyBrain can do

PyBrain, as its written-out name already suggests, contains algorithms for neural networks, for reinforcement learning (and the combination of the two), for unsupervised learning, and evolution. Since most of the current problems deal with continuous state and action spaces, function approximators (like neural networks) must be used to cope with the large dimensionality. Our library is built around neural networks in the kernel and all of the training methods accept a neural network as the to-be-trained instance. This makes PyBrain a powerful tool for real-life tasks.

Another toolkit to assist in the construction of topic maps.

And another likely contender for the Topic Map Competition!
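
The design point in the quote, that training methods accept a network as the to-be-trained instance, can be sketched in a few lines of plain Python. To be clear, this is not PyBrain's API: `TinyNetwork` and `GradientTrainer` are hypothetical names of my own, and the "network" is a single sigmoid unit learning logical OR.

```python
import math
import random


class TinyNetwork:
    """A single sigmoid unit standing in for a full neural network."""

    def __init__(self, n_inputs, seed=0):
        rng = random.Random(seed)
        self.weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        self.bias = 0.0

    def activate(self, inputs):
        s = self.bias + sum(w * x for w, x in zip(self.weights, inputs))
        return 1.0 / (1.0 + math.exp(-s))


class GradientTrainer:
    """A trainer that is handed the network it should train."""

    def __init__(self, network, learning_rate=0.5):
        self.network = network
        self.learning_rate = learning_rate

    def train_epoch(self, samples):
        """One pass over the samples; returns the mean squared error seen."""
        total = 0.0
        for inputs, target in samples:
            out = self.network.activate(inputs)
            err = out - target
            total += err * err
            # For a sigmoid unit with cross-entropy loss, the gradient with
            # respect to the pre-activation is simply (out - target).
            for i, x in enumerate(inputs):
                self.network.weights[i] -= self.learning_rate * err * x
            self.network.bias -= self.learning_rate * err
        return total / len(samples)


# Logical OR as a toy training set.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
net = TinyNetwork(n_inputs=2)
trainer = GradientTrainer(net)

first_error = trainer.train_epoch(samples)
last_error = first_error
for _ in range(500):
    last_error = trainer.train_epoch(samples)

for inputs, target in samples:
    print(inputs, target, round(net.activate(inputs), 3))
```

Because the trainer only depends on the network's `activate` method and its parameters, a different training method could be swapped in without touching the network, which is the appeal of the modular design described above.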