Applied Natural Language Processing

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 20, 2013

Applied Natural Language Processing

Filed under: Natural Language Processing,Scala — Patrick Durusau @ 5:53 am

Applied Natural Language Processing by Jason Baldridge.

Description:

This class will provide instruction on applying algorithms in natural language processing and machine learning for experimentation and for real world tasks, including clustering, classification, part-of-speech tagging, named entity recognition, topic modeling, and more. The approach will be practical and hands-on: for example, students will program common classifiers from the ground up, use existing toolkits such as OpenNLP, Chalk, StanfordNLP, Mallet, and Breeze, construct NLP pipelines with UIMA, and get some initial experience with distributed computation with Hadoop and Spark. Guidance will also be given on software engineering, including build tools, git, and testing. It is assumed that students are already familiar with machine learning and/or computational linguistics and that they already are competent programmers. The programming language used in the course will be Scala; no explicit instruction will be given in Scala programming, but resources and assistance will be provided for those new to the language.

From the syllabus:

The foremost goal of this course is to provide practical exposure to the core techniques and applications of natural language processing. By the end, students will understand the motivations for and capabilities of several core natural language processing and machine learning algorithms and techniques used in text analysis, including:

regular expressions

vector space models

clustering

classification

deduplication

n-gram language models

topic models

part-of-speech tagging

named entity recognition

PageRank

label propagation

dependency parsing

We will show, on a few chosen topics, how natural language processing builds on and uses the fundamental data structures and algorithms presented in this course. In particular, we will discuss:

authorship attribution

language identification

spam detection

sentiment analysis

influence

information extraction

geolocation

Students will learn to write non-trivial programs for natural language processing that take advantage of existing open source toolkits. The course will involve significant guidance and instruction in to software engineering practices and principles, including:

functional programming

distributed version control systems (git)

build systems

unit testing

distributed computing (Hadoop)

The course will help prepare students both for jobs in the industry and for doing original research that involves natural language processing.

A great start to one aspect of being a “data scientist.”

I encountered this course via the Nak (Scala library for NLP) project. Version 1.1.1 was just released and I saw a tweet from Jason Baldridge on the same.

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 20, 2013

Applied Natural Language Processing

No Comments