Text Analysis with LingPipe 4. Draft 0.2
Draft 0.2 is up to 363 pages.
Chapters:
- Getting Started
- Characters and Strings
- Regular Expressions
- Input and Output
- Handlers, Parsers, and Corpora
- Classifiers and Evaluation
- Naive Bayes Classifiers (not done)
- Tokenization
- Symbol Tables
- Sentence Boundary Detection (not done)
- Latent Dirichlet Allocation
- Singular Value Decomposition (not done)
Extensive annexes.
The book is projected to grow by another 1,000 or so pages, so the chapters marked (not done) will appear along with additional material in other chapters.
Readers welcome!
Christmas came early this year!
Questions:
- Class presentation demonstrating the use of one of the techniques on a library-related data set.
- Compare and contrast two of the techniques on a library-related data set. (Project)
- Annotated and updated bibliography for any chapter.
Update: Same questions as before, but look at the updated version of the book (now split into separate parts for text processing and NLP): LingPipe and Text Processing Books.