Mavuno: A Hadoop-Based Text Mining Toolkit
From the webpage:
Mavuno is an open source, modular, scalable text mining toolkit built upon Hadoop. It supports basic natural language processing tasks (e.g., part of speech tagging, chunking, parsing, named entity recognition), is capable of large-scale distributional similarity computations (e.g., synonym, paraphrase, and lexical variant mining), and has information extraction capabilities (e.g., instance and semantic relation mining). It can easily be adapted to new input formats and text mining tasks.
Just glancing at the documentation, I am intrigued by its support for Java regular expressions. More on that this coming week.
I first saw this at myNoSQL.