How to get superior text processing in Python with Pynini by Kyle Gorman and Richard Sproat.
From the post:
It’s hard to beat regular expressions for basic string processing. But for many problems, including some deceptively simple ones, we can get better performance with finite-state transducers (or FSTs). FSTs are simply state machines which, as the name suggests, have a finite number of states. But before we talk about all the things you can do with FSTs, from fast text annotation—with none of the catastrophic worst-case behavior of regular expressions—to simple natural language generation, or even speech recognition, let’s explore what a state machine is, what they have to do with regular expressions.
…
Reporters, researchers and others will face a 2017 where the rate of information has increased, along with noise from media spasms over the latest taut from president-elect Trump.
Robust text mining/filtering will your daily necessities, if they aren’t already.
Tagging text is the first example. Think about auto-generating graphs from emails with “to:,” “from:,” “date:,” and key terms in the email. Tagging the key terms is essential to that process.
Once tagged, you can slice and dice the text as more information is uncovered.
Interested?