A first failed attempt at Natural Language Processing by Mark Needham
From the post:
One of the things I find fascinating about dating websites is that people's profiles are almost identical, so I thought it would be an interesting exercise to grab some of the free text that people write about themselves and prove the similarity.
I’d been talking to Matt Biddulph about some Natural Language Processing (NLP) stuff he’d been working on and he wrote up a bunch of libraries, articles and books that he’d found useful.
I started out by plugging the text into one of the many NLP libraries that Matt listed with the vague idea that it would come back with something useful.
I’m not sure exactly what I was expecting the result to be, but after five or six hours of playing around with different libraries I’d got nowhere, and I parked the problem without really knowing where I’d gone wrong.
Last week I came across a paper titled “That’s What She Said: Double Entendre Identification” whose authors wanted to work out when a sentence could legitimately be followed by the phrase “that’s what she said”.
While the subject matter is a bit risqué, I found the way the authors went about solving their problem very interesting, and reading about it allowed me to see some mistakes I’d made.
Vague problem statement
Unfortunately I didn’t do a good job of working out exactly what problem I wanted to solve – my problem statement was too general.
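To illustrate what a narrower statement might look like, here is a minimal sketch that swaps "prove the similarity" for one measurable question: on average, what share of distinct words do two profiles have in common? This is not the approach from the post or the paper. The Jaccard measure, the function names and the sample profiles are all assumptions made up for illustration.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase a profile and return its set of distinct words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Share of distinct words two profiles have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # two empty profiles: treat as no measurable overlap
    return len(ta & tb) / len(ta | tb)


def mean_pairwise_similarity(profiles):
    """Average Jaccard similarity over every pair of profiles."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical sample data, not real profiles.
    profiles = [
        "I love travelling, good food and long walks",
        "I love good food, travelling and going to the cinema",
        "Keen cyclist who enjoys cooking and live music",
    ]
    print(mean_pairwise_similarity(profiles))
```

Word overlap is a crude measure, but a question this specific can at least be answered and then refined, for example by ignoring very common words.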
Question: How do you teach people to write useful problem statements?
Pointers, suggestions?