collocations in wikipedia, part 1
From the post:
collocations are combinations of terms that occur together more frequently than you’d expect by chance.
they can include
- proper noun phrases like ‘Darth Vader’
- stock/colloquial phrases like ‘flora and fauna’ or ‘old as the hills’
- common adjectives/noun pairs (notice how ‘strong coffee’ sounds ok but ‘powerful coffee’ doesn’t?)
let’s go through a couple of techniques for finding collocations taken from the exceptional nlp text “foundations of statistical natural language processing” by manning and schutze.
Looks like the start of a very interesting series on statistical collocations in Wikipedia, which is a serious data set for training purposes.
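To make "more frequently than you'd expect by chance" concrete, here is a minimal sketch that scores adjacent word pairs by pointwise mutual information (PMI). This is my own illustration, not code from the quoted post: the function name `bigram_pmi`, the `min_count` cutoff, and the toy sentence are all assumptions, and a real run over Wikipedia would need proper tokenization and far more data.

```python
import math
from collections import Counter

def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI compares how often a pair actually occurs with how often it
    would occur if the two words were independent. Hypothetical helper
    for illustration only.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs get inflated PMI scores, so skip them (assumed cutoff).
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_independent = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        scores[(w1, w2)] = math.log2(p_pair / p_independent)
    return scores

# Toy corpus, made up for this example.
tokens = ("darth vader likes strong coffee . the coffee was strong . "
          "darth vader left .").split()
scores = bigram_pmi(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

A positive score means the pair shows up more often than independence would predict. With so little text only pairs seen at least twice are scored at all, which is why the frequency cutoff matters as much as the formula.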
BTW, don’t miss the homepage. Lots of interesting resources.
Update: 18 November 2011
See also:
collocations in wikipedia, part 2
finding phrases with mutual information [collocations, part 3]
I am writing a separate blog post on parts 2 and 3, but just in case you come here first… Enjoy!