Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 25, 2011

collocations in wikipedia, part 1

Filed under: Collocation,Natural Language Processing — Patrick Durusau @ 7:34 pm

collocations in wikipedia, part 1

From the post:

collocations are combinations of terms that occur together more frequently than you’d expect by chance.

they can include

  • proper noun phrases like ‘Darth Vader’
  • stock/colloquial phrases like ‘flora and fauna’ or ‘old as the hills’
  • common adjectives/noun pairs (notice how ‘strong coffee’ sounds ok but ‘powerful coffee’ doesn’t?)

let’s go through a couple of techniques for finding collocations taken from the exceptional nlp text “foundations of statistical natural language processing” by manning and schutze.

Looks like the start of a very interesting series on statistical collocation in Wikipedia, which is a serious data set for training purposes.
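To make "more frequently than you'd expect by chance" concrete, here is a minimal sketch of one common scoring approach, pointwise mutual information (PMI) over adjacent word pairs. It is not taken from the post and may not match the techniques the series actually uses; the function name `bigram_pmi`, the `min_count` cutoff, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI compares how often a pair is observed with how often it would be
    expected if the two words occurred independently. Hypothetical helper,
    not code from the original post.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_pairs = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs get inflated PMI scores, so skip them (assumed cutoff).
        if count < min_count:
            continue
        p_pair = count / n_pairs
        p_independent = (unigrams[w1] / n_tokens) * (unigrams[w2] / n_tokens)
        scores[(w1, w2)] = math.log2(p_pair / p_independent)
    return scores


# Toy corpus, invented for this example; real input would be Wikipedia text.
text = (
    "darth vader likes strong coffee . luke likes strong tea . "
    "darth vader fears luke . the coffee is strong ."
)
tokens = text.split()
scores = bigram_pmi(tokens)

# Print pairs from highest to lowest PMI.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

A positive score means the pair shows up more often than independence would predict. On a corpus this small the numbers mean little; the frequency cutoff matters a great deal on real data, because PMI favors pairs of rare words.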

BTW, don’t miss the homepage. Lots of interesting resources.


Update: 18 November 2011

See also:

collocations in wikipedia, part 2

finding phrases with mutual information [collocations, part 3]

I am making a separate blog post on parts 2 and 3, but just in case you come here first… Enjoy!

