Extract meta concepts through co-occurrences analysis and graph theory
Cristian Mesiano writes:
During the Christmas period I finally had the chance to read some papers about probabilistic latent semantic analysis and its applications in automatic classification and indexing.
The main concept behind “latent semantic” analysis rests on the assumption that words that occur close together in a text are related to the same semantic construct.
Based on this principle, LSA (and partially also PLSA) builds a matrix to keep track of the co-occurrences of words in the text, and it assigns a score to these co-occurrences that also takes their distribution across the corpus into account.
Often a TF-IDF score is used to rank the words.
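To make the idea of co-occurrence counting and TF-IDF scoring concrete, here is a minimal sketch in plain Python. It is not Cristian's code: the function names, the window size of 2, the toy sentences and the tf * log(N / df) weighting are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count unordered word pairs that appear within `window` tokens of each other."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                counts[tuple(sorted((word, other)))] += 1
    return counts


def tf_idf(docs):
    """Return one {word: score} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each word occur?
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in tf.items()}
        )
    return scores


# Toy corpus (an assumption, not taken from the original post).
docs = [
    "the network society shapes the media".split(),
    "the media shape the network".split(),
]
pairs = cooccurrence_counts(docs[0], window=2)
weights = tf_idf(docs)
```

With this weighting, a word that appears in every document (such as "the" here) gets a score of zero, while a word confined to one document keeps a positive score.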
Anyway, I was wondering whether these techniques could also be useful for extracting key concepts from the text.
Basically I thought: “In LSA we consider some statistics over the co-occurrences, so why not consider the links among the co-occurrences as well?”
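One way to read "the links among the co-occurrences" is as a weighted graph whose nodes are words and whose edges count how often two words appear near each other. The sketch below is only a guess at that reading, not the method from the original post: the helper names, the window size, the toy token list and the choice of weighted degree as the ranking measure are all assumptions.

```python
from collections import Counter, defaultdict


def build_graph(tokens, window=2):
    """Build an undirected weighted graph: graph[a][b] = co-occurrence count."""
    graph = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word][other] += 1
                graph[other][word] += 1
    return graph


def rank_by_weighted_degree(graph):
    """Rank words by the total weight of their edges (ties broken alphabetically)."""
    strength = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    return sorted(strength.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy token stream (an assumption, not taken from the book used in the post).
tokens = "media network society media network power media society".split()
graph = build_graph(tokens)
ranking = rank_by_weighted_degree(graph)
```

Words that sit at the top of `ranking` are the most strongly connected ones, which makes them candidates for "key concepts". Other centrality measures could be swapped in for weighted degree without changing the graph construction.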
Using the first three chapters of “The Media in the Network Society” by Gustavo Cardoso, Cristian creates a series of graphs.
Cristian promises his opinion on classifying texts using this approach.
In the meantime, what’s yours?