Archive for the ‘co-occurrence’ Category

Onomastics 2.0 – The Power of Social Co-Occurrences

Monday, March 11th, 2013

Onomastics 2.0 – The Power of Social Co-Occurrences by Folke Mitzlaff, Gerd Stumme.


Onomastics is “the science or study of the origin and forms of proper names of persons or places.” [“Onomastics”, 2013. this http URL (11 February 2013)]. Personal names in particular play an important role in daily life, as future parents all over the world face the task of finding a suitable given name for their child. This choice is influenced by different factors, such as the social context, language, cultural background and, in particular, personal taste.

With the rise of the Social Web and its applications, users more and more interact digitally and participate in the creation of heterogeneous, distributed, collaborative data collections. These sources of data also reflect current and new naming trends as well as new emerging interrelations among names.

The present work shows how basic approaches from the field of social network analysis and information retrieval can be applied for discovering relations among names, thus extending Onomastics by data mining techniques. The considered approach starts with building co-occurrence graphs relative to data from the Social Web, respectively for given names and city names. As a main result, correlations between semantically grounded similarities among names (e.g., geographical distance for city names) and structural graph-based similarities are observed.

The discovered relations among given names are the foundation of “nameling” [this http URL], a search engine and academic research platform for given names which attracted more than 30,000 users within four months, underpinning the relevance of the proposed methodology.
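
To make the idea of a co-occurrence graph and a structural similarity concrete, here is a minimal sketch. It is not the authors' implementation: the toy contexts, the function names, and the choice of cosine similarity over weighted neighbor vectors are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_graph(contexts):
    """Map each name to a Counter of the names it co-occurs with."""
    graph = defaultdict(Counter)
    for context in contexts:
        # Count each unordered pair of distinct names once per context.
        for a, b in combinations(sorted(set(context)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def cosine_similarity(graph, a, b):
    """Cosine of the two names' weighted neighbor vectors (0.0 if either is empty)."""
    va = graph.get(a, Counter())
    vb = graph.get(b, Counter())
    dot = sum(weight * vb[name] for name, weight in va.items())
    norm_a = sqrt(sum(w * w for w in va.values()))
    norm_b = sqrt(sum(w * w for w in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical contexts, e.g. names mentioned together on one page.
contexts = [
    ["Anna", "Emma", "Paul"],
    ["Anna", "Emma", "Lena"],
    ["Paul", "Max"],
    ["Emma", "Lena"],
]
graph = cooccurrence_graph(contexts)
anna_lena = cosine_similarity(graph, "Anna", "Lena")
anna_max = cosine_similarity(graph, "Anna", "Max")
```

In this toy data "Anna" comes out closer to "Lena" than to "Max", because Anna and Lena share a heavily weighted neighbor ("Emma"). Whether such a score tracks a semantic ground truth, such as geographical distance for city names, is the kind of correlation the abstract reports observing.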

Interesting work on the co-occurrence of names.

These are chosen names, but I wonder if the same would hold for false names?

Are there patterns to false names chosen by actors who are attempting to conceal their identities?

I first saw this in a tweet by Stefano Bertolo.

Extract meta concepts through co-occurrences analysis and graph theory

Saturday, January 14th, 2012

Extract meta concepts through co-occurrences analysis and graph theory

Cristian Mesiano writes:

During the Christmas period I finally had the chance to read some papers about probabilistic latent semantic and its applications in auto classification and indexing.

The main concept behind “latent semantic” rests on the assumption that words that occur close together in the text are related to the same semantic construct.

Based on this principle the LSA (and partially also the PLSA) builds a matrix to keep track of the co-occurrences of the words in the text, and it assigns a score to these co-occurrences considering the distribution in the corpus as well.

Often a TF-IDF score is used to rank the words.

Anyway, I was wondering if these techniques could also be useful to extract key concepts from the text.

Basically I thought: “in LSA we consider some statistics over the co-occurrences, so: why not consider the link among the co-occurrences as well?”.
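
As a rough illustration of treating co-occurrences as links, the sketch below counts word pairs inside a sliding window and then ranks words by the total weight of their edges. This is not Cristian's code: the window size, the tokenizer, the toy sentence, and the use of weighted degree as a stand-in for "key concept" are all assumptions.

```python
from collections import Counter
import re


def window_cooccurrences(text, window=2):
    """Count unordered pairs of distinct words where one follows the other
    within the next `window` tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


def weighted_degree(pairs):
    """Sum the co-occurrence counts on every edge touching each word."""
    degree = Counter()
    for (a, b), count in pairs.items():
        degree[a] += count
        degree[b] += count
    return degree


# Hypothetical toy text; a real run would use full chapters.
text = "network media network society media society"
pairs = window_cooccurrences(text, window=2)
degree = weighted_degree(pairs)
top_word = degree.most_common(1)[0][0]
```

The `pairs` counter is effectively a weighted edge list, so it can be handed to a graph library for richer measures (centrality, communities) than the simple degree ranking used here.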

Using the first three chapters of “The Media in the Network Society” by Gustavo Cardoso, Cristian creates a series of graphs.

Cristian promises his opinion on classification of texts using this approach.

In the meantime, what’s yours?