Co-Words « Another Word For It

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 18, 2010

Data trails reconstruction at the community level in the Web of data – Presentation

Filed under: Co-Words,Data Mining,Subject Identity — Patrick Durusau @ 9:30 am

David Chavalarias: Video from SOKS: Self-Organising Knowledge Systems, Amsterdam, 29 April 2010.

Abstract:

Socio-semantic networks continuously produce data over the Web in a time consistent manner. From scientific communities publishing new findings in archives to citizens confronting their opinions in blogs, there is a real challenge to reconstruct, at the community level, the data trails they produce in order to have a global representation of the topics unfolding in these public arena. We will present such methods of reconstruction in the framework of co-word analysis, highlighting perspectives for the development of innovative tools for our daily interactions with their productions.

I wasn’t able to get very good sound quality for this presentation and there were no slides. However, I was interested enough to find the author’s home page, David Chavalarias, which holds a wealth of interesting material.

I will be watching his projects for some very interesting results and suggest that you do the same.

December 4, 2010

Python Text Processing with NLTK Cookbook

Filed under: Clustering,Co-Words,Corpus Linguistics,Data Mining,Full-Text Search,Linguistic Metadata,Natural Language Processing,Text Analytics — Patrick Durusau @ 7:01 pm

Python Text Processing with NLTK Cookbook by Jacob Perkins.

Contents:

Chapter 1: Tokenizing Text and WordNet Basics

Chapter 2: Replacing and Correcting Words

Chapter 3: Creating Custom Corpora

Chapter 4: Part-of-Speech Tagging

Chapter 5: Extracting Chunks

Chapter 6: Transforming Chunks and Trees

Chapter 7: Text Classification

Chapter 8: Distributed Processing and Handling Large Datasets

Chapter 9: Parsing Specific Data

Appendix: Penn Treebank Part-of-Speech Tags

Index

A sample chapter, Chapter 3: Creating Custom Corpora, is available for download.

Please post a link to your review of this work.

Even better, send me a copy and I will post a review. (I’m listed on Amazon.)

Probabilistic User Modeling in the Presence of Drifting Concepts

Filed under: Clustering,Co-Words,Concept Detection,Concept Drift,Context,Hidden Markov Model,Neighborhood — Patrick Durusau @ 1:29 pm

Probabilistic User Modeling in the Presence of Drifting Concepts Author(s): Vikas Bhardwaj, Ramaswamy Devarajan

Abstract:

We investigate supervised prediction tasks which involve multiple agents over time, in the presence of drifting concepts. The motivation behind choosing the topic is that such tasks arise in many domains which require predicting human actions. An example of such a task is recommender systems, where it is required to predict the future ratings, given features describing items and context along with the previous ratings assigned by the users. In such a system, the relationships among the features and the class values can vary over time. A common challenge to learners in such a setting is that this variation can occur both across time for a given agent, and also across different agents, (i.e. each agent behaves differently). Furthermore, the factors causing this variation are often hidden. We explore probabilistic models suitable for this setting, along with efficient algorithms to learn the model structure. Our experiments use the Netflix Prize dataset, a real world dataset which shows the presence of time variant concepts. The results show that the approaches we describe are more accurate than alternative approaches, especially when there is a large variation among agents. All the data and source code would be made open-source under the GNU GPL.

Interesting because not only do concepts drift from user to user, but modeling users as members of neighborhoods of other users proved more accurate than purely homogeneous or heterogeneous models.

Questions:

If there is a “neighborhood” effect on users, what, if anything, does that imply for co-occurrence of terms? (3-5 pages, no citations)
How would you determine “neighborhood” boundaries for terms? (3-5 pages, citations)
Do “neighborhoods” for terms vary by semantic domains? (3-5 pages, citations)
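
As a starting point for the second question, one could approximate a term's “neighborhood” by the terms that most often share a sentence with it. The Python sketch below is only an illustration under that assumption and is not taken from the paper. The function names `cooccurrence_counts` and `neighborhood`, the `min_count` threshold, the whitespace tokenization, and the toy sentences are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each unordered pair of distinct terms shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        # Naive tokenization (an assumption): lowercase, split on whitespace.
        terms = sorted(set(sentence.lower().split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts


def neighborhood(term, counts, min_count=2):
    """Return the terms that co-occur with `term` at least `min_count` times."""
    neighbors = {}
    for (a, b), n in counts.items():
        if n >= min_count and term in (a, b):
            other = b if a == term else a
            neighbors[other] = n
    return neighbors


# Toy sentences, invented for illustration only.
sentences = [
    "users rate movies",
    "users rate books",
    "critics review movies",
]
counts = cooccurrence_counts(sentences)
print(neighborhood("users", counts))
```

Where the boundary falls depends entirely on the chosen threshold, which is part of what the question asks you to justify.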

*****
Be aware that the Netflix dataset is no longer available, possibly in response to privacy concerns. A demonstration of the utility of such concerns and their advocates.

December 2, 2010

Building Concept Structures/Concept Trails

Filed under: Authoring Topic Maps,Co-Words,Collocation,Corpus Linguistics,Mapping,Semantic Diversity,Semantics — Patrick Durusau @ 8:48 pm

Automatically Building Concept Structures and Displaying Concept Trails for the Use in Brainstorming Sessions and Content Management Systems Authors: Christian Biemann, Karsten Böhm, Gerhard Heyer and Ronny Melz

Abstract:

The automated creation and the visualization of concept structures become more important as the number of relevant information continues to grow dramatically. Especially information and knowledge intensive tasks are relying heavily on accessing the relevant information or knowledge at the right time. Moreover the capturing of relevant facts and good ideas should be focused on as early as possible in the knowledge creation process.

In this paper we introduce a technology to support knowledge structuring processes already at the time of their creation by building up concept structures in real time. Our focus was set on the design of a minimal invasive system, which ideally requires no human interaction and thus gives the maximum freedom to the participants of a knowledge creation or exchange processes. The initial prototype concentrates on the capturing of spoken language to support meetings of human experts, but can be easily adapted for the use in Internet communities that have to rely on knowledge exchange using electronic communication channel.

I don’t share the authors’ confidence that corpus linguistics is going to provide the level of accuracy expected.

But I find the notion of a dynamic semantic map that grows, changes and evolves during a discussion intriguing.

This article was published in 2006 so I will follow up to see what later results have been reported.

November 21, 2010

Measuring the meaning of words in contexts:…

Filed under: Ambiguity,Co-Words,Collocation,Diaphors,Metaphors,Semantics — Patrick Durusau @ 11:30 am

Measuring the meaning of words in contexts: An automated analysis of controversies about ‘Monarch butterflies,’ ‘Frankenfoods,’ and ‘stem cells’ Author(s): Loet Leydesdorff and Iina Hellsten Keywords: co-words, metaphors, diaphors, context, meaning

Abstract:

Co-words have been considered as carriers of meaning across different domains in studies of science, technology, and society. Words and co-words, however, obtain meaning in sentences, and sentences obtain meaning in their contexts of use. At the science/society interface, words can be expected to have different meanings: the codes of communication that provide meaning to words differ on the varying sides of the interface. Furthermore, meanings and interfaces may change over time. Given this structuring of meaning across interfaces and over time, we distinguish between metaphors and diaphors as reflexive mechanisms that facilitate the translation between contexts. Our empirical focus is on three recent scientific controversies: Monarch butterflies, Frankenfoods, and stem-cell therapies. This study explores new avenues that relate the study of co-word analysis in context with the sociological quest for the analysis and processing of meaning.

Excellent article on shifts of word meaning over time. It reports enough methodological detail that interested readers will be able to duplicate or extend the research.

Questions:

Annotated bibliography of research citing this paper.
Design a study of the shifting meaning of 2 or 3 terms. What texts would you select? (3-5 pages, with citations)
Perform a study of shifting meaning of terms in library science. (Project)
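
For the study questions above, one rough way to put a number on a shift is to compare the words found near a term in texts from two periods. The Python sketch below assumes a fixed token window and uses Jaccard overlap of the two context sets. This is a simplification chosen for illustration, not the method reported in the paper, and the names `context_terms` and `jaccard` as well as the sample texts are hypothetical.

```python
def context_terms(documents, target, window=2):
    """Collect the terms that appear within `window` tokens of `target`."""
    context = set()
    for doc in documents:
        # Naive tokenization (an assumption): lowercase, split on whitespace.
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok == target:
                lo = max(0, i - window)
                context.update(tokens[lo:i])
                context.update(tokens[i + 1:i + 1 + window])
    return context


def jaccard(a, b):
    """Jaccard similarity of two sets; 1.0 means identical contexts."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy texts for two periods, invented for illustration only.
early = ["new stem cells research"]
late = ["stem cells debate"]

early_ctx = context_terms(early, "cells", window=1)
late_ctx = context_terms(late, "cells", window=1)
print(jaccard(early_ctx, late_ctx))
```

A low overlap between periods would suggest, but not prove, that the term is being used in a different context; a real study would need far larger samples and a defensible choice of texts.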
