Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 6, 2010

The AQ Methods for Concept Drift

Filed under: Authoring Topic Maps,Classification,Concept Drift,Topic Maps — Patrick Durusau @ 4:51 am

The AQ Methods for Concept Drift Authors: Marcus A. Maloof Keywords:online learning, concept drift, aq algorithm, ensemble methods

Abstract:

Since the mid-1990’s, we have developed, implemented, and evaluated a number of learning methods that cope with concept drift. Drift occurs when the target concept that a learner must acquire changes over time. It is present in applications involving user preferences (e.g., calendar scheduling) and adversaries (e.g., spam detection). We based early efforts on Michalski’s aq algorithm, and our more recent work has investigated ensemble methods. We have also implemented several methods that other researchers have proposed. In this chapter, we survey results that we have obtained since the mid-1990’s using the Stagger concepts and learning methods for concept drift. We examine our methods based on the aq algorithm, our ensemble methods, and the methods of other researchers. Dynamic weighted majority with an incremental algorithm for producing decision trees as the base learner achieved the best overall performance on this problem with an area under the performance curve after the first drift point of .882. Systems based on the aq11 algorithm, which incrementally induces rules, performed comparably, achieving areas of .875. Indeed, an aq11 system with partial instance memory and Widmer and Kubat’s window adjustment heuristic achieved the best performance with an overall area under the performance curve, with an area of .898.

The author offers this definition of concept drift:

Concept drift [19, 30] is a phenomenon in which examples have legitimate labels at one time and have different legitimate labels at another time. Geometrically, if we view a target concept as a cloud of points in a feature space, concept drift may entail the cloud changing its position, shape, and size. From the perspective of Bayesian decision theory, these transformations equate to changes to the form or parameters of the prior and class-conditional distributions.

Hmmm, “legitimate labels,” sounds like a job for topic maps, doesn’t it?
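The geometric picture of drift, a cloud of labeled points whose position shifts over time, can be made concrete with a small simulation. The following is a minimal sketch (the stream, the threshold learner, and the window size of 30 are inventions for illustration, not the chapter’s experimental setup): a 1-D target concept whose decision boundary jumps mid-stream. A learner fit only to a recent window of examples tracks the new boundary, while one fit to the full history stays at the old one.

```python
# Sketch of concept drift (illustrative; not the chapter's setup):
# the target concept's decision boundary jumps from 0.3 to 0.7 at t = 100,
# so labels that were legitimate before the drift point are wrong after it.

def true_label(x, t):
    boundary = 0.3 if t < 100 else 0.7
    return int(x > boundary)

def fit_threshold(examples):
    """Fit a 1-D threshold classifier by minimizing training error."""
    candidates = sorted(x for x, _ in examples)
    return min(candidates,
               key=lambda b: sum(int(x > b) != y for x, y in examples))

# Deterministic stream: x cycles through [0, 1) while the concept drifts.
examples = [((7 * t % 100) / 100.0, t) for t in range(200)]
examples = [(x, true_label(x, t)) for x, t in examples]

recent = fit_threshold(examples[-30:])  # recent window: tracks the drift
full = fit_threshold(examples)          # full history: stuck at the old boundary
```

This is roughly the intuition behind window adjustment heuristics such as Widmer and Kubat’s: shrink the window when accuracy drops, so examples labeled under the old concept are forgotten faster.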

Questions:

  1. Has concept drift been used in library classification? (research question)
  2. How would you use concept drift concepts in library classification? (3-5 pages, no citations)
  3. Demonstrate use of concept drift techniques to augment topic map authoring. (project)

On Classifying Drifting Concepts in P2P Networks

Filed under: Ambiguity,Authoring Topic Maps,Classification,Concept Drift — Patrick Durusau @ 4:07 am

On Classifying Drifting Concepts in P2P Networks Authors: Hock Hee Ang, Vivekanand Gopalkrishnan, Wee Keong Ng and Steven Hoi Keywords: Concept drift, classification, peer-to-peer (P2P) networks, distributed classification

Abstract:

Concept drift is a common challenge for many real-world data mining and knowledge discovery applications. Most of the existing studies for concept drift are based on centralized settings, and are often hard to adapt in a distributed computing environment. In this paper, we investigate a new research problem, P2P concept drift detection, which aims to effectively classify drifting concepts in P2P networks. We propose a novel P2P learning framework for concept drift classification, which includes both reactive and proactive approaches to classify the drifting concepts in a distributed manner. Our empirical study shows that the proposed technique is able to effectively detect the drifting concepts and improve the classification performance.

The authors define the problem as:

Concept drift refers to the learning problem where the target concept to be predicted, changes over time in some unforeseen behaviors. It is commonly found in many dynamic environments, such as data streams, P2P systems, etc. Real-world examples include network intrusion detection, spam detection, fraud detection, epidemiological, and climate or demographic data, etc.

The authors may well have been the first to formulate this problem among mechanical peers, but any humanist could have pointed out examples of concept drift between people, both in the literature and in real life.

Questions:

  1. What are the implications of concept drift for Linked Data? (3-5 pages, no citations)
  2. What are the implications of concept drift for static ontologies? (3-5 pages, no citations)
  3. Is concept development (over time) another form of concept drift? (3-5 pages, citations, illustrations, presentation)

*****
PS: Finding this paper is an illustration of ambiguity leading to serendipitous discovery. I searched for one of the authors instead of the exact title of another paper, and while scanning the search results I found this one.

November 5, 2010

Ambiguity and Serendipity

Filed under: Ambiguity,Authoring Topic Maps,Topic Maps — Patrick Durusau @ 8:35 pm

There was an email discussion recently where ambiguity was discussed as something to be avoided.

It occurred to me, if there were no ambiguity, there would be no serendipity.

Think about the last time you searched for a particular paper. If you remembered enough to go directly to it, you did not see any similar or related papers along the way.

Now imagine every information request you make results in exactly what you were searching for.

What a terribly dull search experience that would be!

Topic maps can produce the circumstances where serendipity occurs because a subject can be identified any number of ways. Quite possibly several that you are unaware of. And seeing those other ways may spark a memory of another paper, perhaps another line of thought, etc.

I think my list of “other names” for record linkage now exceeds 25 and I really need to cast those into a topic map fragment along with citations to the places they can be found.

I don’t think of topic maps as a means to avoid ambiguity but rather as a means to make ambiguity a manageable part of an information seeking experience.

TMDM-NG – Overloading Occurrence

Filed under: Authoring Topic Maps,TMDM,Topic Maps,XTM — Patrick Durusau @ 4:23 pm

“Occurrence” in topic maps is currently overloaded. Seriously overloaded.

In one sense, “occurrence” is used as it is in a bibliographic reference. That is that subject X “occurs” at volume Y, page Z. A reader expects to find the subject in question at that location.

In the overloaded sense, “occurrence” is used to mean some additional property of a subject.

To me the semantics of “occurrence” weigh against using it for any property associated with a subject.

That has been the definition used in topic maps for a very long time, but to me that simply makes it ripe for correction.

Occurrence should be used only for instances of a subject that are located outside of a topic map.

A property element should be allowed for any topic, name, occurrence or association. Every property should have a type attribute.

It is a property of the subject represented by the construct where it appears.
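A hypothetical XTM-flavored fragment (the `<property>` element and its `type` attribute are inventions for illustration, not current XTM syntax) of how the two senses might be separated:

```xml
<topic id="hamlet">
  <name><value>Hamlet</value></name>

  <!-- Occurrence only in the bibliographic sense: where the
       subject occurs, outside the topic map. -->
  <occurrence>
    <resourceRef href="http://example.org/texts/hamlet"/>
  </occurrence>

  <!-- Hypothetical first-class property, replacing the overloaded
       use of occurrence for arbitrary properties of the subject. -->
  <property type="date-of-composition">
    <value>c. 1600</value>
  </property>
</topic>
```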

Previously authored topic maps will continue to be valid, since as yet there are no processors that could validate the use of “occurrence” in either the new or the old sense of the term.

Older topic map software will not be able to process newer topic maps but unless topic maps change and evolve (even COBOL has), they will die.

November 4, 2010

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

Filed under: Authoring Topic Maps,Clustering,Data Mining — Patrick Durusau @ 11:26 am

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise (1996) Authors: Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu Keywords: Clustering Algorithms, Arbitrary Shape of Clusters, Efficiency on Large Spatial Databases, Handling Noise.

Before you decide to skip this paper as “old” consider that it has > 600 citations in CiteSeer.

Abstract:

Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.

Discovery of classes is always an issue in topic map authoring/design and clustering is one way to find classes, perhaps even ones you did not suspect existed.
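To give a feel for the density-based notion of clusters, here is a minimal, unoptimized sketch of DBSCAN in Python (the paper pairs the algorithm with an R*-tree index and a heuristic for choosing Eps; this naive O(n²) version is only for illustration):

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns a cluster id per point, -1 for noise."""
    labels = {}                     # point index -> cluster id
    cluster = -1

    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    for i in range(len(points)):
        if i in labels:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:    # not a core point: provisionally noise
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels.get(j, -1) == -1:       # unvisited, or reclaimed noise
                labels[j] = cluster
                more = neighbors(j)
                if len(more) >= min_pts:      # j is a core point: keep expanding
                    queue.extend(k for k in more
                                 if labels.get(k, -1) == -1)
    return [labels[i] for i in range(len(points))]

points = [(0, 0), (0.5, 0), (1, 0), (10, 10), (10.5, 10), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.1, min_pts=3)   # two dense clusters, one noise point
```

Classes discovered this way become candidates for topic types, including ones you did not suspect existed.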

Subject Identification Patterns

Filed under: Authoring Topic Maps,Subject Identifiers,Subject Identity,Subject Locators — Patrick Durusau @ 10:27 am

Does that sound like a good book title?

Thinking that since everyone is recycling old stuff under the patterns rubric, topic maps may as well jump on the bandwagon.

Instead of the three amigos (was that a movie?) we could have the dirty dozen honchos (or was that another movie?). I don’t get out much these days so I would probably need some help with current cultural references.

This ties into Lars Heuer’s effort to distinguish between Playboy Playmates and Astronauts, while trying to figure out why birds keep, well, let’s just say he has to wash his hair a lot.

When you have an entry from DBpedia, what do you have to know to identify it? Its URI is one thing but I rarely encounter URIs while shopping. (Or playmates for that matter.)

November 2, 2010

Afghanistan War Diary – Improvements

Filed under: Authoring Topic Maps,Maiana,Topic Map Software,Topic Maps — Patrick Durusau @ 5:25 am

It was only days after the release of the Afghanistan War Diary that Aki Kivela posted improvements to it using automatic extractors.

Important not only as a demonstration of participation in a topic maps project but also the incremental nature of topic map authoring.

Afghanistan War Diary

Filed under: Authoring Topic Maps,Data Source,Maiana,Topic Maps — Patrick Durusau @ 5:15 am

Afghanistan War Diary.

A portion of the Afghanistan war documents published by Wikileaks as a topic map.

The release is an automatic conversion to a topic map, so it does not reflect the nuances that human authoring brings to a topic map.

QuaaxTM – PHP Topic Maps – New Release

Filed under: Authoring Topic Maps,Topic Map Software — Patrick Durusau @ 5:00 am

QuaaxTM – PHP Topic Maps has a new release!

Added support for XTM 2.1 read/write.

October 31, 2010

7. “We always know more than we can say, and we will always say more than we can write down.”

Filed under: Authoring Topic Maps,Knowledge Management,Marketing,Topic Maps — Patrick Durusau @ 7:59 pm

Knowledge Management Principle Seven of Seven (Rendering Knowledge by David Snowden)

We always know more than we can say, and we will always say more than we can write down. This is probably the most important. The process of taking things from our heads, to our mouths (speaking it) to our hands (writing it down) involves loss of content and context. It is always less than it could have been as it is increasingly codified.

Authoring a topic map always involves loss of content and context.

The same loss of content and context has bedeviled the AI community for the last 50 years.

No one can control the loss of content and context, or even identify it ahead of time.

Testing topic maps on users will help bring them closer to user expectations.

October 30, 2010

6. “The way we know things is not the way we report we know things.”

Filed under: Authoring Topic Maps,Knowledge Management,Marketing,Topic Maps — Patrick Durusau @ 12:14 pm

Knowledge Management Principle Six of Seven (Rendering Knowledge by David Snowden)

The way we know things is not the way we report we know things. There is an increasing body of research data which indicates that in the practice of knowledge people use heuristics, past pattern matching and extrapolation to make decisions, coupled with complex blending of ideas and experiences that takes place in nanoseconds. Asked to describe how they made a decision after the event they will tend to provide a more structured process oriented approach which does not match reality. This has major consequences for knowledge management practice.

It wasn’t planned, but it is appropriate that this should follow Harry Halpin’s Sense and Reference on the Web.

Questions:

  1. Find three examples of decision making that differs from the actual process.
  2. Of the examples reported in class, would any of them impact your design of a topic map? (3-5 pages, no citations)
  3. Of the same examples, would any of them impact your design of a topic map interface? (3-5 pages, no citations)
  4. Do you consider a topic map and its interface to be different? If so, how? If not, why not? (3-5 pages, no citations)

October 29, 2010

5. “Tolerated failure imprints learning better than success.”

Filed under: Authoring Topic Maps,Knowledge Management,Marketing,Topic Maps — Patrick Durusau @ 7:24 am

Knowledge Management Principle Five of Seven (Rendering Knowledge by David Snowden)

Tolerated failure imprints learning better than success. When my young son burnt his finger on a match he learnt more about the dangers of fire than any amount of parental instruction could provide. All human cultures have developed forms that allow stories of failure to spread without attribution of blame. Avoidance of failure has greater evolutionary advantage than imitation of success. It follows that attempting to impose best practice systems is flying in the face of over a hundred thousand years of evolution that says it is a bad thing.

Perhaps with fingers and matches, but I am not sure “failure imprints learning better than success” in knowledge management.

The perennial failure (as opposed to the perennial philosophy), the effort to create a “perfect” language, now using URIs, continues unabated.

The continuing failure to effectively share intelligence is another lesson slow in being learned.

Not that “best practices” would help in either case.

Should failure of “perfect” languages and sharing be principles of knowledge management?

Ordnance Survey Linked Data

Filed under: Authoring Topic Maps,Mapping,Merging,Topic Maps — Patrick Durusau @ 5:40 am

Ordnance Survey Linked Data.

Description:

Ordnance Survey is Great Britain’s national mapping agency, providing the most accurate and up-to-date geographic data, relied on by government, business and individuals. OS OpenData is the opening up of Ordnance Survey data as part of the drive to increase innovation and support the “Making Public Data Public” initiative. As part of this initiative Ordnance Survey has published a number of its products as Linked Data. Linked Data is a growing part of the Web where data is published on the Web and then linked to other published data in much the same way that web pages are interlinked using hypertext. The term Linked Data is used to describe a method of exposing, sharing, and connecting data via URIs on the Web….

Let’s use topic maps to connect subjects that don’t have URIs.

Subject mapping exercise:

  1. Connect 5 subjects from the Domesday Book
  2. Connect 5 subjects from either The Shakespeare Paper Trail: The Early Years and/or The Shakespeare Paper Trail: The Later Years
  3. Connect 5 subjects from WW2 People’s War (you could do occurrences but try for something more imaginative)
  4. Connect 5 subjects from some other period of English history.
  5. Suggest other linked data sources and sources of subjects for subject mapping (extra credit)
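For the exercise above, one minimal reading of “connect” without URIs: treat each source record as a bag of identifiers (name variants, dates, places) and merge records that share any identifier. A hypothetical sketch (the records and identifiers are invented for illustration):

```python
# Hypothetical sketch of URI-free subject mapping: two records denote the
# same subject if any of their identifiers match; merging is transitive.

def merge_subjects(records):
    """records: list of sets of identifiers. Returns merged subject groups."""
    groups = []
    for idents in records:
        idents = set(idents)
        overlapping = [g for g in groups if g & idents]
        for g in overlapping:            # pull in every group we touch
            idents |= g
            groups.remove(g)
        groups.append(idents)
    return groups

# Invented name variants, as one might collect from medieval sources.
records = [
    {"Willelmus Rex", "William I"},
    {"William I", "William the Conqueror"},
    {"Harold Godwinson"},
]
merged = merge_subjects(records)
```

In a full topic map each merged identifier set would become a topic, with each identifier recorded as a name or variant.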

October 28, 2010

4. “Everything is fragmented.”

Filed under: Authoring Topic Maps,Knowledge Management,Marketing,Topic Maps — Patrick Durusau @ 6:12 am

Knowledge Management Principle Four of Seven (Rendering Knowledge by David Snowden)

Everything is fragmented. We evolved to handle unstructured fragmented fine granularity information objects, not highly structured documents. People will spend hours on the internet, or in casual conversation without any incentive or pressure. However creating and using structured documents requires considerably more effort and time. Our brains evolved to handle fragmented patterns not information.

I would rather say that complex structures exist just beyond the objects we handle in day to day conversation.

The structures are there, if and when we choose to look.

The problem Snowden has identified is that most systems can’t have structures “appear” when they “look” for them.

Either the objects fit into some structure or they don’t from the perspective of most systems.

Making those structures, that normally appear only when we look, explicit, is the issue.

Explicit or not, none of our objects have meaning in isolation from those structures.

To make it interesting, we all bring slightly different underlying structures to those objects.

(Making assumed or transparent structures explicit is hard. Witness the experience of markup.)

October 27, 2010

3. “In the context of real need few people will withhold their knowledge.”

Filed under: Authoring Topic Maps,Knowledge Management,Marketing,Topic Maps — Patrick Durusau @ 5:58 am

Knowledge Management Principle Three of Seven (Rendering Knowledge by David Snowden)

In the context of real need few people will withhold their knowledge. A genuine request for help is not often refused unless there is literally no time or a previous history of distrust. On the other hand ask people to codify all that they know in advance of a contextual enquiry and it will be refused (in practice its impossible anyway). Linking and connecting people is more important than storing their artifacts.

I guess the US intelligence community has a “previous history of distrust” and that is why some 9 years after 9/11 effective intelligence sharing remains a fantasy.

People withhold their knowledge for all sorts of reasons. Job security comes to mind. Closely related is self-importance. Followed closely by revelation of incompetence. General insecurity, and a host of others.

Technical issues did not create the need for semantic integration. Technical solutions will not, by themselves, result in semantic integration.

October 26, 2010

2. “We only know what we know when we need to know it.”

Filed under: Authoring Topic Maps,Knowledge Management,Marketing,Topic Maps — Patrick Durusau @ 7:29 am

Knowledge Management Principle Two of Seven (Rendering Knowledge by David Snowden)

We only know what we know when we need to know it. Human knowledge is deeply contextual and requires stimulus for recall. Unlike computers we do not have a list-all function. Small verbal or nonverbal clues can provide those ah-ha moments when a memory or series of memories are suddenly recalled, in context to enable us to act. When we sleep on things we are engaged in a complex organic form of knowledge recall and creation; in contrast a computer would need to be rebooted.

An important principle both for authoring and creating useful topic maps.

A topic map for repairing a jet engine could well begin by filming the repair multiple times from different angles.

Then have a mechanic describe the process they followed without reference to the video.

The differences are things that need to be explored and captured for the map.

Likewise, a map should not stick too closely to the “bare” facts needed for the map.

People using the map will need context in order to make the best use of its information.

What seems trivial or irrelevant, may be the clue that triggers an appropriate response. Test with users!

*****

PS: Don’t forget that the context in which a topic map is *used* is also part of its context.

October 18, 2010

TMDM-NG – Reification

Filed under: Authoring Topic Maps,TMDM,Topic Maps,XTM — Patrick Durusau @ 7:44 am

Reification in the TMDM means using a topic to “reify” a name, occurrence, association, etc. Where a subject is represented by a name, occurrence or association, after “reification” it is also represented by a topic.

For the TMDM-NG, let’s drop reification and make names, occurrences, associations, etc., first class citizens in a topic map.

Making names, occurrences, associations first class citizens would mean we could add properties to them without the overhead of creating topics to represent subjects that already have representatives in a topic map.

We do need to address “occurrence” being overloaded to mean both occurrence in the bibliographic sense and a property of a subject, but that can wait for a future post.
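As a rough data-model sketch (plain Python, not any existing topic maps API), the difference between the two designs:

```python
# Sketch (not a real topic maps API): under the TMDM, attaching a property
# to a name requires reifying it -- creating a topic whose only job is to
# stand in for the name -- whereas the proposal lets the name carry the
# property itself.

class Topic:
    def __init__(self):
        self.properties = {}

class Name:
    def __init__(self, value):
        self.value = value
        self.reifier = None          # TMDM style: topic standing in for the name
        self.properties = {}         # proposed: first-class properties

# TMDM-style reification: one extra construct per annotated name.
n = Name("Ontopia")
n.reifier = Topic()
n.reifier.properties["source"] = "company registry"

# Proposed: the name is a first-class citizen and carries its own property.
m = Name("Ontopia")
m.properties["source"] = "company registry"
```

Same information, but the second form drops the overhead of a topic whose subject already has a representative in the map.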

October 17, 2010

The Neighborhood Auditing Tool for the UMLS and its Source Terminologies

Filed under: Authoring Topic Maps,Interface Research/Design,Mapping,Topic Maps,Usability — Patrick Durusau @ 5:19 am

The next NCBO Webinar will be presented by Dr. James Geller from the New Jersey Institute of Technology on “The Neighborhood Auditing Tool for the UMLS and its Source Terminologies” at 10:00am PDT, Wednesday, October 20.

ABSTRACT:

The UMLS’s integration of more than 100 source vocabularies makes it susceptible to errors. Furthermore, its size and complexity can make it very difficult to locate such errors. A software tool, called the Neighborhood Auditing Tool (NAT), that facilitates UMLS auditing is presented. The NAT supports “neighborhood-based” auditing, where, at any given time, an auditor concentrates on a single focus concept and one of a variety of neighborhoods of its closely related concepts. The NAT can be seen as a special browser for the complex structure of the UMLS’s hierarchies. Typical diagrammatic displays of concept networks have a number of shortcomings, so the NAT utilizes a hybrid diagram/text interface that features stylized neighborhood views which retain some of the best features of both the diagrammatic layouts and text windows while avoiding the shortcomings. The NAT allows an auditor to display knowledge from both the Metathesaurus (concept) level and the Semantic Network (semantic type) level. Various additional features of the NAT that support the auditing process are described. The usefulness of the NAT is demonstrated through a group of case studies. Its impact is tested with a study involving a select group of auditors.


WEBEX DETAILS:
Topic: NCBO Webinar Series
Date: Wednesday, October 20, 2010
Time: 10:00 am, Pacific Daylight Time (San Francisco, GMT-07:00)
Meeting Number: 929 613 752
Meeting Password: ncbomeeting

****

Deeply edited version from NCBO Webinar – James Geller, October 20 at 10:00am PT, which has numerous other details.

If you translate “integration” as “merging,” the relevance of this work to topic maps and the exploration of data sets becomes immediately obvious.

October 14, 2010

Using text animated transitions to support navigation in document histories

Filed under: Authoring Topic Maps,Interface Research/Design,Software,Trails,Visualization — Patrick Durusau @ 10:34 am

Using text animated transitions to support navigation in document histories Authors: Fanny Chevalier, Pierre Dragicevic, Anastasia Bezerianos, Jean-Daniel Fekete Keywords: animated transitions, revision control, text editing

Abstract:

This article examines the benefits of using text animated transitions for navigating in the revision history of textual documents. We propose an animation technique for smoothly transitioning between different text revisions, then present the Diffamation system. Diffamation supports rapid exploration of revision histories by combining text animated transitions with simple navigation and visualization tools. We finally describe a user study showing that smooth text animation allows users to track changes in the evolution of textual documents more effectively than flipping pages.

Project website: http://www.aviz.fr/diffamation/

The video of tracking changes to a document has to be seen to be appreciated.

Research question as to how to visualize changes/revisions to a topic map. This is one starting place.

October 8, 2010

A Haptic-Based Framework for Chemistry Education: Experiencing Molecular Interactions with Touch

A Haptic-Based Framework for Chemistry Education: Experiencing Molecular Interactions with Touch Author(s): Sara Comai, Davide Mazza Keywords: Haptic technology – Chemical education and teaching – Molecular interaction

Abstract:

The science of haptics has received a great attention in the last decade for data visualization and training. In particular haptics can be introduced as a novel technology for educational purposes. The usage of haptic technologies can greatly help to make the students feel sensations not directly experienceable and typically only reported as notions, sometimes also counter-intuitively, in textbooks. In this work, we present a haptically-enhanced system for the tactile exploration of molecules. After a brief description of the architecture of the developed system, the paper describes how it has been introduced in the usual didactic activity by providing a support for the comprehension of concepts typically explained only theoretically. Users feedbacks and impressions are reported as results of this innovation in teaching.

Imagine researchers using haptics to recognize molecules or molecular reactions.

Are the instances of recognition to be compared with other such instances?

How would you establish the boundaries for a “match?”

How would you communicate those boundaries to others?

October 7, 2010

Machine Learning Support for Human Articulation of Concepts from Examples – A Learning Framework

Filed under: Authoring Topic Maps,Topic Maps — Patrick Durusau @ 6:35 am

Machine Learning Support for Human Articulation of Concepts from Examples – A Learning Framework
Author(s): Gabriela Pavel Keywords: concept learning – machine learning – visual environment – learning framework

Abstract:

We aim to show that machine learning methods can provide meaningful feedback to help the student articulate concepts from examples, in particular from images. Therefore we present here a framework to support the learning through human visual classifications and machine learning methods.

In the article the sentence:

Images help people externalize their intuitive knowledge, within a process called articulation, or transfer from tacit to explicit knowledge.

Caught my eye.

The process of authoring topic maps is articulation, or transfer from tacit to explicit knowledge.

The paper addresses the use of images to teach concepts, but also to elicit from students their tacit knowledge of the concepts those images represent.

If that seems a bit mundane, imagine intelligence authors scanning images of people or locales and adding their tacit knowledge to a shareable data store.

October 6, 2010

Mining Historic Query Trails to Label Long and Rare Search Engine Queries

Filed under: Authoring Topic Maps,Data Mining,Entity Extraction,Search Engines,Searching — Patrick Durusau @ 7:05 am

Mining Historic Query Trails to Label Long and Rare Search Engine Queries Authors: Peter Bailey, Ryen W. White, Han Liu, Giridhar Kumaran Keywords: Long queries, query labeling

Abstract:

Web search engines can perform poorly for long queries (i.e., those containing four or more terms), in part because of their high level of query specificity. The automatic assignment of labels to long queries can capture aspects of a user’s search intent that may not be apparent from the terms in the query. This affords search result matching or reranking based on queries and labels rather than the query text alone. Query labels can be derived from interaction logs generated from many users’ search result clicks or from query trails comprising the chain of URLs visited following query submission. However, since long queries are typically rare, they are difficult to label in this way because little or no historic log data exists for them. A subset of these queries may be amenable to labeling by detecting similarities between parts of a long and rare query and the queries which appear in logs. In this article, we present the comparison of four similarity algorithms for the automatic assignment of Open Directory Project category labels to long and rare queries, based solely on matching against similar satisfied query trails extracted from log data. Our findings show that although the similarity-matching algorithms we investigated have tradeoffs in terms of coverage and accuracy, one algorithm that bases similarity on a popular search result ranking function (effectively regarding potentially-similar queries as “documents”) outperforms the others. We find that it is possible to correctly predict the top label better than one in five times, even when no past query trail exactly matches the long and rare query. We show that these labels can be used to reorder top-ranked search results leading to a significant improvement in retrieval performance over baselines that do not utilize query labeling, but instead rank results using content-matching or click-through logs. 
The outcomes of our research have implications for search providers attempting to provide users with highly-relevant search results for long queries.

(Apologies for repeating the long abstract but this needs wider notice.)

What the authors call “label prediction algorithms” is a step toward mining data for subjects.

The research may also improve search results through the use of labels for ranking.
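The flavor of the approach, treating logged queries as “documents” to be scored against a long, rare query, can be sketched with a much simpler similarity than the ranking function the authors actually use (Jaccard token overlap here; the log and labels are invented):

```python
# Hedged sketch of the idea (not the paper's algorithm): score logged
# queries against a long rare query by token overlap and borrow the
# category label of the best-scoring match.

def label_query(query, labeled_log):
    """labeled_log: {logged_query: category_label} from historic trails."""
    q = set(query.lower().split())
    def score(logged):
        t = set(logged.lower().split())
        return len(q & t) / len(q | t)       # Jaccard similarity
    best = max(labeled_log, key=score)
    return labeled_log[best] if score(best) > 0 else None

log = {
    "cheap flights paris": "Travel",
    "python list comprehension syntax": "Computers",
}
label = label_query("python nested list comprehension examples syntax", log)
```

A labeled query is, in effect, a subject assignment, which is why this line of work matters for mining data for subjects.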

October 5, 2010

Re-Using Linked Data

Filed under: Authoring Topic Maps,Dataset,Linked Data,Topic Maps — Patrick Durusau @ 9:24 am

The German national library released its authority records as linked data.

News and reference services have content management systems that don’t use URIs, so how do they link up public linked data with their private data?

In a way that they can share the resulting linked data within their organization?

Exploration question: What facilities exist in popular CMSes for mapping linked data to local data?

I don’t know the answer to that but will be finding out.

In the meantime, if you know your CMS cannot do such a mapping, consider using topic maps. (topicmaps.org)

Topic maps can create linked data that is not subject to the limitation of using URIs.

tagging, communities, vocabulary, evolution

Filed under: Authoring Topic Maps,Interface Research/Design,Tagging — Patrick Durusau @ 8:46 am

tagging, communities, vocabulary, evolution Authors: Shilad Sen, Shyong K. (Tony) Lam, Al Mamunur Rashid, Dan Cosley, Dan Frankowski, Jeremy Osterhouse, F. Maxwell Harper, John Riedl Keywords: communities, evolution, social book-marking, tagging, vocabulary

Abstract:

A tagging community’s vocabulary of tags forms the basis for social navigation and shared expression. We present a user-centric model of vocabulary evolution in tagging communities based on community influence and personal tendency. We evaluate our model in an emergent tagging system by introducing tagging features into the MovieLens recommender system. We explore four tag selection algorithms for displaying tags applied by other community members. We analyze the algorithms’ effect on vocabulary evolution, tag utility, tag adoption, and user satisfaction.

The influence of an interface on the creation of topic maps is an open area for research. Research on tagging behavior is an excellent starting point for such studies.

Question: Would you modify the experimental setup to test the creation of topics? If so, in what way? Why?

October 3, 2010

Exploratory information search by domain experts and novices

Exploratory information search by domain experts and novices Authors: Ruogu Kang, Wai-Tat Fu Keywords: domain expertise, exploratory search, social search

Abstract:

The rising popularity of social tagging systems has the potential to transform traditional web search into a new era of social search. Based on the finding that domain expertise could influence search behavior in traditional search engines, we hypothesized and tested the idea that domain expertise would have similar influence on search behavior in a social tagging system. We conducted an experiment comparing search behavior of experts and novices when they searched using a traditional search engine and a social tagging system. Results from our experiment showed that experts relied more on their own domain knowledge to generate search queries, while novices were influenced more by social cues in the social tagging system. Experts were also found to conform to each other more than novices in their choice of bookmarks and tags. Implications for the design of future social information systems are discussed.

Empirical validation of the idea that expert searchers (dare I say librarians?) can improve the search results for “novice” searchers.

A line of research that librarians need to take up and expand to combat budget cuts by the uninformed.

Note that experts suffer from the “vocabulary” problem just like novices, just in more sophisticated ways.

October 2, 2010

Anything to Topic Maps

Filed under: Authoring Topic Maps,Topic Map Software,Topic Maps — Patrick Durusau @ 5:08 am

Anything to Topic Maps.

Lars Heuer announced Anything to Topic Maps (email announcement), saying that while it will map anything, Mappify presently maps only Atom feeds. (More to follow.)

Lars also illustrates how promiscuous topic mapping can lead to unexpected results. 😉

DocuBrowse: faceted searching, browsing, and recommendations in an enterprise context

DocuBrowse: faceted searching, browsing, and recommendations in an enterprise context Authors: Andreas Girgensohn, Frank Shipman, Francine Chen, Lynn Wilcox Keywords: document management, document recommendation, document retrieval, document visualization, faceted search

Abstract:

Browsing and searching for documents in large, online enterprise document repositories are common activities. While internet search produces satisfying results for most user queries, enterprise search has not been as successful because of differences in document types and user requirements. To support users in finding the information they need in their online enterprise repository, we created DocuBrowse, a faceted document browsing and search system. Search results are presented within the user-created document hierarchy, showing only directories and documents matching selected facets and containing text query terms. In addition to file properties such as date and file size, automatically detected document types, or genres, serve as one of the search facets. Highlighting draws the user’s attention to the most promising directories and documents while thumbnail images and automatically identified keyphrases help select appropriate documents. DocuBrowse utilizes document similarities, browsing histories, and recommender system techniques to suggest additional promising documents for the current facet and content filters.

Watch the movie of this interface in action at the ACM page.

Then imagine it with collaboration and subject identity.
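As a sketch of the general technique (hypothetical metadata fields, not DocuBrowse’s actual implementation), faceted filtering narrows a document set to those matching every selected facet value, then recomputes the counts shown beside each remaining facet value:

```python
# Illustrative faceted filtering over document metadata.

from collections import Counter

documents = [
    {"title": "Q3 budget", "genre": "spreadsheet", "year": 2009},
    {"title": "Design memo", "genre": "memo", "year": 2010},
    {"title": "Annual report", "genre": "report", "year": 2010},
]

def filter_by_facets(docs, selections):
    """Keep documents matching every selected facet value."""
    return [d for d in docs
            if all(d.get(facet) == value for facet, value in selections.items())]

def facet_counts(docs, facet):
    """Counts displayed next to each facet value in the UI."""
    return Counter(d[facet] for d in docs if facet in d)

hits = filter_by_facets(documents, {"year": 2010})
# facet_counts(hits, "genre") -> memo: 1, report: 1
```

Adding subject identity to the facets (genres, authors, keyphrases as topics rather than strings) is where a topic map would slot in.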

Towards a reputation-based model of social web search

Towards a reputation-based model of social web search Authors: Kevin McNally, Michael P. O’Mahony, Barry Smyth, Maurice Coyle, Peter Briggs Keywords: collaborative web search, heystaks, reputation model

Abstract:

While web search tasks are often inherently collaborative in nature, many search engines do not explicitly support collaboration during search. In this paper, we describe HeyStaks (www.heystaks.com), a system that provides a novel approach to collaborative web search. Designed to work with mainstream search engines such as Google, HeyStaks supports searchers by harnessing the experiences of others as the basis for result recommendations. Moreover, a key contribution of our work is to propose a reputation system for HeyStaks to model the value of individual searchers from a result recommendation perspective. In particular, we propose an algorithm to calculate reputation directly from user search activity and we provide encouraging results for our approach based on a preliminary analysis of user activity and reputation scores across a sample of HeyStaks users.

The reputation system proposed by the authors could easily underlie a collaborative approach to creating a topic map.

Think of collections not normally accessed by web search engines, such as The National Archives (U.S.) and similar document collections.

Reputation + trails + subject identity = Hard to Beat.

See www.heystaks.com as a starting point.
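The authors compute reputation directly from user search activity; purely as an illustration (not HeyStaks’ actual algorithm), a reputation score might aggregate how often results a user contributed are later selected by other searchers:

```python
# Illustrative only: score a user by how often results they
# contributed are selected by someone other than themselves.

from collections import defaultdict

def reputation_scores(events):
    """events: (contributor, selector) pairs; a selection of a
    contributor's recommendation by another user earns one point."""
    scores = defaultdict(int)
    for contributor, selector in events:
        if contributor != selector:      # self-selections don't count
            scores[contributor] += 1
    return dict(scores)

events = [("alice", "bob"), ("alice", "carol"),
          ("bob", "bob"), ("bob", "dave")]
# reputation_scores(events) -> {"alice": 2, "bob": 1}
```

Scores like these could weight whose subject identifications a collaborative topic map trusts first.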

October 1, 2010

Tell me more, not just “more of the same”

Tell me more, not just “more of the same” Authors: Francisco Iacobelli, Larry Birnbaum, Kristian J. Hammond Keywords: dimensions of similarity, information retrieval, new information detection

Abstract:

The Web makes it possible for news readers to learn more about virtually any story that interests them. Media outlets and search engines typically augment their information with links to similar stories. It is up to the user to determine what new information is added by them, if any. In this paper we present Tell Me More, a system that performs this task automatically: given a seed news story, it mines the web for similar stories reported by different sources and selects snippets of text from those stories which offer new information beyond the seed story. New content may be classified as supplying: additional quotes, additional actors, additional figures and additional information depending on the criteria used to select it. In this paper we describe how the system identifies new and informative content with respect to a news story. We also show that providing an explicit categorization of new information is more useful than a binary classification (new/not-new). Lastly, we show encouraging results from a preliminary evaluation of the system that validates our approach and encourages further study.

If you are interested in the automatic extraction, classification and delivery of information, this article is for you.

I think there are (at least) two interesting ways for “Tell Me More” to develop:

First, persisting recognized entities together with other data (such as story, author, date, etc.) in the form of associations (with appropriate roles, etc.).

Second, and perhaps more importantly, enabling users to add or correct information presented as part of a mapping of information about particular entities.
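A sketch of the first idea (hypothetical identifiers and structure, not a standard topic map serialization): a recognized entity persisted with its story metadata as an association with explicit roles:

```python
# Hypothetical: a recognized entity tied to its source story as a
# topic-map-style association, with each participant playing a role.

association = {
    "type": "mentioned-in",
    "roles": {
        "entity": "person:example-entity",
        "story": "story:example-source/1234",
        "author": "author:example-author",
        "date": "2010-10-01",
    },
}

def player(assoc, role):
    """Look up which topic plays a given role in an association."""
    return assoc["roles"].get(role)
```

Because roles are explicit, later queries can ask “which stories mention this entity?” as easily as “which entities appear in this story?”.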

SocialSearchBrowser: A novel mobile search and information discovery tool

SocialSearchBrowser: A novel mobile search and information discovery tool Authors: Karen Church, Joachim Neumann, Mauro Cherubini and Nuria Oliver Keywords: Mobile search, social search, social networks, location-based services, context, field study, user evaluation

Abstract:

The mobile Internet offers anytime, anywhere access to a wealth of information to billions of users across the globe. However, the mobile Internet represents a challenging information access platform due to the inherent limitations of mobile environments, limitations that go beyond simple screen size and network issues. Mobile users often have information needs which are impacted by contexts such as location and time. Furthermore, human beings are social creatures that often seek out new strategies for sharing knowledge and information in mobile settings. To investigate the social aspect of mobile search, we have developed SocialSearchBrowser (SSB), a novel proof-of-concept interface that incorporates social networking capabilities with key mobile contexts to improve the search and information discovery experience of mobile users. In this paper, we present the results of an exploratory field study of SSB and outline key implications for the design of next generation mobile information access services.

An interesting combination of the traditional “ask a search engine” with the even more traditional “ask your friends.” The sample is too small to say what issues might arise with wider use, but it is definitely a step in an interesting direction.

