Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 30, 2010

Assessing the scenic route: measuring the value of search trails in web logs

Filed under: Authoring Topic Maps,Searching,Subject Identity,Topic Maps — Patrick Durusau @ 10:34 am

Assessing the scenic route: measuring the value of search trails in web logs Authors: Ryen W. White, Jeff Huang Keywords: log analysis, search trails, trail following

Abstract:

Search trails mined from browser or toolbar logs comprise queries and the post-query pages that users visit. Implicit endorsements from many trails can be useful for search result ranking, where the presence of a page on a trail increases its query relevance. Following a search trail requires user effort, yet little is known about the benefit that users obtain from this activity versus, say, sticking with the clicked search result or jumping directly to the destination page at the end of the trail. In this paper, we present a log-based study estimating the user value of trail following. We compare the relevance, topic coverage, topic diversity, novelty, and utility of full trails over that provided by sub-trails, trail origins (landing pages), and trail destinations (pages where trails end). Our findings demonstrate significant value to users in following trails, especially for certain query types. The findings have implications for the design of search systems, *including trail recommendation systems that display trails on search result pages.* (emphasis added)

If your topic map client has search logs for internal resources, don’t neglect those as part of your topic map construction process. They can help identify important subjects and the navigation links between subjects.
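To make the point about search logs concrete, here is a minimal sketch of mining trails for candidate navigation links. The log format (a query plus the ordered pages visited after it) and the page paths are my own assumptions for illustration, not anything taken from the paper.

```python
from collections import Counter

# Hypothetical trails: (query, pages visited after the query, in order).
trails = [
    ("topic maps", ["/intro", "/xtm", "/ctm"]),
    ("topic maps", ["/intro", "/xtm"]),
    ("xtm syntax", ["/xtm", "/ctm"]),
]

def candidate_links(trails):
    """Count how often one page directly follows another on any trail."""
    counts = Counter()
    for _query, pages in trails:
        # Pair each page with its successor on the same trail.
        for source, target in zip(pages, pages[1:]):
            counts[(source, target)] += 1
    return counts

links = candidate_links(trails)
for (source, target), n in links.most_common():
    print(f"{source} -> {target}: followed {n} time(s)")
```

Frequently followed pairs are only candidates for associations between subjects. A human author still has to decide what subject each page represents.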

This was the best paper for SIGIR 2010.

September 29, 2010

Natural Language Toolkit

Natural Language Toolkit is a set of Python modules for natural language processing and text analytics. Brought to my attention by Kirk Lowery.

Two near term tasks come to mind:

  • Feature comparison to LingPipe
  • Finding linguistic software useful for topic maps

Suggestions of other toolkits welcome!

September 28, 2010

Mining Billion-node Graphs: Patterns, Generators and Tools

Filed under: Authoring Topic Maps,Data Mining,Graphs,Software,Subject Identity — Patrick Durusau @ 9:38 am

Mining Billion-node Graphs: Patterns, Generators and Tools Author: Christos Faloutsos (CMU)

Presentation on the Pegasus (Peta-scale Graph Mining System) project.

If you have large amounts of real world data and need some motivation, take a look at this presentation.

September 24, 2010

Pastebin for Topic Maps

Filed under: Authoring Topic Maps,Topic Map Software — Patrick Durusau @ 6:37 am

Pastebin for topic maps.

From Lars Heuer, a syntax highlighter for topic map syntax.

A proposal for JTM 1.1

Filed under: Authoring Topic Maps,JTM,Topic Maps — Patrick Durusau @ 6:37 am

A proposal for JTM 1.1.

Disdaining to create another topic map syntax, Jan Schreiber has proposed JTM 1.1. (Good thing it wasn’t another topic map syntax. 😉 )

Seriously, it is a proposal designed to make using topic maps on mobile devices a more viable option.

REST in the Web3 Platform

Filed under: Authoring Topic Maps,Topic Map Software,Topic Maps — Patrick Durusau @ 6:30 am

REST in the Web3 Platform.

Graham Moore details the choices made to make the Web3 platform follow “many” of the principles of REST.

While you are there, watch Web3 Platform Getting Started No 1. Good presentation.

******
Additional resources:

Tutorial on REST: Learn REST: A Tutorial

Fielding, Roy T.; Taylor, Richard N., Principled Design of the Modern Web Architecture

Fielding, Roy Thomas (dissertation), Architectural Styles and the Design of Network-based Software Architectures

Update:

See also: Restful Interface to Topic Maps Another REST interface effort.

September 18, 2010

Topic Map Question #1 – What Subjects/Entities Do You Want To Talk About?

Filed under: Authoring Topic Maps,Subject Identity,Topic Maps — Patrick Durusau @ 3:06 pm

The first topic map question is: “What subjects/entities do you want to talk about?”

Until that question is explored (it isn’t ever fully answered), the answers to other questions remain dangerously vague:

  • How to identify those subjects?
  • How do others identify the same subjects?
  • Are other identifications of any interest?
  • What other subjects are of interest?
  • How should those subjects be identified?
  • What relationships between subjects should be identified?
  • How should relationships between subjects be identified?
  • etc.

The responses “just use syntax X” or “use software Y” are answers to the question about subjects/entities.

Just not explicit answers.

That is characteristic of the pig-in-a-poke school of topic map design.

September 16, 2010

UCI Machine Learning Datasets

Filed under: Authoring Topic Maps,Dataset,Interface Research/Design — Patrick Durusau @ 4:12 am

UCI Machine Learning Datasets Collection of 194 datasets (as of 2010/09/14) for machine learning.

Re-purpose to develop/test interfaces to assist in authoring topic maps.

September 15, 2010

Taxonomy for Characterizing Ensemble Methods in Classification Tasks

Filed under: Authoring Topic Maps,Classification,Ensemble Methods — Patrick Durusau @ 8:11 am

Taxonomy for Characterizing Ensemble Methods in Classification Tasks Author: Lior Rokach Keywords: Ensemble-methods; Classification; Boosting; Bagging; Partitioning; Decision trees; Neural networks. Review and annotated bibliography of work on ensemble methods.

Ensemble methods, I like the sound of that.

Extend it to mean human authors + other methods creating a topic map.

September 12, 2010

Gaming for Topic Maps?

Gaming for a Cure: Computer Gamers Tackle Protein Folding describes how over 57,000 “players” bested supercomputers:

Analysis shows that players bested the computers on problems that required radical moves, risks and long-term vision — the kinds of qualities that computers do not possess.

Distributed human contribution to massive information projects is a proven fact. (The reading programme of the OED is an earlier example.)

Can you make mapping large data sets into an interesting game?

For some clues, see: Foldit.

August 31, 2010

One of These Things

One of These Things could be a theme song for topic maps.

It is also a good idea for a topic map authoring interface.

Say you get ten (10) “hits” back from a search. Add a “checkbox” to each “hit.” Unchecked means same as other unchecked “hits.” Checked means different from the unchecked “hits.”

The “same subject” judgment becomes a collective one of all the users of the search interface. Different “hits” are going to be unchecked in any search return.
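A rough sketch of how the checkbox judgments might be pooled follows. The response format and the hit identifiers are hypothetical. The only idea carried over from above is that hits left unchecked in the same return count as votes for "same subject."

```python
from collections import Counter
from itertools import combinations

# Hypothetical responses: (hits shown, hits the user checked as "different").
responses = [
    ({"a", "b", "c"}, {"c"}),
    ({"a", "b", "d"}, {"d"}),
    ({"a", "b", "c"}, {"b"}),
]

def same_subject_votes(responses):
    """Tally a vote for every pair of hits left unchecked together."""
    votes = Counter()
    for shown, checked in responses:
        unchecked = sorted(shown - checked)  # sorted so pairs have a stable order
        for pair in combinations(unchecked, 2):
            votes[pair] += 1
    return votes

votes = same_subject_votes(responses)
for pair, n in votes.most_common():
    print(pair, n)
```

How many votes it takes before two hits are treated as the same subject is a design decision this sketch leaves open.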

Semantic input = Human input.

July 5, 2010

Data-Intensive Text Processing with MapReduce – Book

Filed under: Authoring Topic Maps,MapReduce,Software — Patrick Durusau @ 5:30 am

Data-Intensive Text Processing with MapReduce will help answer the question: What subjects are available in a given torrent of information?

Or, perhaps the more interesting question, What subjects did you find in a given torrent of information?

Not exactly the same question, is it?

The first presumes that we are going to find the same subjects and the second does not.

Download the Final Manuscript Support the authors by buying a copy as well: publisher’s site.

Authored by Jimmy Lin and Chris Dyer.

Very interested in hearing from anyone using MapReduce to mine texts for use in topic map construction.

*****
Updated to insert the authors. Oops! 20 April 2011

June 30, 2010

ANN: Finally! DBpedia and Wikipedia switched to Topic Maps! – News

Filed under: Authoring Topic Maps,CTM,Topic Map Software,Topic Maps,XTM — Patrick Durusau @ 7:07 pm

ANN: Finally! DBpedia and Wikipedia switched to Topic Maps!, according to Lars Heuer.

See his post for the details, but if you are capable of installing plugins in a Firefox browser, you can use his DBpedia / Wikipedia -> Topic Maps service within your browser to create topic maps.

The bar for creating topic maps just keeps getting lower!

******
A few minutes later….

Caveat: I am already running Firefox 3.6.6 so your experience may vary, but….this rocks!

Installation of GreaseMonkey and the Mappify browser plugins was very slick (only GreaseMonkey required a restart) and then a quick jaunt to Wikipedia and the first article I pulled up, “rough sets” (that is *sets*), has “Mappify” next to the title and it presents a drop down menu of XTM, CTM and JTM, in that order. Pick one and it offers you the file.

It doesn’t get any slicker than this! Kudos to Lars Heuer!

June 25, 2010

Mappify – DBpedia and Wikipedia to Topic Maps – New Service

Filed under: Authoring Topic Maps,Topic Map Software,Topic Maps — Patrick Durusau @ 4:00 pm

Mappify – DBpedia and Wikipedia to Topic Maps is the latest shot over the Topic Maps Lab bow! 😉

Toss a Wikipedia or DBpedia source at this service and get back a topic map! In one of four flavors: xtm, ctm, json, or jtm.

UPDATE: 26 June 2010 – The bug reported below has been fixed!

Warning You must use correct case for URLs.

Incorrect usage: Wikipedia URL for Marilyn Monroe, http://en.wikipedia.org/wiki/Marilyn_monroe. I did not notice that the page says: “(Redirected from Marilyn monroe).” Not much of a re-direct if it leaves me with the incorrect URL.

Correct the case on entries to match the page title above the redirect notice and you will be fine.

Correct usage: http://en.wikipedia.org/wiki/Marilyn_Monroe.

For those unfamiliar with our community, the “competition” between Semagia (Lars Heuer) and the Topic Maps Lab is entirely friendly. It just makes better copy to portray them as fierce competitors leapfrogging each other with topic map technologies and resources.

June 1, 2010

Enhancing navigation in biomedical databases by community voting and database-driven text classification

Enhancing navigation in biomedical databases by community voting and database-driven text classification demonstrates improvement of automatic classification of literature by harnessing community knowledge.

From the authors:

Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases.

The system can be seen at: PepBank.

You need to read the article in full to appreciate what the authors have done but a couple of quick points to notice:

1) The use of heat maps to assist users in determining the relevance of a given abstract. (Domain specific facts.)

2) The user interface allows yes/no voting on the same facts as appear in the heat map.

Voting results in reclassification of the entries.

Equally important is a user interface that enables immediate evaluation of relevance and quick user feedback on relevance.

The user is not asked a series of questions or given complex rating choices. It is yes or no. That may seem coarse, but the project demonstrates that, with proper design, a yes/no vote can be very useful.

May 29, 2010

Association Rules

Filed under: Authoring Topic Maps,Data Mining — Patrick Durusau @ 6:28 am

Apologies for posting on association rules in Private Mining of Association Rules, a term of art that might be confusing to topic map advocates, without defining it.

When we buy an item online, most retailers suggest that other buyers also purchased … some list of items. The “association” of those items together can be represented by a Boolean vector, composed of values for the presence or absence of an item. To form an association rule, such a vector is accompanied by support and confidence values.

The support value indicates the percentage of a data set where the association occurs. That is, the items in question appear together.

The confidence value indicates what percentage of the cases containing one item also contain the other.

Minimums of these values are known as the minimum support threshold and the minimum confidence threshold and typically appear together.
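A small worked sketch of support and confidence as described above. The shopping baskets and the threshold values are made up for illustration, and each basket is a Python set standing in for the Boolean vector.

```python
# Made-up transactions; each set lists the items present in one purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Illustrative thresholds, not values from any source.
MIN_SUPPORT = 0.4
MIN_CONFIDENCE = 0.6

s = support({"bread", "milk"}, transactions)
c = confidence({"bread"}, {"milk"}, transactions)
keep_rule = s >= MIN_SUPPORT and c >= MIN_CONFIDENCE
print(f"bread => milk: support={s:.2f}, confidence={c:.2f}, keep={keep_rule}")
```

Here bread and milk appear together in two of four baskets, and two of the three baskets with bread also have milk, so the rule clears both thresholds.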

For more information on “association rules,” see Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber, at page 229. (The publication date for the second edition in WorldCat (the link on the title) is wrong. Should be 2006.)

Supplemental Materials for Data Mining. I am checking on the status of the apparent 3rd edition so you might want to wait on buying a copy. Would make a great text for an advanced topic maps course that focused on populating a topic map.

May 15, 2010

Semantic Indexing

Filed under: Authoring Topic Maps,Indexing,Information Retrieval,Semantics — Patrick Durusau @ 6:41 pm

Semantic indexing and searching using a Hopfield net

Automatic creation of thesauri as a means of dealing with the “vocabulary problem.”

Another topic map construction tool.

A bit dated, 1997, but will run this line of research forward and report back.

With explicit subject identity, machine generated thesauri could be reliably interchanged.

And improved upon by human users.

April 27, 2010

Use My Model/Language Mister!

Filed under: Authoring Topic Maps,Semantics,Subject Identity,Topic Maps — Patrick Durusau @ 6:48 pm

“Use My Model/Language Mister!” is the cry of markup, modeling and semantics projects.

They are all equally sincere, and if you don’t like any of them, wait another six months or so for additional choices.

I don’t remember if it was after the 75th or 100th or somewhere past the 100th “true” model that I began to suspect something was amiss.

Models and languages change over time and can be barriers to discussion and discovery of badly needed information.

Rather than arguing for this or that model, as though it were some final answer, why not ask which model suits our present purposes?

With topic maps, once the subjects under discussion are identified, how they are represented for some purpose is a detail. A very important detail, but a detail nonetheless.

If, or rather when, our requirements change, the same subject can be represented in a different way. The subjects can be identified, again, to create a new representation, or, if identified using topic maps, our job of moving to another model just got a whole lot easier.

March 26, 2010

Concept Hierarchies and Topic Maps

Filed under: Authoring Topic Maps,Concept Hierarchies,Topic Maps — Patrick Durusau @ 12:16 pm

Concept hierarchies are easy to represent in topic maps and are fundamental to navigation of information resources. So much for the obvious.

Topic maps standards work and debates over arcane issues don’t prepare us to answer the user question: “Excuse me. What concept hierarchy should I use in my topic map?”

The typical response: “Whatever hierarchy you want. Completely unbounded.” That is about as helpful as a poke with a sharp stick.

You don’t want to give your users a copy of this article, but consider reading Deriving concept hierarchies from text by Sanderson and Croft as an introduction to deriving concept hierarchies from the user’s document collection.
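For a taste of the approach, here is a minimal sketch of a subsumption test over document co-occurrence. As I recall the paper, term x subsumes term y when P(x|y) is at least 0.8 and P(y|x) is less than 1. Treat that threshold, the toy documents and the function names as my assumptions and check them against the article before relying on them.

```python
# Toy collection; each document is reduced to the set of terms it contains.
docs = [
    {"animal", "dog"},
    {"animal", "dog", "poodle"},
    {"animal", "cat"},
    {"animal"},
    {"animal", "dog"},
]

def doc_freq(docs, *terms):
    """Number of documents containing all of the given terms."""
    return sum(all(t in d for t in terms) for d in docs)

def subsumes(x, y, docs, threshold=0.8):
    """True if x is a candidate parent of y (both terms must occur somewhere)."""
    both = doc_freq(docs, x, y)
    p_x_given_y = both / doc_freq(docs, y)  # how often x shows up where y does
    p_y_given_x = both / doc_freq(docs, x)  # how often y shows up where x does
    return p_x_given_y >= threshold and p_y_given_x < 1

for parent, child in [("animal", "dog"), ("dog", "animal"), ("dog", "poodle")]:
    print(parent, "subsumes", child, "->", subsumes(parent, child, docs))
```

The pairs that pass are candidate parent/child links for a hierarchy, something to put in front of the user for review rather than a finished answer.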

Users (aka, paying customers) will appreciate your assistance in developing a hierarchy for their topic map, as opposed to the “well, that’s your problem” approach.

As the links for the authors show, this isn’t the latest word on deriving concept hierarchies. But, it is well written and is a useful starting place. For my part I want to run this backwards to its sources and forward to the latest techniques. More posts coming on this and other techniques that may be useful for building topic maps.
