Archive for the ‘Vocabulary Mismatch’ Category

Interactive Intent Modeling: Information Discovery Beyond Search

Wednesday, March 18th, 2015

Interactive Intent Modeling: Information Discovery Beyond Search by Tuukka Ruotsalo, Giulio Jacucci, Petri Myllymäki, Samuel Kaski.

From the post:

Combining intent modeling and visual user interfaces can help users discover novel information and dramatically improve their information-exploration performance.

Current-generation search engines serve billions of requests each day, returning responses to search queries in fractions of a second. They are great tools for checking facts and looking up information for which users can easily create queries (such as “Find the closest restaurants” or “Find reviews of a book”). What search engines are not good at is supporting complex information-exploration and discovery tasks that go beyond simple keyword queries. In information exploration and discovery, often called “exploratory search,” users may have difficulty expressing their information needs, and new search intents may emerge and be discovered only as they learn by reflecting on the acquired information. 8,9,18 This finding roots back to the “vocabulary mismatch problem” 13 that was identified in the 1980s but has remained difficult to tackle in operational information retrieval (IR) systems (see the sidebar “Background”). In essence, the problem refers to human communication behavior in which the humans writing the documents to be retrieved and the humans searching for them are likely to use very different vocabularies to encode and decode their intended meaning. 8,21

Assisting users in the search process is increasingly important, as everyday search behavior ranges from simple look-ups to a spectrum of search tasks 23 in which search behavior is more exploratory and information needs and search intents uncertain and evolving over time.

We introduce interactive intent modeling, an approach promoting resourceful interaction between humans and IR systems to enable information discovery that goes beyond search. It addresses the vocabulary mismatch problem by giving users potential intents to explore, visualizing them as directions in the information space around the user’s present position, and allowing interaction to improve estimates of the user’s search intents.

What!? All those years spent trying to beat users into learning complex search languages were in vain? Say it’s not so!

But apparently it is so. All of the research on the “vocabulary mismatch problem” and on “different vocabularies to encode and decode their intended meaning” has come back to bite information systems that offer static and author-driven vocabularies.

Users search best, no surprise, through vocabularies they recognize and understand.

I don’t know of any interactive topic maps in the sense used here but that doesn’t mean that someone isn’t working on one.

A shift in this direction could do wonders for the results of searches.

CFP – Dealing with the Messiness of the Web of Data – Journal of Web Semantics

Friday, December 17th, 2010

CFP – Dealing with the Messiness of the Web of Data – Journal of Web Semantics

From the call:

Research on the Semantic Web, which is now in its second decade, has had a tremendous success in encouraging people to publish data on the Web in structured, linked, and standardized ways. The success of what has now become the Web of Data can be read from the sheer number of triples available within the Linked-Open Data, Linked Life Data and Open-Government initiatives. However, this growth in data makes many of the established assumptions inappropriate and offers a number of new research challenges.

In stark contrast to early Semantic Web applications that dealt with small, hand-crafted ontologies and data-sets, the new Web of Data comes with a plethora of contradicting world-views and contains incomplete, inconsistent, incorrect, fast-changing and opinionated information. This information not only comes from academic sources and trustworthy institutions, but is often community built, scraped or translated.

In short: the Web of Data is messy, and methods to deal with this messiness are paramount for its future.

Now, we have two choices as the topic map community:

  • congratulate ourselves for seeing this problem long ago, high five each other, etc., or
  • step up and offer topic map solutions that incorporate as much of the existing SW work as possible.

I strongly suggest the second one.

Important dates:

We will aim at an efficient publication cycle in order to guarantee prompt availability of the published results. We will review papers on a rolling basis as they are submitted and explicitly encourage submissions well before the submission deadline. Submit papers online at the journal’s Elsevier Web site.

Submission deadline: 1 February 2011
Author notification: 15 June 2011

Revisions submitted: 1 August 2011
Final decisions: 15 September 2011
Publication: 1 January 2012

Demonstrating The Need For Topic Maps

Saturday, June 19th, 2010

Individual Differences in the Interpretation of Text: Implications for Information Science by Jane Morris demonstrates that different readers have different perceptions of lexical cohesion in a text. About 40% worth of difference. That is a difference in the meaning of the text.

Many tasks in library and information science (e.g., indexing, abstracting, classification, and text analysis techniques such as discourse and content analysis) require text meaning interpretation, and, therefore, any individual differences in interpretation are relevant and should be considered, especially for applications in which these tasks are done automatically. This article investigates individual differences in the interpretation of one aspect of text meaning that is commonly used in such automatic applications: lexical cohesion and lexical semantic relations. Experiments with 26 participants indicate an approximately 40% difference in interpretation. In total, 79, 83, and 89 lexical chains (groups of semantically related words) were analyzed in 3 texts, respectively. A major implication of this result is the possibility of modeling individual differences for individual users. Further research is suggested for different types of texts and readers than those used here, as well as similar research for different aspects of text meaning.

I won’t belabor what a 40% difference in interpretation implies for the one-interpretation-of-data crowd, at least for those who prefer an evidence-based approach to IR over an ideological one.

What is worth belaboring is how to use Morris’ technique to demonstrate such differences in interpretation to potential topic map customers. As a community we could develop texts for use with particular market segments: business, government, legal, finance, etc. We would also need an interface to replace the colored pencils used to mark all words belonging to a particular group, and a way to automate some of the calculations and other operations on the resulting data.
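As a starting point for automating the calculations, here is a minimal sketch in Python. It is not Morris’ actual measure: it assumes each reader’s markup is recorded as a list of word groups (lexical chains), treats two words as “related” when a reader puts them in the same group, and scores disagreement between two readers as one minus the Jaccard overlap of their related word pairs. The function names and the sample data are hypothetical.

```python
from itertools import combinations


def related_pairs(chains):
    """Return the set of word pairs that a reader placed in the same chain."""
    pairs = set()
    for chain in chains:
        # Normalize case and sort so each pair has one canonical ordering.
        words = sorted({word.lower() for word in chain})
        pairs.update(combinations(words, 2))
    return pairs


def disagreement(chains_a, chains_b):
    """1 - Jaccard overlap of two readers' related word pairs (assumed measure)."""
    a, b = related_pairs(chains_a), related_pairs(chains_b)
    union = a | b
    if not union:
        return 0.0  # Neither reader marked any related words.
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_disagreement(readers):
    """Average disagreement over every pair of readers (dict: name -> chains)."""
    scores = [
        disagreement(readers[x], readers[y])
        for x, y in combinations(sorted(readers), 2)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical markup of the same short text by two readers.
readers = {
    "reader_1": [["court", "judge", "verdict"], ["bank", "loan"]],
    "reader_2": [["court", "judge"], ["bank", "loan", "verdict"]],
}
print(f"{mean_pairwise_disagreement(readers):.0%} difference in interpretation")
```

Running the same script over markup collected from a potential client’s own staff, on the client’s own texts, would turn the percentage into a demonstration rather than a claim.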

Sensing that interpretations of texts vary is one thing. Having an actual demonstration, possibly using texts from a potential client, is quite another.

This is a tool we should build. I am willing to help. Who else is interested?

Topic Maps and the “Vocabulary Problem”

Monday, April 12th, 2010

To situate topic maps in a traditional area of IR (information retrieval), try the “vocabulary problem.”

Furnas describes the “vocabulary problem” as follows:

Many functions of most large systems depend on users typing in the right words. New or intermittent users often use the wrong words and fail to get the actions or information they want. This is the vocabulary problem. It is a troublesome impediment in computer interactions both simple (file access and command entry) and complex (database query and natural language dialog).

In what follows we report evidence on the extent of the vocabulary problem, and propose both a diagnosis and a cure. The fundamental observation is that people use a surprisingly great variety of words to refer to the same thing. In fact, the data show that no single access word, however well chosen, can be expected to cover more than a small proportion of user’s attempts. Designers have almost always underestimated the problem and, by assigning far too few alternate entries to databases or services, created an unnecessary barrier to effective use. Simulations and direct experimental tests of several alternative solutions show that rich, probabilistically weighted indexes or alias lists can improve success rates by factors of three to five.

The Vocabulary Problem in Human-System Communication (1987)

Substitute topic maps for probabilistically weighted indexes or alias lists. (Techniques we are going to talk about in connection with topic map authoring.)

Three to five times greater success is an incentive to use topic maps.
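To make “probabilistically weighted indexes or alias lists” concrete, here is a minimal sketch in Python. It is an illustration under my own assumptions, not Furnas’ implementation and not a topic map engine: the class name, the target names, and the counts are all hypothetical. Each word users have tried is recorded against the thing they meant, and a lookup returns every candidate target with the share of past uses that pointed to it.

```python
from collections import defaultdict


class AliasIndex:
    """Many words per target, weighted by how often each word meant that target."""

    def __init__(self):
        # word -> target -> number of times users meant that target by the word
        self._counts = defaultdict(lambda: defaultdict(int))

    def add(self, word, target, count=1):
        """Record that `word` was used `count` times to refer to `target`."""
        self._counts[word.lower()][target] += count

    def lookup(self, word):
        """Return (target, probability) pairs for `word`, most likely first."""
        targets = self._counts.get(word.lower())
        if not targets:
            return []
        total = sum(targets.values())
        ranked = sorted(targets.items(), key=lambda item: (-item[1], item[0]))
        return [(target, count / total) for target, count in ranked]


# Hypothetical observations of what users typed and what they wanted.
index = AliasIndex()
index.add("delete", "remove_file", 6)
index.add("erase", "remove_file", 3)
index.add("remove", "remove_file", 4)
index.add("remove", "uninstall_program", 1)

print(index.lookup("Remove"))  # [('remove_file', 0.8), ('uninstall_program', 0.2)]
print(index.lookup("zap"))     # [] (no alias recorded yet)
```

A single “well chosen” access word would have answered only one of those three words for remove_file; the alias list answers all of them and still ranks the ambiguous case.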

Marketing Department Summary

Customers can’t buy what they can’t find. Topic maps help customers find purchases, which increases sales. (Be sure to track pre- and post-topic-map sales results, so marketing can’t successfully claim the increases are due to their efforts.)

There’s (Another) Name For That

Wednesday, March 24th, 2010

Semantic integration research could really benefit from semantic integration!

After years of using Steve Newcomb’s semantic impedance to describe identifying the same subject differently, I ran across (another) name for that subject: vocabulary mismatch.

“Mismatch” covers a multitude of reasons, conditions and sins.

I encountered the term reading Search Engines: Information Retrieval in Practice by W. Bruce Croft, Donald Metzler, and Trevor Strohman. More comments on this book to appear in future posts. For now, buy it!

A friend recently remarked that my posts cover a lot of territory. True, but subject identity is a big topic.

The broader our reading/research, the better we will be able to assist users in developing solutions that work for them and their subjects.

It is always possible to narrow one’s research/reading for a particular project, but broader vistas await those who seek them out.