Search Engines « Another Word For It

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 3, 2010

Search User Interfaces: Chapter 1 (Part 1)

Filed under: Full-Text Search,Information Retrieval,Search Engines,Search Interface,Searching — Patrick Durusau @ 7:59 pm

Chapter 1, “The Design of Search User Interfaces,” of Hearst’s Search User Interfaces surveys searching and related issues from a user interface perspective.

I needed the reminders about simplicity in search interfaces and the shift in search interface design (sections 1.1 – 1.2). If you think you have a “simple” interface for your topic map, read those two sections. Then read them again.

The design principles for user interfaces (sections 1.3 – 1.4) give a good overview of, and contrast between, user-centered design and design where developers decide what users need. (Which one did you use?)

The discussion of feedback from search interfaces (section 1.5) ranges from two-dimensional representations of items as icons (advised against) to highlighting query terms, sorting and query term suggestions (generally favorable).

Let’s work towards having interfaces that are as attractive to users as our topic map applications are good at semantic integration.


April 25, 2010

A “Terrier” For Your Tool Box?

Filed under: Search Engines — Patrick Durusau @ 2:55 pm

Terrier IR project description:

Terrier is a highly flexible, efficient, and effective open source search engine, readily deployable on large-scale collections of documents. Terrier implements state-of-the-art indexing and retrieval functionalities, and provides an ideal platform for the rapid development and evaluation of large-scale retrieval applications.

Terrier is open source, and is a comprehensive, flexible and transparent platform for research and experimentation in text retrieval. Research can easily be carried out on standard TREC and CLEF test collections.

Become comfortable with the TREC or CLEF test collections.

A topic map on any part of either collection would attract IR researchers to topic maps.


April 15, 2010

What Is Your TFM (To Find Me) Score?

Filed under: Information Retrieval,Recall,Search Engines,Subject Identity — Patrick Durusau @ 10:54 am

I have talked about TFM (To Find Me) scores before. Take a look at How Can I Find Thee? Let me count the ways… for example.

So, you have looked at your OPAC, database, RDF datastore, topic map. What is your average TFM score?

What do you think it needs to be for 60 to 80% retrieval?

The Furnas article from 1983 is the key to this series of posts. See the full citation in Are You Designing a 10% Solution?.

Would you believe 15 ways to identify a subject? Or aliases, to use the common terminology.

Say it slowly: 15 ways to identify a subject gets, on average, 60 to 80% retrieval. If you are in the range of 3 – 5 ways to identify a subject on your ecommerce site, you are leaving money on the table. Lots of money on the table.

Want to leave less money on the table? Use topic maps and try for 15 aliases for a subject or more.
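To put a number on your own system, here is a minimal sketch. It assumes a TFM score is simply the count of distinct names by which a subject can be found. The catalog data and the function names (tfm_score, average_tfm, below_target) are hypothetical, made up for illustration; the default target of 15 comes from the figure above.

```python
from statistics import mean

# Hypothetical catalog: each subject maps to the names (aliases) a user
# could type to find it. The data is invented for illustration only.
catalog = {
    "laptop": {"laptop", "notebook", "portable computer"},
    "sofa": {"sofa", "couch", "settee", "loveseat", "divan"},
    "sneakers": {"sneakers"},
}


def tfm_score(aliases):
    """Count the distinct ways to find one subject, ignoring case and padding."""
    return len({alias.strip().lower() for alias in aliases})


def average_tfm(catalog):
    """Average TFM score across every subject in the catalog."""
    return mean(tfm_score(aliases) for aliases in catalog.values())


def below_target(catalog, target=15):
    """Subjects with fewer than `target` aliases, in alphabetical order."""
    return sorted(
        subject
        for subject, aliases in catalog.items()
        if tfm_score(aliases) < target
    )
```

Running below_target on your own data gives a to-do list: the subjects that most need additional aliases.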


April 12, 2010

Topic Maps and the “Vocabulary Problem”

Filed under: Full-Text Search,Heterogeneous Data,Information Retrieval,Search Engines,Searching,Semantic Diversity,Vocabulary Mismatch — Patrick Durusau @ 3:09 pm

To situate topic maps in a traditional area of IR (information retrieval), try the “vocabulary problem.”

Furnas describes the “vocabulary problem” as follows:

Many functions of most large systems depend on users typing in the right words. New or intermittent users often use the wrong words and fail to get the actions or information they want. This is the vocabulary problem. It is a troublesome impediment in computer interactions both simple (file access and command entry) and complex (database query and natural language dialog).

In what follows we report evidence on the extent of the vocabulary problem, and propose both a diagnosis and a cure. The fundamental observation is that people use a surprisingly great variety of words to refer to the same thing. In fact, the data show that no single access word, however well chosen, can be expected to cover more than a small proportion of user’s attempts. Designers have almost always underestimated the problem and, by assigning far too few alternate entries to databases or services, created an unnecessary barrier to effective use. Simulations and direct experimental tests of several alternative solutions show that rich, probabilistically weighted indexes or alias lists can improve success rates by factors of three to five.

The Vocabulary Problem in Human-System Communication (1987)

Substitute topic maps for probabilistically weighted indexes or alias lists. (Techniques we are going to talk about in connection with topic maps authoring.)

Three to five times greater success is an incentive to use topic maps.
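For readers who have not met a weighted alias list, a rough sketch follows. It is not the method from the Furnas paper and not a topic map API. The AliasIndex class, its methods, the sample entries and the weights are all hypothetical; the weights stand in for "how likely is it that a user typing this word means this subject."

```python
from collections import defaultdict


class AliasIndex:
    """Toy weighted alias list: many words can point at one subject,
    and one word can point at several subjects with different weights."""

    def __init__(self):
        # term -> {subject: weight}
        self._index = defaultdict(dict)

    def add(self, term, subject, weight=1.0):
        """Register `term` as one way of finding `subject`."""
        self._index[term.strip().lower()][subject] = weight

    def lookup(self, term):
        """Return (subject, weight) pairs for a term, best match first."""
        candidates = self._index.get(term.strip().lower(), {})
        return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)


# Invented sample data: the weights are assumptions, not measured values.
index = AliasIndex()
index.add("sofa", "sofa", 0.9)
index.add("couch", "sofa", 0.6)
index.add("couch", "daybed", 0.2)
index.add("settee", "sofa", 0.4)
```

With this data, index.lookup("Couch") returns the sofa ahead of the daybed, while a word nobody registered returns an empty list. That empty list is the vocabulary problem in miniature.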

Marketing Department Summary

Customers can’t buy what they can’t find. Topic maps help customers find purchases, which increases sales. (Be sure to track pre- and post-topic-map sales results, so marketing can’t successfully claim the increases are due to their efforts.)


April 5, 2010

Are You Designing a 10% Solution?

Filed under: Full-Text Search,Heterogeneous Data,Recall,Search Engines — Patrick Durusau @ 8:28 pm

The most common feature on webpages is the search box. It is supposed to help readers find information, products, services; in other words, help the reader or your cash flow.

How effective is text searching? How often will your reader use the same word as your content authors for some object, product, service? Survey says: 10 to 20%!*

So the next time you insert a search box on a webpage, you or your client may be missing 80 to 90% of the potential readers or customers. Ouch!

Unlike the imaginary world of universal and unique identifiers, the odds of users choosing the same words have been established by actual research.

The data sets were:

verbs used to describe text-editing operations
descriptions of common objects, similar to the PASSWORD™ game
superordinate category names for swap-and-sale listings
main-course cooking recipes

There are a number of interesting aspects to the study that I will cover in future posts but the article offers the following assessment of text searching:

We found that random pairs of people use the same word for an object only 10 to 20 percent of the time.

This research is relevant to all information retrieval systems: online stores, library catalogs and the rest, whether you are searching simple text, RDF or even topic maps. Ask yourself or your users: Is a 10% success rate really enough?
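You can estimate the same kind of number for your own users. The sketch below is a simple estimator, not necessarily the procedure used in the study: collect the word each person uses for one object, then compute the chance that two different people, picked at random, chose the same word. The function name pairwise_agreement and the sample responses are hypothetical.

```python
from collections import Counter


def pairwise_agreement(terms):
    """Probability that two different, randomly chosen respondents
    used the same word, given one word per respondent."""
    n = len(terms)
    if n < 2:
        raise ValueError("need at least two responses")
    counts = Counter(term.strip().lower() for term in terms)
    # Ordered pairs of distinct people who agree, over all ordered pairs.
    agreeing_pairs = sum(c * (c - 1) for c in counts.values())
    return agreeing_pairs / (n * (n - 1))


# Invented responses: five people asked to name one text-editing operation.
responses = ["delete", "delete", "remove", "erase", "cut"]
```

For the five invented responses, only one pair of people agrees (the two who said "delete"), so the estimate comes out at 0.1, or 10%.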

(There are ways to improve that 10% score. More on those to follow.)

*Furnas, G. W., Landauer, T. K., Gomez, L. M., Dumais, S. T. (1983) “Statistical semantics: Analysis of the potential performance of keyword information access systems.” Bell System Technical Journal, 62, 1753-1806. Reprinted in: Thomas, J. C., and Schneider, M. L., eds. (1984) Human Factors in Computer Systems. Norwood, New Jersey: Ablex Publishing Corp., 187-242.


March 24, 2010

There’s (Another) Name For That

Filed under: Search Engines,Vocabulary Mismatch — Patrick Durusau @ 8:03 pm

Semantic integration research could really benefit from semantic integration!

After years of using Steve Newcomb’s semantic impedance to describe identifying the same subject differently, I run across (another) name for that subject: vocabulary mismatch.

“Mismatch” covers a multitude of reasons, conditions and sins.

I encountered the term reading Search Engines: Information Retrieval in Practice by W. Bruce Croft, Donald Metzler, and Trevor Strohman. More comments on this book to appear in future posts. For now, buy it!

A friend recently remarked that my posts cover a lot of territory. True, but subject identity is a big topic.

The broader our reading/research, the better we will be able to assist users in developing solutions that work for them and their subjects.

It is always possible to narrow one’s research/reading for a particular project, but broader vistas await those who seek them out.
