Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

October 31, 2010

R2RML: RDB to RDF Mapping Language

Filed under: RDF,Semantic Web — Patrick Durusau @ 8:13 pm

R2RML: RDB to RDF Mapping Language

Abstract:

This document describes R2RML, a language for expressing customized mappings from relational databases to RDF datasets. Such mappings provide the ability to view existing relational data in the RDF data model, expressed in a structure and target vocabulary of the mapping author’s choice. R2RML mappings are themselves RDF graphs and written down in Turtle syntax. R2RML enables different types of mapping implementations: processors could, for example, offer a virtual SPARQL endpoint over the mapped relational data, or generate RDF dumps, or offer a Linked Data interface.

First draft from the RDB2RDF working group.
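
Since the draft makes the point that R2RML mappings are themselves RDF graphs written in Turtle, a minimal sketch may help. The EMP table, its columns, and the ex: vocabulary below are my own illustrative assumptions, not taken from the draft; rdflib is used only to show that the mapping is ordinary RDF:

```python
from rdflib import Graph

# A minimal R2RML mapping as a Turtle string. Table name (EMP),
# column names (EMPNO, ENAME), and the ex: vocabulary are invented
# for illustration; rr: is the R2RML namespace from the draft.
mapping_ttl = """
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/ns#> .

<#TriplesMap1>
    rr:logicalTable [ rr:tableName "EMP" ] ;
    rr:subjectMap [
        rr:template "http://data.example.com/employee/{EMPNO}" ;
        rr:class ex:Employee
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rr:column "ENAME" ]
    ] .
"""

g = Graph()
g.parse(data=mapping_ttl, format="turtle", publicID="http://example.com/map/")
print(len(g), "triples in the mapping graph")  # the mapping is just RDF
```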

Questions:

  1. Select a table from two (or three) databases in a common area with different schemas.
  2. Convert the tables using the latest version of this proposal to RDF datasets.
  3. On what basis would you integrate the resulting RDF datasets into a single RDF dataset?

7. “We always know more than we can say, and we will always say more than we can write down.”

Filed under: Authoring Topic Maps,Knowledge Management,Marketing,Topic Maps — Patrick Durusau @ 7:59 pm

Knowledge Management Principle Seven of Seven (Rendering Knowledge by David Snowden)

We always know more than we can say, and we will always say more than we can write down. This is probably the most important. The process of taking things from our heads, to our mouths (speaking it) to our hands (writing it down) involves loss of content and context. It is always less than it could have been as it is increasingly codified.

Authoring a topic map always involves loss of content and context.

The same loss of content and context has bedeviled the AI community for the last 50 years.

No one can control the loss of content and context, or even identify it ahead of time.

Testing topic maps on users will help bring them closer to user expectations.

UMLS-SKOS

Filed under: Bioinformatics,Biomedical,SKOS,UMLS — Patrick Durusau @ 7:25 pm

UMLS-SKOS

Abstract:

SKOS is a Semantic Web framework for representing thesauri, classification schemes, subject heading systems, controlled vocabularies, and taxonomies. It enables novel ways of representing terminological knowledge and its linkage with domain knowledge in unambiguous, reusable, and encapsulated fashion within computer applications. According to the National Library of Medicine, the UMLS Knowledge Source (UMLS-KS) integrates and distributes key terminology, classification and coding standards, and associated resources to promote creation of more effective and interoperable biomedical information systems and services “that behave as if they ‘understand’ the meaning of the language of biomedicine and health”. However the current information representation model utilized by UMLS-KS itself is not conducive to computer programs effectively retrieving and automatically and unambiguously interpreting the ‘meaning’ of the biomedical terms and concepts and their relationships.

In this presentation we propose using Simple Knowledge Organization System (SKOS) as an alternative to represent the body of knowledge incorporated within the UMLS-KS within the framework of the Semantic Web technologies. We also introduce our conceptualization of a transformation algorithm to produce an SKOS representation of the UMLS-KS that integrates UMLS-Semantic Network, the UMLS-Metathesaurus complete with all its source vocabularies as a unified body of knowledge along with appropriate information to trace or segregate information based on provenance and governance information. Our proposal and method is based on the idea that formal and explicit representation of any body of knowledge enables its unambiguous, and precise interpretation by automated computer programs. The consequences of such undertaking would be at least three fold: 1) ability to automatically check inconsistencies and errors within a large and complex body of knowledge, 2) automated information interpretation, integration, and discovery, and 3) better information sharing, repurposing and reusing (adoption), and extending the knowledgebase within a distributed and collaborative community of researchers. We submit that UMLS-KS is no exception to this and may benefit from all those advantages if represented fully using a formal representation language. Using SKOS in combination with the transformation algorithm introduced in this presentation are our first steps in that direction. We explain our conceptualization of the algorithms, problems we encountered and how we addressed them with a brief gap analysis to outline the road ahead of us. At the end we also present several use cases from our laboratories at the School of Health information Sciences utilizing this artifact.
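
The authors' transformation algorithm is in the presentation itself; purely as a sketch of the core move it describes, here is what collapsing the differing names for a UMLS concept (keyed by CUI) into a single skos:Concept might look like with rdflib. The sample rows and the URI scheme are my assumptions, not the authors':

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

# Hypothetical MRCONSO-style rows: (CUI, term, is_preferred_name).
rows = [
    ("C0018681", "Headache", True),
    ("C0018681", "Cephalalgia", False),
    ("C0018681", "Cranial pain", False),
]

UMLS = Namespace("http://example.org/umls/")  # assumed URI scheme

g = Graph()
g.bind("skos", SKOS)
for cui, term, preferred in rows:
    concept = UMLS[cui]
    g.add((concept, RDF.type, SKOS.Concept))
    # the preferred name becomes skos:prefLabel, synonyms skos:altLabel
    label = SKOS.prefLabel if preferred else SKOS.altLabel
    g.add((concept, label, Literal(term, lang="en")))

print(g.serialize(format="turtle"))
```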

WebEx Recording Presentation

Slides

The slides are good but you will need to watch the presentation to give them context.

My only caution concerns:

Our proposal and method is based on the idea that formal and explicit representation of any body of knowledge enables its unambiguous, and precise interpretation by automated computer programs.

I don’t doubt that our computers can return “unambiguous, and precise interpretation[s]” but that isn’t the same thing as “correct” interpretations.

OpenII

Filed under: Data Structures,Heterogeneous Data,Information Retrieval,Software — Patrick Durusau @ 7:20 pm

OpenII

From the website:

OpenII is a collaborative effort spearheaded by The MITRE Corporation and Google to create a suite of open-source tools for information integration. The project is leveraging the latest developments in research on information integration to create a platform on which integration applications can be built and further research can be conducted.

The motivation for OpenII is that although a significant amount of research has been conducted on information integration, and several commercial systems have been deployed, many information integration applications are still hard to build. In research, we often innovate on a specific aspect of information integration, but then spend much of our time building (and rebuilding) other components that we need in order to validate our contributions. As a result, the research prototypes that have been built are generally not reusable and do not inter-operate with each other. On the applications side, information integration comes in many flavors, and therefore it is hard for commercial products to serve all the needs. Our goal is to create tools that can be applied in a variety of architectural contexts and can easily be tailored to the needs of particular domains.

OpenII tools include, among others, wrappers for common data sources, tools for creating matches and mappings between disparate schemas, a tool for searching collections of schemas and extending schemas, and run-time tools for processing queries over heterogeneous data sources.

The M3 metamodel:

The fundamental building block in M3 is the entity. An entity represents information about a set of related real-world objects. Associated with each entity is a set of attributes that indicate what information is captured about each entity. For simplicity, we assume that at most one value can be associated with each attribute of an entity.
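
Read literally, that description maps onto a very small data structure. A hedged sketch (class and attribute names are mine, not OpenII's) that enforces the single-value simplification:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """An M3-style entity: information about a set of related real-world
    objects, carried as single-valued attributes. Illustrative only."""
    name: str
    attributes: dict[str, object] = field(default_factory=dict)

    def set_attribute(self, attr: str, value: object) -> None:
        # At most one value per attribute, so setting an attribute a
        # second time replaces the value rather than accumulating it.
        self.attributes[attr] = value

patient = Entity("Patient")
patient.set_attribute("name", "Jane Doe")
patient.set_attribute("birth_year", 1970)
patient.set_attribute("birth_year", 1971)  # replaces, does not append
print(patient)
```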

The project could benefit from a strong injection of subject identity based thinking and design.

October 30, 2010

Copyright and Taxonomies

Filed under: Ontology,Topic Maps — Patrick Durusau @ 12:24 pm

A post to the Ontolog forum brought AMERICAN DENT. ASSN. v. DELTA DEN. PLANS ASSN., 126 F.3d 977 (7th Cir. 1997) to my attention.

Posted to alert you to potential copyright issues.

For licenses, consider Creative Commons.

6. “The way we know things is not the way we report we know things.”

Filed under: Authoring Topic Maps,Knowledge Management,Marketing,Topic Maps — Patrick Durusau @ 12:14 pm

Knowledge Management Principle Six of Seven (Rendering Knowledge by David Snowden)

The way we know things is not the way we report we know things. There is an increasing body of research data which indicates that in the practice of knowledge people use heuristics, past pattern matching and extrapolation to make decisions, coupled with complex blending of ideas and experiences that takes place in nanoseconds. Asked to describe how they made a decision after the event they will tend to provide a more structured process oriented approach which does not match reality. This has major consequences for knowledge management practice.

It wasn’t planned, but it is fitting that this post follows Harry Halpin’s Sense and Reference on the Web.

Questions:

  1. Find three examples where the reported account of decision making differs from the actual process.
  2. Of the examples reported in class, would any of them impact your design of a topic map? (3-5 pages, no citations)
  3. Of the same examples, would any of them impact your design of a topic map interface? (3-5 pages, no citations)
  4. Do you consider a topic map and its interface to be different? If so, how? If not, why not? (3-5 pages, no citations)

Sense and Reference on the Web

Filed under: Semantic Web,Semantics,Subject Identity — Patrick Durusau @ 10:01 am

Sense and Reference on the Web is Harry Halpin’s thesis seeking to answer the question: “What does a Uniform Resource Identifier (URI) mean?”

Abstract:

This thesis builds a foundation for the philosophy of the Web by examining the crucial question: What does a Uniform Resource Identifier (URI) mean? Does it have a sense, and can it refer to things? A philosophical and historical introduction to the Web explains the primary purpose of the Web as a universal information space for naming and accessing information via URIs. A terminology, based on distinctions in philosophy, is employed to define precisely what is meant by information, language, representation, and reference. These terms are then employed to create a foundational ontology and principles of Web architecture. From this perspective, the Semantic Web is then viewed as the application of the principles of Web architecture to knowledge representation. However, the classical philosophical problems of sense and reference that have been the source of debate within the philosophy of language return. Three main positions are inspected: the logicist position, as exemplified by the descriptivist theory of reference and the first-generation Semantic Web, the direct reference position, as exemplified by Putnam and Kripke’s causal theory of reference and the second-generation Linked Data initiative, and a Wittgensteinian position that views the Semantic Web as yet another public language. After identifying the public language position as the most promising, a solution of using people’s everyday use of search engines as relevance feedback is proposed as a Wittgensteinian way to determine the sense of URIs. This solution is then evaluated on a sample of the Semantic Web discovered via queries from a hypertext search engine query log. The results are evaluated and the technique of using relevance feedback from hypertext Web searches to determine relevant Semantic Web URIs in response to user queries is shown to considerably improve baseline performance. Future work for the Web that follows from our argument and experiments is detailed, and outlines of a future philosophy of the Web are laid out.
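
Halpin's actual evaluation runs against a real query log; as a toy illustration only (the data, URIs, and scoring below are invented), the relevance-feedback idea amounts to something like:

```python
from collections import Counter

# Toy illustration, not Halpin's method: score candidate Semantic Web
# URIs for a query by term overlap with feedback pooled from
# hypertext searches on related queries.
click_feedback = {
    "eiffel tower": ["paris", "landmark", "tower", "france"],
    "eiffel tower height": ["tower", "metres", "height"],
}
candidates = {
    "http://dbpedia.org/resource/Eiffel_Tower": ["tower", "paris", "landmark"],
    "http://dbpedia.org/resource/Eiffel_(programming_language)": ["language"],
}

def rank(query: str) -> list[tuple[str, int]]:
    terms = Counter()
    for logged, feedback in click_feedback.items():
        if set(query.split()) & set(logged.split()):
            terms.update(feedback)  # pool feedback from related queries
    scored = [(uri, sum(terms[t] for t in ts)) for uri, ts in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

print(rank("eiffel tower"))  # the landmark URI outranks the language
```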

Questions:

  1. Choose a non-Web reference system.
  2. What is the nature of those references? (3-5 pages, with citations)
  3. Compare those references to URIs.
  4. How are those references and URIs the same/different? (3-5 pages, with citations)
  5. Evaluate Halpin’s use of Wittgenstein. (5-10 pages, with citations)

Edinburgh Research Archives – Informatics

Filed under: Computer Science — Patrick Durusau @ 9:48 am

Edinburgh Research Archives – Informatics.

Another research collection for searching/browsing.

You might want to ask yourself how access to such archives could be improved.

We are caught between the Scylla of searching on one hand and the Charybdis of browsing on the other.

Legal Informatics – Blog

Filed under: Legal Informatics — Patrick Durusau @ 9:07 am

Legal Informatics is the companion blog to Robert Richards’s LEGAL INFORMATION SYSTEMS & LEGAL INFORMATICS RESOURCES.

Make this blog a regular stop if you are interested in legal informatics.


Updated URL for Legal Information Systems & Legal Informatics Resources, September 2, 2011.

Cornell University Library: Technical Reports and Papers

Filed under: Computer Science — Patrick Durusau @ 9:01 am

Cornell University Library: Technical Reports and Papers is a collection that reaches back almost to the creation of the CS department at Cornell (founded in 1965; the collection starts in 1968).

Excellent source of historical and current CS research.

I found it while tracking down Gerald Salton’s work on indexing.

Lots of other goodies as well.

8 Keys to Findability

8 Keys to Findability mentions in closing:

The average number of search terms is about 1.7 words, which is not a lot when searching across millions of documents. Therefore, a conversation type of experience where users can get feedback from the results and refine their search makes for the most effective search results.

I have a different take on that factoid.

The average user needs only 1.7 words to identify a subject of interest to them.

Why the gap between 1.7 words and the number of words required for “effective search results?”

Why ask?

Returning millions of “hits” on 1.7 words is meaningless.

Returning the ten most relevant “hits” on 1.7 words is a G***** killer.

October 29, 2010

5. “Tolerated failure imprints learning better than success.”

Filed under: Authoring Topic Maps,Knowledge Management,Marketing,Topic Maps — Patrick Durusau @ 7:24 am

Knowledge Management Principle Five of Seven (Rendering Knowledge by David Snowden)

Tolerated failure imprints learning better than success. When my young son burnt his finger on a match he learnt more about the dangers of fire than any amount of parental instruction could provide. All human cultures have developed forms that allow stories of failure to spread without attribution of blame. Avoidance of failure has greater evolutionary advantage than imitation of success. It follows that attempting to impose best practice systems is flying in the face of over a hundred thousand years of evolution that says it is a bad thing.

Perhaps with fingers and matches, but I am not sure “failure imprints learning better than success” in knowledge management.

The perennial failure (as opposed to the perennial philosophy), the effort to create a “perfect” language, now using URIs, continues unabated.

The continuing failure to effectively share intelligence is another lesson slow in being learned.

Not that “best practices” would help in either case.

Should failure of “perfect” languages and sharing be principles of knowledge management?

LII/Legal Information Institute

Filed under: Law - Sources,Legal Informatics — Patrick Durusau @ 6:05 am

LII/Legal Information Institute is an effort to make US and world law freely accessible.

Managed subject identity will be valuable in navigating primary and secondary legal materials, binding those materials to case-specific content, and mining legal discovery materials.

For mapping primary legal materials, this is the place to start.

Questions:

  1. How would you manage subject identity to eliminate duplication in U.S. case research output?
  2. Is your test court- or jurisdiction-specific? Any thoughts on how to broaden it?
  3. Same questions but pick a non-U.S. jurisdiction.
  4. How would you manage subject identity with regard to statutory citations?
  5. Bonus Question: When a U.S. court says “in Brown” do they mean Brown vs. Bd. of Education or do they mean some other Brown case? Suggest some ways to identify the “Brown” in question.

Please also consider contributing either funds or expertise to support this project.

LEGAL INFORMATION SYSTEMS & LEGAL INFORMATICS RESOURCES

Filed under: Legal Informatics — Patrick Durusau @ 5:58 am

LEGAL INFORMATION SYSTEMS & LEGAL INFORMATICS RESOURCES is a one-stop bibliography for all things related to legal informatics.

Just casting about you will find a list of blogs on legal informatics, data sets for testing your topic map authoring/mining tools, knowledge representation (ontologies) for legal materials, user studies about actual information systems and more.

It would be easy to lose an afternoon just exploring the listings of resources here, to say nothing of following each one of those listings.

Questions:

  1. Remember Blair and Maron: lawyers thought they were retrieving 75% of the relevant documents, when the reality was 20%. What are the comparable results for legal materials?
  2. If you use one of the major legal research interfaces, Westlaw, Lexis/Nexis, what one topic map related feature would you add? Why?
  3. Is such a feature offered by any of the lesser legal research interfaces?

VoxPopuLII – Blog

Filed under: Cataloging,Classification,FRBR,Information Retrieval,Legal Informatics — Patrick Durusau @ 5:46 am

VoxPopuLII.

From the blog:

VoxPopuLII is a guest-blogging project sponsored by the Legal Information Institute at the Cornell Law School. It presents the insights of a very diverse group of people working on legal informatics issues and government information, all around the world. It emphasizes new voices and big ideas.

Not your average blog.

I first encountered: LexML Brazil Project

Questions (about LexML):

  1. What do you think about the strategy to deal with semantic diversity? Pluses? Minuses?
  2. The project says they are following: “Ranganathan’s ‘stratification planes’ classification system…” Your evaluation?
  3. Identify 3 instances of equivalents to the “stratification planes” classification system.
  4. How would you map those 3 instances to Ranganathan’s “stratification planes?”

Ordnance Survey Linked Data

Filed under: Authoring Topic Maps,Mapping,Merging,Topic Maps — Patrick Durusau @ 5:40 am

Ordnance Survey Linked Data.

Description:

Ordnance Survey is Great Britain’s national mapping agency, providing the most accurate and up-to-date geographic data, relied on by government, business and individuals. OS OpenData is the opening up of Ordnance Survey data as part of the drive to increase innovation and support the “Making Public Data Public” initiative. As part of this initiative Ordnance Survey has published a number of its products as Linked Data. Linked Data is a growing part of the Web where data is published on the Web and then linked to other published data in much the same way that web pages are interlinked using hypertext. The term Linked Data is used to describe a method of exposing, sharing, and connecting data via URIs on the Web….

Let’s use topic maps to connect subjects that don’t have URIs.
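
A toy sketch of what “connecting without URIs” means in practice: topics merge on shared names or identifiers, not on owning a URI. The data and the merge rule here are simplified far beyond ISO 13250:

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    names: set[str] = field(default_factory=set)
    identifiers: set[str] = field(default_factory=set)

    def should_merge(self, other: "Topic") -> bool:
        # Merge on any shared identifier or shared name; a topic with
        # no URI at all can still be merged on its names.
        return bool(self.identifiers & other.identifiers
                    or self.names & other.names)

    def merge(self, other: "Topic") -> None:
        self.names |= other.names
        self.identifiers |= other.identifiers

a = Topic(names={"Winchester"}, identifiers={"domesday:wincestre"})
b = Topic(names={"Wincestre", "Winchester"})  # no URI, merges on name
if a.should_merge(b):
    a.merge(b)
print(a)
```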

Subject mapping exercise:

  1. Connect 5 subjects from the Domesday Book
  2. Connect 5 subjects from either The Shakespeare Paper Trail: The Early Years and/or The Shakespeare Paper Trail: The Later Years
  3. Connect 5 subjects from WW2 People’s War (you could do occurrences but try for something more imaginative)
  4. Connect 5 subjects from some other period of English history.
  5. Suggest other linked data sources and sources of subjects for subject mapping (extra credit)

TMQL Notes from Leipzig

Filed under: Information Retrieval,TMQL,Topic Maps — Patrick Durusau @ 4:48 am

The TMQL language proposals – apart from the Path Language – have been posted to the SC 34 document repository for your review and comments!

Deeply appreciate Lars Marius Garshol leading the discussion.

Now is the time for your comments and suggestions.

Even better, trial implementations of present and requested features.

One of the best ways to argue for a feature is to show it in working code.

Or even better, when applied to show results not otherwise available.

Semantic Web Summit East – November 16-17, 2010 Boston

Filed under: Conferences,Semantic Web,Semantics — Patrick Durusau @ 4:19 am

Semantic Web Summit East – November 16-17, 2010.

The range of “semantic” for this conference is broader than “Semantic Web.” Check the presentations to see what I mean.

Useful for the business case about semantics, contacts and semantic success stories.

BTW, June 5-9 is the Semantic Web Summit West, San Francisco.

October 28, 2010

4. “Everything is fragmented.”

Filed under: Authoring Topic Maps,Knowledge Management,Marketing,Topic Maps — Patrick Durusau @ 6:12 am

Knowledge Management Principle Four of Seven (Rendering Knowledge by David Snowden)

Everything is fragmented. We evolved to handle unstructured fragmented fine granularity information objects, not highly structured documents. People will spend hours on the internet, or in casual conversation without any incentive or pressure. However creating and using structured documents requires considerably more effort and time. Our brains evolved to handle fragmented patterns not information.

I would rather say that complex structures exist just beyond the objects we handle in day to day conversation.

The structures are there, if and when we choose to look.

The problem Snowden has identified is that most systems can’t have structures “appear” when they “look” for them.

Either the objects fit into some structure or they don’t from the perspective of most systems.

Making explicit those structures that normally appear only when we look is the issue.

Explicit or not, none of our objects have meaning in isolation from those structures.

To make it interesting, we all bring slightly different underlying structures to those objects.

(Making assumed or transparent structures explicit is hard. Witness the experience of markup.)

19th ACM International Conference on Information and Knowledge Management

Filed under: Conferences,Information Retrieval,Knowledge Management,Software — Patrick Durusau @ 5:50 am

The front matter for the 19th ACM International Conference on Information and Knowledge Management is a great argument for ACM membership + Digital Library.

There are 126 papers, any one of which would make for a pleasant afternoon.

I will be mining these for those particularly relevant to topic maps but your suggestions would be appreciated.

  1. What conferences do you follow?
  2. What journals do you follow?
  3. What blogs/websites do you follow?

*****
Visit the ACM main site or its membership page, ACM Membership.

Biostar

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 5:31 am

Biostar bills itself as “Questions and answers on bioinformatics, computational genomics and systems biology.”

Still building up, but assuming the community gets behind the site, it should be a “go to” place for the areas it covers.

I mention it here so that:

  • topic mappers can recommend it
  • topic mappers can learn bioinformatics nomenclature

Gource

Filed under: Graphs,Software,Visualization — Patrick Durusau @ 5:25 am

Gource – Software Version Control Visualization.

Relevance to topic maps:

Consider the visualization of Blacklight, an open source library software project.

Imagine visualizing source code across an enterprise (are you listening MS/HP/Oracle/IBM?) so that code, coders, use of classes, can be compared.

Questions:

  1. Sign up with the computer lab to learn about version control.
  2. Use Gource to visualize an open source project’s versions.
  3. Subject Identity and Software (project)

LDSpider

Filed under: Linked Data,Search Engines,Searching,Semantic Web — Patrick Durusau @ 5:11 am

LDSpider.

From the website:

The LDSpider project aims to build a web crawling framework for the linked data web. Requirements and challenges for crawling the linked data web are different from regular web crawling, thus this project offers a web crawler adapted to traverse and harvest sources and instances from the linked data web. We offer a single jar which can be easily integrated into your own applications.

Features:

  • Content Handlers for different formats
  • Different crawling strategies
  • Crawling scope
  • Output formats

Content handlers, crawling strategies, crawling scope, output formats, all standard crawling features. Adapted to linked data formats but those formats should be accessible to any crawler.
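
For a sense of what any such crawler has to do (LDSpider itself is Java and far more capable), here is a toy fetch-and-follow loop with rdflib. The seed URI is illustrative, and a real crawler needs politeness, content negotiation, and proper error handling:

```python
from collections import deque
from rdflib import Graph, URIRef
from rdflib.namespace import RDFS

def crawl(seed: str, limit: int = 5) -> Graph:
    """Toy breadth-first linked data crawl: dereference a URI, merge
    its triples, follow rdfs:seeAlso links. Illustrative only."""
    merged, queue, seen = Graph(), deque([URIRef(seed)]), set()
    while queue and len(seen) < limit:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        try:
            merged += Graph().parse(str(uri))  # HTTP GET + RDF parse
        except Exception:
            continue  # skip sources that fail to fetch or parse
        queue.extend(o for o in merged.objects(uri, RDFS.seeAlso)
                     if isinstance(o, URIRef))
    return merged

g = crawl("http://dbpedia.org/resource/Berlin")
print(len(g), "triples harvested")
```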

A welcome addition, since we are all going to encounter linked data, but I am missing what is different here.

If you see it, please post a comment.

Questions:

  1. What semantic requirements should a web crawler have?
  2. How does this web crawler compare to your requirements?
  3. What one capacity would you add to this crawler?
  4. What other web crawlers should be used for comparison?

October 27, 2010

Semi-Supervised Graph Embedding Scheme with Active Learning (SSGEAL): Classifying High Dimensional Biomedical Data

Filed under: Bioinformatics,Biomedical,Dimension Reduction — Patrick Durusau @ 4:15 pm

Semi-Supervised Graph Embedding Scheme with Active Learning (SSGEAL): Classifying High Dimensional Biomedical Data

Authors: George Lee, Anant Madabhushi

Abstract:

In this paper, we present a new dimensionality reduction (DR) method (SSGEAL) which integrates Graph Embedding (GE) with semi-supervised and active learning to provide a low dimensional data representation that allows for better class separation. Unsupervised DR methods such as Principal Component Analysis and GE have previously been applied to the classification of high dimensional biomedical datasets (e.g. DNA microarrays and digitized histopathology) in the reduced dimensional space. However, these methods do not incorporate class label information, often leading to embeddings with significant overlap between the data classes. Semi-supervised dimensionality reduction (SSDR) methods have recently been proposed which utilize both labeled and unlabeled instances for learning the optimal low dimensional embedding. However, in several problems involving biomedical data, obtaining class labels may be difficult and/or expensive. SSGEAL utilizes labels from instances, identified as “hard to classify” by a support vector machine based active learning algorithm, to drive an updated SSDR scheme while reducing labeling cost. Real world biomedical data from 7 gene expression studies and 3900 digitized images of prostate cancer needle biopsies were used to show the superior performance of SSGEAL compared to both GE and SSAGE (a recently popular SSDR method) in terms of both the Silhouette Index (SI) (SI = 0.35 for GE, SI = 0.31 for SSAGE, and SI = 0.50 for SSGEAL) and the Area Under the Receiver Operating Characteristic Curve (AUC) for a Random Forest classifier (AUC = 0.85 for GE, AUC = 0.93 for SSAGE, AUC = 0.94 for SSGEAL).
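
SSGEAL itself is not reproduced here, but its active-learning ingredient — querying labels for the points an SVM finds “hard to classify” — can be sketched with scikit-learn. The data, pool sizes, and round counts below are all illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy active learning: repeatedly train an SVM on the labeled pool and
# "query" the unlabeled point nearest the decision boundary.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
labeled = np.zeros(len(X), dtype=bool)
for cls in (0, 1):                       # seed five labels per class
    labeled[np.where(y == cls)[0][:5]] = True

for _ in range(5):                       # five query rounds
    clf = SVC(kernel="linear").fit(X[labeled], y[labeled])
    margins = np.abs(clf.decision_function(X))
    margins[labeled] = np.inf            # never re-query labeled points
    hardest = int(np.argmin(margins))    # the "hard to classify" point
    labeled[hardest] = True              # oracle supplies its label

print(f"labeled {labeled.sum()} of {len(X)} points")
```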

Questions:

  1. Literature on information loss from dimension reduction?
  2. Active Learning assisting with topic maps authoring. Yes/no/maybe?
  3. Update the bibliography of one of the papers cited in this paper.

The UMLS Metathesaurus: representing different views of biomedical concepts

Filed under: Bioinformatics,Biomedical,TMDM,TMRM,Topic Maps,UMLS — Patrick Durusau @ 6:14 am

The UMLS Metathesaurus: representing different views of biomedical concepts

Abstract

The UMLS Metathesaurus is a compilation of names, relationships, and associated information from a variety of biomedical naming systems representing different views of biomedical practice or research. The Metathesaurus is organized by meaning, and the fundamental unit in the Metathesaurus is the concept. Differing names for a biomedical meaning are linked in a single Metathesaurus concept. Extensive additional information describing semantic characteristics, occurrence in machine-readable information sources, and how concepts co-occur in these sources is also provided, enabling a greater comprehension of the concept in its various contexts. The Metathesaurus is not a standardized vocabulary; it is a tool for maximizing the usefulness of existing vocabularies. It serves as a knowledge source for developers of biomedical information applications and as a powerful resource for biomedical information specialists.

Bull Med Libr Assoc. 1993 Apr;81(2):217-22.
Schuyler PL, Hole WT, Tuttle MS, Sherertz DD.
Medical Subject Headings Section, National Library of Medicine, Bethesda, MD 20894.

Questions:

  1. Did you notice the date on the citation?
  2. Map this article to the Topic Maps Data Model (3-5 pages, no citations)
  3. Where does the Topic Maps Data Model differ from this article? (3-5 pages, no citations)
  4. If concept = proxy, what concepts (subjects) don’t have proxies in the Metathesaurus?
  5. On what basis are “biomedical meanings” mapped to a single Metathesaurus “concept?” Describe in general but illustrate with at least five (5) examples.

3. “In the context of real need few people will withhold their knowledge.”

Filed under: Authoring Topic Maps,Knowledge Management,Marketing,Topic Maps — Patrick Durusau @ 5:58 am

Knowledge Management Principle Three of Seven (Rendering Knowledge by David Snowden)

In the context of real need few people will withhold their knowledge. A genuine request for help is not often refused unless there is literally no time or a previous history of distrust. On the other hand ask people to codify all that they know in advance of a contextual enquiry and it will be refused (in practice it’s impossible anyway). Linking and connecting people is more important than storing their artifacts.

I guess the US intelligence community has a “previous history of distrust” and that is why some 9 years after 9/11 effective intelligence sharing remains a fantasy.

People withhold their knowledge for all sorts of reasons: job security comes to mind, closely followed by self-importance, fear of revealing incompetence, general insecurity, and a host of others.

Technical issues did not create the need for semantic integration. Technical solutions will not, by themselves, result in semantic integration.

How does search behavior change as search becomes more difficult?

Filed under: Interface Research/Design,Search Interface,Searching — Patrick Durusau @ 4:38 am

How does search behavior change as search becomes more difficult?

Authors: Anne Aula, Rehan M. Khan, Zhiwei Guan

Keywords: behavioral signals, difficult search tasks, search engines, search strategies, web search

Abstract:

Search engines make it easy to check facts online, but finding some specific kinds of information sometimes proves to be difficult. We studied the behavioral signals that suggest that a user is having trouble in a search task. First, we ran a lab study with 23 users to gain a preliminary understanding on how users’ behavior changes when they struggle finding the information they’re looking for. The observations were then tested with 179 participants who all completed an average of 22.3 tasks from a pool of 100 tasks. The large-scale study provided quantitative support for our qualitative observations from the lab study. When having difficulty in finding information, users start to formulate more diverse queries, they use advanced operators more, and they spend a longer time on the search result page as compared to the successful tasks. The results complement the existing body of research focusing on successful search strategies.
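
A hedged sketch of how the paper's three signals might be computed from a session log; the field names and thresholds below are mine, not the authors':

```python
from dataclasses import dataclass

@dataclass
class Session:
    queries: list[str]     # queries issued during the task
    serp_seconds: float    # time spent on search result pages
    advanced_ops: int      # uses of operators (quotes, site:, etc.)

def looks_difficult(s: Session) -> bool:
    # The paper's signals: more diverse queries, more advanced
    # operators, longer dwell on the result page. Thresholds are
    # illustrative, not from the study.
    signals = [
        len(set(s.queries)) >= 4,
        s.advanced_ops >= 2,
        s.serp_seconds >= 60,
    ]
    return sum(signals) >= 2

print(looks_difficult(Session(
    queries=["jaguar", "jaguar cat", '"jaguar" habitat', "panthera onca"],
    serp_seconds=95.0,
    advanced_ops=1,
)))
```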

Seeking clues to trigger the offering of help/suggestions when users are having difficulty with a search.

For topic maps, a similar line of research could be on what properties trigger recognition of particular subjects for a given audience.

  1. How would you design research to test what properties trigger subject recognition?
  2. How would the results of such research impact your design of a topic map interface?
  3. Would you offer/hide information based on self-identification of users? Why/why not?

October 26, 2010

2. “We only know what we know when we need to know it.”

Filed under: Authoring Topic Maps,Knowledge Management,Marketing,Topic Maps — Patrick Durusau @ 7:29 am

Knowledge Management Principle Two of Seven (Rendering Knowledge by David Snowden)

We only know what we know when we need to know it. Human knowledge is deeply contextual and requires stimulus for recall. Unlike computers we do not have a list-all function. Small verbal or nonverbal clues can provide those ah-ha moments when a memory or series of memories are suddenly recalled, in context to enable us to act. When we sleep on things we are engaged in a complex organic form of knowledge recall and creation; in contrast a computer would need to be rebooted.

An important principle both for authoring and creating useful topic maps.

A topic map for repairing a jet engine could well begin by filming the repair multiple times from different angles.

Then have a mechanic describe the process they followed without reference to the video.

The differences are things that need to be explored and captured for the map.

Likewise, a topic map should not stick too closely to the “bare” facts it needs to record.

People using the map will need context in order to make the best use of its information.

What seems trivial or irrelevant, may be the clue that triggers an appropriate response. Test with users!

*****

PS: Don’t forget that the context in which a topic map is *used* is also part of its context.

The Neighborhood Auditing Tool – Update

Filed under: Bioinformatics,Biomedical,Interface Research/Design,Ontology,SNOMED,UMLS — Patrick Durusau @ 7:22 am

The Neighborhood Auditing Tool for the UMLS and its Source Terminologies is a presentation mentioned here several days ago.

If you missed it, go to: http://bioontology.org/neighborhood-audiiting-tool for the slides and WEBEX recording.

Pay close attention to:

The clear emphasis on getting user feedback during the design of the auditing interface.

The “neighborhood” concept he introduces has direct application to XML editing.

Find the “right” way to present parent/child/sibling controls to users and you would have a killer XML application.
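
A quick sketch of the “neighborhood” idea applied to XML (element names are invented, and ElementTree keeps no parent pointers, so the parent is tracked by hand):

```python
import xml.etree.ElementTree as ET

# The neighborhood of a focus element: its parent, children, and
# siblings -- the view an editing interface would present.
doc = ET.fromstring(
    "<chapter><title>One</title><section id='s1'/><section id='s2'/></chapter>"
)
focus = doc.find("section[@id='s1']")
parent = doc  # tracked by hand; ElementTree has no parent pointers

neighborhood = {
    "parent": parent.tag,
    "children": [child.tag for child in focus],
    "siblings": [s.tag for s in parent if s is not focus],
}
print(neighborhood)
```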

Questions:

  1. Slides 8 – 9. Other than saying this is an error (true enough), on what basis is that judgment made?
  2. Slides 18 – 20. Read the references (slide 20) on neighborhoods. Pick another domain, what aspects of neighborhoods are relevant? (3-5 pages, with citations)
  3. Slides 21 – 22. How do your neighborhood graphs compare to those here?
  4. Slides 23 – 46. Short summary of the features of NAT and no citation evaluation. Or, use NAT as basis for development of interface for another domain. (project)
  5. Slides 49 – 55. Visualizations for use and checking. Compare to current literature on visualization of vocabularies/ontologies. (project)
  6. Slides 56 – 58. Snomed browsing. Report on current status. (3-5 pages, citations)
  7. Slides 57 – 73. Work on neighborhoods and extents. To what extent is a “small intersection type” a sub-graph and research on sub-graphs applicable? Any number of issues and questions can be gleaned from this section. (project)

PSIs Going Viral?

Filed under: PSI,Subject Identifiers,Topic Map Software,Topic Maps — Patrick Durusau @ 6:51 am

Publishing subject identifiers with node makes me wonder: are PSIs (Published Subject Identifiers) about to go viral?

This server software, written in JavaScript, is an early release and needs features and bug fixes (feel free to contribute comments/fixes).

As it matures we could see a proliferation of PSI servers.
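
At bottom, a PSI server just serves a human-readable subject indicator at a stable URI. Purely as a sketch (the subjects, paths, and metadata below are invented; the node server above is the real thing):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Each path under /psi/ is a published subject identifier resolving
# to a human-readable subject indicator page. Data is invented.
SUBJECTS = {
    "/psi/puccini": "Giacomo Puccini, composer (1858-1924)",
    "/psi/tosca": "Tosca, opera by Giacomo Puccini (1900)",
}

class PSIHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = SUBJECTS.get(self.path)
        self.send_response(200 if body else 404)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(f"<h1>{body or 'No such subject'}</h1>".encode())

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), PSIHandler).serve_forever()
```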

Key to that is downloading, installing, breaking, complaining, returning to downloading. 😉

As Graham Moore says on TopicMapMail: “This is very cool.”
