It is also a good idea for a topic map authoring interface.
Say you get ten (10) “hits” back from a search. Add a “checkbox” to each “hit.” An unchecked “hit” is about the same subject as the other unchecked “hits.” A checked “hit” is about a different subject from the unchecked “hits.”
The “same subject” judgment becomes a collective one of all the users of the search interface. Different “hits” are going to be unchecked in any search return.
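To make the “collective judgment” idea concrete, here is a minimal sketch of how checkbox sessions could be tallied into pairwise same/different votes. The data shape (a list of hit ids plus the set the user checked) and the simple majority rule are my assumptions, not a worked-out design:

```python
from collections import Counter
from itertools import combinations


def tally_judgments(sessions):
    """Tally pairwise same/different votes from checkbox sessions.

    sessions: list of (hits, checked) pairs, where hits is the list of hit ids
    one user was shown and checked is the set of hit ids that user checked.
    Pair keys are sorted tuples of hit ids.
    """
    same = Counter()
    different = Counter()
    for hits, checked in sessions:
        unchecked = [h for h in hits if h not in checked]
        # Unchecked hits are judged to be about the same subject as each other.
        for a, b in combinations(sorted(unchecked), 2):
            same[(a, b)] += 1
        # A checked hit is judged different from every unchecked hit.
        for c in checked:
            for u in unchecked:
                different[tuple(sorted((c, u)))] += 1
    return same, different


def same_subject(pair, same, different):
    """Collective judgment (assumed rule): simple majority of votes on the pair."""
    return same[pair] > different[pair]


# Three users saw the same three hits; two of them checked "h3".
sessions = [(["h1", "h2", "h3"], {"h3"}),
            (["h1", "h2", "h3"], set()),
            (["h1", "h2", "h3"], {"h3"})]
same, different = tally_judgments(sessions)
```

No single user has to get it right; the votes accumulate across everyone who uses the search interface.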
A Treemap is a visualization of hierarchical data that uses nested rectangles to represent the nodes in a tree. The area of each rectangle is proportional to a value assigned to its node, based on some range of measurement. One drawback of this method is that complex or deep hierarchies are difficult to render for effective use.
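For readers who have not seen one built, the classic “slice-and-dice” layout is the simplest Treemap algorithm: alternate vertical and horizontal splits by depth, giving each child a share of its parent’s rectangle proportional to its value. This is a minimal sketch of that standard layout, not the Voronoi method discussed below; the `Node` class is a hypothetical helper:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; leaves carry a value, internal nodes sum their children."""
    name: str
    value: float = 0.0
    children: list[Node] = field(default_factory=list)

    def size(self) -> float:
        if not self.children:
            return self.value
        return sum(child.size() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Map each node name to a rectangle (x, y, w, h) with area proportional to its size.

    Children split the parent's rectangle into vertical strips at even depths and
    into horizontal strips at odd depths.
    """
    if out is None:
        out = {}
    out[node.name] = (x, y, w, h)
    total = node.size()
    offset = 0.0
    for child in node.children:
        frac = child.size() / total if total else 0.0
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, w * frac, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out


# Example: on a 10 x 10 canvas, "a" (6 of 10 units) gets 60% of the area.
root = Node("root", children=[Node("a", 6), Node("b", 2),
                              Node("c", children=[Node("c1", 1), Node("c2", 1)])])
layout = slice_and_dice(root, 0, 0, 10, 10)
```

The long, thin strips this produces at deeper levels are one reason deep hierarchies are hard to read as Treemaps.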
The authors provide an excellent introduction to Treemaps and the current state of their use, and they present a method that allows Treemap visualizations with arbitrary shapes.
Because they are computationally complex, Voronoi Treemaps may not be appropriate for real-time rendering of topic maps or of domains to be mapped.
The visualization of data domains as an aid to the creation of topic maps should include Voronoi Treemaps as part of its research agenda.
Is search passé? is an intriguing question asked in the Montague Institute Review for August 2010. Unfortunately, not being a member, I can’t summarize their answer for you.
It really isn’t that hard to guess at some of the answers. I have blogged about what Blair and Maron said twenty-five years ago:
Stated succinctly, it is impossibly difficult for users to predict the exact words, word combinations, and phrases that are used by all (or most) relevant documents and only (or primarily) by those documents, as can be seen in the following examples.
Documents and texts haven’t changed in the last twenty-five years. If anything, the problem has gotten worse due to the volume and variety of material that is now available for searching.
This is a semantic and therefore human judgment problem. Algorithms and “clever” data structures can assist human users in making those judgments, but they can’t replace the humans in the loop.
Imagine a search engine that seeks the assistance of users on semantic issues, as opposed to the skulking around of current search engines and sites. Why not just ask? Politely.
A user-fed search engine with a topic map backend. That could be very interesting.
The article is not yet available on my university server, but I will keep watch for it and report back when I have more details. The author links are to their DBLP records.
Try the following searches on “merging operators” in DBLP and CiteSeerX:
Possibilistic logic provides a good framework for dealing with merging problems when information is pervaded with uncertainty and inconsistency. Many merging operators in possibilistic logic have been proposed. However, there are still some important problems left unsolved.
Makes me curious about the “Many merging operators….” No promises as to when, but it would be interesting to start a list of those, both within and without possibilistic logic.
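As a first entry for that list, two of the simplest operators from the possibilistic literature combine possibility distributions pointwise: minimum (conjunctive, for sources assumed reliable) and maximum (disjunctive, for when only one source may be right). This is a minimal sketch of those two textbook operators, not of the paper’s proposal; the dict-of-interpretations representation and the example degrees are made up:

```python
def merge_min(*dists):
    """Conjunctive merge: pointwise minimum of possibility distributions."""
    worlds = dists[0].keys()
    return {w: min(d[w] for d in dists) for w in worlds}


def merge_max(*dists):
    """Disjunctive merge: pointwise maximum of possibility distributions."""
    worlds = dists[0].keys()
    return {w: max(d[w] for d in dists) for w in worlds}


def inconsistency(pi):
    """1 minus the highest possibility degree; 0 means the result is consistent."""
    return 1.0 - max(pi.values())


# Two sources ranking the four interpretations of atoms p and q (hypothetical degrees).
source1 = {"p q": 1.0, "p ~q": 0.4, "~p q": 0.2, "~p ~q": 0.0}
source2 = {"p q": 0.3, "p ~q": 1.0, "~p q": 0.6, "~p ~q": 0.1}

conj = merge_min(source1, source2)
disj = merge_max(source1, source2)
```

The conjunctive result is partially inconsistent (no interpretation is fully possible), which is exactly the kind of situation where the choice of merging operator starts to matter.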
KNIME (Konstanz Information Miner) is a user-friendly and comprehensive Open-Source data integration, processing, analysis, and exploration platform. From day one, KNIME has been developed using rigorous software engineering practices and is currently being used actively by over 6.000 professionals all over the world, both in industry and academia.
Read the KNIME features page for a very long list of potentially useful subject identity tests.
There is a place for string matching IRIs, but there is a world of subject identity beyond that as well.
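To illustrate the point, here is a minimal sketch of a subject identity test that starts with shared IRIs but does not stop there. The second rule (same type plus same normalized name) is a hypothetical example of the kind of test a tool like KNIME could supply, not a recommendation:

```python
import unicodedata


def normalize(s):
    """Case-fold, strip accents, and collapse whitespace so "Café" matches " cafe "."""
    decomposed = unicodedata.normalize("NFKD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def is_same_subject(a, b):
    """Two topics (dicts with "iris", "type", "name") represent the same subject if..."""
    # Test 1: the familiar one, any shared subject identifier IRI.
    if a["iris"] & b["iris"]:
        return True
    # Test 2 (hypothetical rule): same type and same normalized name.
    return a["type"] == b["type"] and normalize(a["name"]) == normalize(b["name"])


puccini = {"iris": {"http://example.org/puccini"}, "type": "composer", "name": "Giacomo Puccini"}
no_iri = {"iris": set(), "type": "composer", "name": "  giacomo   PUCCINI "}
opera = {"iris": set(), "type": "opera", "name": "Giacomo Puccini"}
```

Each rule is just another identity test; a real map would likely need several, chosen per subject type.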
This paper proposes a new approach to support creativity through assisting the discovery of unexpected associations across different domains. This is achieved by integrating information from heterogeneous domains into a single network, enabling the interactive discovery of links across the corresponding information resources. We discuss three different pattern of domain crossing associations in this context.
Does that sound familiar to anyone?
Part of the continuing irony that semantic integration research suffers from a lack of semantic integration.
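For a feel of the mechanics, here is a minimal sketch of the “single network” idea: union two domain networks and search for a path that starts in one domain and ends in another, passing through a shared bridging term. Breadth-first search is my stand-in technique here, not the paper’s method, and the domains and terms are invented:

```python
from collections import deque


def integrate(*networks):
    """Union several domain networks (adjacency dicts) into one undirected graph."""
    graph = {}
    for net in networks:
        for node, neighbours in net.items():
            for n in neighbours:
                graph.setdefault(node, set()).add(n)
                graph.setdefault(n, set()).add(node)
    return graph


def cross_domain_path(graph, domain_of, start, target_domain):
    """Shortest path (by BFS) from start to any node whose domain is target_domain."""
    seen = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if domain_of.get(node) == target_domain:
            return path
        for n in sorted(graph.get(node, ())):
            if n not in seen:
                seen.add(n)
                queue.append(path + [n])
    return None


music = {"rhythm": ["tempo"]}
medicine = {"heartbeat": ["tempo", "arrhythmia"]}
domain_of = {"rhythm": "music", "heartbeat": "medicine", "arrhythmia": "medicine"}
graph = integrate(music, medicine)
```

Here “tempo” belongs to neither domain, which is what lets it act as the bridge between them.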
I am just at the tip of this particular iceberg of research so please chime in with pointers to conferences, proceedings, articles, books, etc.
Study of several episodes of mass collaboration in China by users that involved collecting and sharing information.
Has to make you wonder about using human communities (as opposed to “experts”) to identify subjects and build topic maps, doesn’t it?
To make community maps explicit is probably the more accurate turn of phrase. Communities already identify subjects of interest to them, just not in the same language as an “expert.”
Identifying subjects in human community languages (as opposed to “expert” languages) won’t enable software agents to “reason” about the temperature of drinks in a soft drink machine.
But I know where to hire some bright engineers if I need that sort of information over a web interface.
The main idea is to have an object representation of a topic map in any programming language that supports JSON without writing or generating mapping code and still being able to access the information with little to no knowledge of Topic Maps. TM/JSON first draft
Code written with no understanding of the inputs seems problematic to me. (The mother’s programming job in Snow Crash?)
TM/JSON does not appear to require ignorance of topic maps so perhaps programmers knowledgeable about topic maps will find it useful as well.
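The appeal is easy to see in a few lines: any language with a JSON parser gets plain objects it can navigate. This is a minimal sketch assuming a hypothetical JSON shape for topics; it does not reproduce the structure defined in the TM/JSON draft:

```python
import json

# Hypothetical JSON shape for a topic map; the TM/JSON draft defines its own
# structure, which this sketch does not attempt to reproduce.
doc = """
{
  "topics": [
    {"id": "puccini", "names": ["Giacomo Puccini"], "types": ["composer"]},
    {"id": "tosca", "names": ["Tosca"], "types": ["opera"]}
  ]
}
"""

tm = json.loads(doc)


def names_of_type(tm, type_id):
    """Return the first name of every topic that has the given type id."""
    return [t["names"][0] for t in tm["topics"] if type_id in t.get("types", [])]
```

No mapping code, true, but the programmer still has to know that "types" means topic types, which is my point about understanding the inputs.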
We all need to give it a close read and give Robert the benefit of some feedback.
Short Title
section 1. This Act may be cited as the “______Act of____”.
Ask yourself: How would topic maps lead to a different result? (OK, that probably wasn’t your first thought; work with me here.)
If bills were treated as subjects represented by topics, we could use TMCL to specify that every topic of type “House Bill” has one and only one name:
houseBill isa tmcl:topic-type;
  has-name(tmdm:topic-name, 1, 1).
That says every topic of type House Bill has one and only one name, and we should get an error if the name is missing.
If that seems like a lot of trouble to fix a workflow proofing glitch, consider this:
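What a TMCL validator does with that constraint can be sketched in a few lines of plain Python. This is an illustration only; the topic dictionaries and the function name are hypothetical, and a real validator would work against the topic map itself:

```python
def check_name_cardinality(topics, topic_type="houseBill", min_names=1, max_names=1):
    """Report topics of topic_type whose number of names is outside [min_names, max_names].

    Mirrors, in plain Python, the has-name(tmdm:topic-name, 1, 1) constraint.
    """
    errors = []
    for t in topics:
        if topic_type not in t["types"]:
            continue
        n = len(t["names"])
        if not min_names <= n <= max_names:
            errors.append(f"{t['id']}: expected {min_names}..{max_names} names, found {n}")
    return errors


bills = [
    {"id": "hr1", "types": ["houseBill"], "names": ["Example Act of 2010"]},
    {"id": "hr2", "types": ["houseBill"], "names": []},  # the blank "______Act of____"
]
```

The missing title would be flagged before the bill ever left the drafting office.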
U.S. legislation typically runs hundreds, even thousands of pages with provisions that are relevant to particular constituencies. What if all those provisions and their constituencies were treated as subjects, represented by topics?
Everyone could read the provisions of interest to them, or the ones they were interested in opposing (possibly the more popular of the two). Instead of 2,000 pages, you might need to read only 3 to 5 pages.
Reading maybe 3 to 5 pages sounds more like transparency to me than dumping 2,000+ pages on my desk and calling it “transparency.”
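Once provisions and constituencies are subjects, “what do I need to read?” becomes a simple query. A minimal sketch, with made-up sections, page counts, and constituencies standing in for a real bill:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provision:
    section: str
    pages: int
    constituencies: frozenset  # constituencies this provision is associated with


def provisions_for(provisions, constituency):
    """Return the provisions associated with a constituency and their total page count."""
    hits = [p for p in provisions if constituency in p.constituencies]
    return hits, sum(p.pages for p in hits)


# Hypothetical bill: three provisions out of a much longer text.
bill = [
    Provision("Sec. 101", 2, frozenset({"farmers"})),
    Provision("Sec. 102", 1, frozenset({"farmers", "truckers"})),
    Provision("Sec. 201", 40, frozenset({"banks"})),
]
hits, pages = provisions_for(bill, "farmers")
```

The hard part, of course, is the associations, not the query.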
******
PS: My suggestion to fix the bill title: “Last Opaque Act of 2010.” Whether or not lobbyists, elected officials and agencies can hear it, transparency is coming to the USA.
I must admit to some disappointment when I found it was collecting index columns and placing them together in a single table. I am sure that technique is quite valuable for data warehouses, but it isn’t what I think of when I use the phrase “merging indexes.”
The article is well written and was worth reading. As I started to put it to one side, it occurred to me that perhaps I was too hasty in deciding it wasn’t relevant to topic maps.
What if I had a data warehouse with a “merged” index where collectively the columns supported queries based on subject identity? Or if I wanted to query against a set of indexes from other applications (Lucene, for example) for similar purposes?
Whether you are into .NET or not, you should add this one to your reading list.
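Here is a minimal sketch of that second question: several term-to-document indexes, from whatever applications, merged into a single subject-to-document index. The `subject_of` mapping from (index, term) to a subject identifier is hypothetical; building it is where the subject identity work actually happens:

```python
from collections import defaultdict


def merge_by_subject(indexes, subject_of):
    """Merge term -> document-id indexes into one subject -> document-id index.

    indexes: dict of index name -> {term: set of document ids}.
    subject_of: dict mapping (index name, term) to a subject identifier.
    Terms with no subject mapping are skipped.
    """
    merged = defaultdict(set)
    for name, index in indexes.items():
        for term, docs in index.items():
            subject = subject_of.get((name, term))
            if subject is not None:
                merged[subject] |= docs
    return merged


indexes = {
    "lucene": {"car": {"d1", "d2"}},
    "warehouse": {"automobile": {"r7"}, "truck": {"r9"}},
}
subject_of = {("lucene", "car"): "subj:car", ("warehouse", "automobile"): "subj:car"}
merged = merge_by_subject(indexes, subject_of)
```

A query for the subject returns hits from both sources, whatever each one calls it.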
When you are considering whether a map is a territory, consider the ways in which maps are treated like territories.
Maps are defended like territories. Suggest to one upper ontology that it should consider being more like another upper ontology if you want to see that in real life.
Maps are seen as destinations/territories. Witness the “convert to the latest ….. data model” efforts. A data model is nothing but a map. Advocates of a data map/model will not rest until all data bows to their map/model. (Rest easy, it never happens.)
Maps are seen as destinations/territories (2). The constructs of a map can be seen as subjects in their own right (in addition to its contents). Those subjects are implicitly recognized in conversion. (Topic maps enable those subjects to be made explicit.)
Claim made by particular destinations/territories: the existence of different destinations/territories impedes interchange and communication, and creates unnecessary expense.
What other characteristics would you ascribe to territories that can also be said about maps?
(Your claim doesn’t need universal acclamation, but you should have a good argument for it.)
*****
I corrected “covert” to “convert,” based on a comment from Kirk. Although, from what I read in the papers, “covert” might have been accurate as well!
Hayakawa’s dictum “…the map is NOT the territory it stands for.” (Language in Thought and Action, 1949) raises the question: what is the nature of the map’s “NOT” being the territory it stands for?
That question has many aspects but the one for today is that the map “…stands for…” the territory. That implies that it points to or in some way represents the territory.
If a map points to a territory, can there be more than one map of the same territory?
Think of at least 2 examples of where there are different maps of physical territories.
How do those maps point to the territory in question?
How would you point to the maps in your example from another map?
Would that make the maps in your example into territories?
If not, why not?
Gary W. Strong and M. Carl Drott contend in A Thesaurus for End-User Indexing and Retrieval, Information Processing & Management, Vol. 22, No. 6, pp. 487-492, 1986, that:
A low-cost, practical information retrieval system, if it were to be designed, would require a thesaurus, but one in which end-users would be able to browse research topics by means of an organization that is concept-based rather than term-based as is the typical thesaurus.
…. (while elsewhere)
It is our hypothesis that, when the thesaurus can be envisioned by users as a simple, yet meaningful, organization of concepts, the entire information system is much more likely to be useable in an efficient manner by novice users. (emphasis added)
It puzzles me that experts are building a system of concepts for novices to use. Do you suspect experts have different views of the domains in question than novices? And approach their search for information with different assumptions?
Any concept system designed by an expert is a prescriptive information retrieval system. It represents their view of the domain and not that of a novice. Or rather it represents how the expert thinks a novice should navigate the field.
While the expert’s view may be useful for some purposes, such as socializing a novice into a particular view of the domain, it may be more useful for novices to use a novice’s view of the domain. To build that we would need to turn to novices in a domain. Perhaps through the use of adaptive information retrieval, IR that adapts to its user, rather than the other way around.
Adaptive information retrieval systems: I like that. Systems that grow to be more like their users and less like their builders with every use.
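One long-standing, concrete example of IR adapting to its user is Rocchio relevance feedback: the user marks results as relevant or not, and the query moves toward the former and away from the latter. I am swapping in Rocchio as an illustration, not claiming it is what Strong and Drott proposed. A minimal sketch over sparse term-weight dicts, using commonly cited default weights:

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Updated query q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant).

    Vectors are dicts of term -> weight; negative weights are clipped (dropped).
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms |= set(doc)

    def centroid_weight(docs, term):
        # Average weight of a term across a list of documents (0 if the list is empty).
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    updated = {}
    for term in terms:
        w = (alpha * query.get(term, 0.0)
             + beta * centroid_weight(relevant, term)
             - gamma * centroid_weight(nonrelevant, term))
        if w > 0:
            updated[term] = w
    return updated


# A novice searching "jaguar" who wanted the animal, not the car.
new_query = rocchio({"jaguar": 1.0},
                    relevant=[{"jaguar": 1.0, "cat": 1.0}],
                    nonrelevant=[{"jaguar": 1.0, "car": 1.0}])
```

The query now carries “cat,” a term the novice never typed but plainly meant.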
Oh, I guess I had better say what that means. Or, better yet, let that silver-tongued devil Lars Heuer say it for me:
It is a suite of tests for Topic Maps implementations, based around the various Topic Maps syntaxes. The intention is to help developers of Topic Maps implementations verify that their implementations are actually correct according to the specifications.
Each test consists of (at least) one input file with a corresponding CXTM file. If a Topic Maps implementation works correctly, it has to generate the same canonical output as specified by the reference CXTM file.
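In practice, a test runner for such a suite can be very small, because canonical output means a byte-for-byte comparison. A minimal sketch, assuming a hypothetical `to_cxtm` callable supplied by the implementation under test and assuming baseline files are named after their inputs with `.cxtm` appended (check the actual suite for its layout):

```python
from pathlib import Path


def run_cxtm_suite(in_dir, baseline_dir, to_cxtm):
    """Compare an implementation's canonical output against reference CXTM files.

    to_cxtm: hypothetical callable (input Path -> bytes) provided by the
    Topic Maps implementation under test. Returns a list of (file, reason) failures.
    """
    failures = []
    for src in sorted(Path(in_dir).iterdir()):
        if not src.is_file():
            continue
        # Assumed naming convention: in/foo.xtm -> baseline/foo.xtm.cxtm
        reference = Path(baseline_dir) / (src.name + ".cxtm")
        if not reference.exists():
            failures.append((src.name, "missing baseline"))
            continue
        try:
            actual = to_cxtm(src)
        except Exception as exc:  # a crash in the implementation counts as a failure
            failures.append((src.name, f"error: {exc}"))
            continue
        # CXTM is canonical, so correct output must match the reference exactly.
        if actual != reference.read_bytes():
            failures.append((src.name, "output differs"))
    return failures
```

All the real difficulty lives in the implementation’s serializer; the runner only has to compare bytes.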
I added a web service to Mappify which translates different Topic Maps syntaxes to XTM 2.1, CTM 1.0 and JTM 1.0 (reasons for this limitation can be found in [1]).
As a community, and not to pick on Lars, we need to find better titles for our papers, posts, etc. Take this post, for example. Why not “Crossdressing Topic Maps: A Web Service”? I think that would get a lot more hits than its present title.
The basic idea is that an organization should have one uniform way to talk about its non-transactional entities. In topic map land we would say subjects.
OK, but here comes the payoff question: How does the organization deal with heterogeneous data from others?
Ah, yes, well, hmmm, …..that wasn’t part of our MDM contract.
You can be an island of pure data (a ghetto?) in a heterogeneous world (MDM), or you can play well with others (topic maps). Which do you think offers more commercial advantage?
The JTC 1/SC 34 WG 3 (OK, Topic Maps) working group will be meeting for the two days before TMRA starts in Leipzig, Germany! That is 27-28 September 2010. (Location details forthcoming.)
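“Playing well with others” need not mean forcing their data into your format. A minimal sketch of the alternative: keep partner records as they are, maintain an explicit mapping from partner identifiers to your own subjects, and route anything unmapped to a human. The record shapes and the `key_map` are hypothetical:

```python
def map_partner_records(master, partner_records, key_map):
    """Attach partner records to internal master-data subjects without converting them.

    master: dict of internal id -> internal record.
    partner_records: list of dicts, each with a partner-specific "pid".
    key_map: hypothetical dict of partner pid -> internal id.
    Returns (merged view, records with no known mapping).
    """
    merged = {k: {"master": v, "partner": []} for k, v in master.items()}
    unmapped = []
    for rec in partner_records:
        internal = key_map.get(rec["pid"])
        if internal in merged:
            merged[internal]["partner"].append(rec)
        else:
            unmapped.append(rec)  # left for a human identity judgment
    return merged, unmapped


master = {"C1": {"name": "Acme Corp"}}
partner = [{"pid": "ACME-001", "name": "ACME Corporation"},
           {"pid": "ZZ-9", "name": "Unknown Supplier"}]
merged, unmapped = map_partner_records(master, partner, {"ACME-001": "C1"})
```

The mapping is itself data you can share, which is the part MDM-style uniformity tends to leave out.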
The Topic Maps Data Model (TMDM) first appeared in the SC 34 document registry on 11 August 2001. (That’s SC 34 N0242 for ISO insiders.)
What better way to celebrate its “birthday” than a two-day series of presentations, two hours per day, on what we have learned in the past ten years and where we would like to go?
I am proposing teleconferences on the 11th and 12th of August, 2011, say from 10 AM UTC/GMT (12 PM Norway, 7 PM Japan, 6 AM Eastern US) until 12 PM UTC/GMT.
The general format would be 20-minute presentations with 10 minutes of Q/A. That accommodates a maximum of 4 presentations each day.
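For anyone checking the arithmetic or their local start times, a quick sketch of the slots. It assumes fixed August 2011 offsets (CEST = UTC+2, JST = UTC+9, EDT = UTC-4) rather than a time zone database:

```python
from datetime import datetime, timedelta, timezone

# Assumed August 2011 UTC offsets (summer time where applicable).
ZONES = {
    "Norway (CEST)": timezone(timedelta(hours=2)),
    "Japan (JST)": timezone(timedelta(hours=9)),
    "Eastern US (EDT)": timezone(timedelta(hours=-4)),
}


def slots(start, end, talk=timedelta(minutes=20), qa=timedelta(minutes=10)):
    """Start times of back-to-back talk + Q/A slots that fit within [start, end]."""
    out = []
    t = start
    while t + talk + qa <= end:
        out.append(t)
        t += talk + qa
    return out


day1 = datetime(2011, 8, 11, 10, 0, tzinfo=timezone.utc)
sessions = slots(day1, day1 + timedelta(hours=2))
for s in sessions:
    local = ", ".join(f"{name} {s.astimezone(tz):%H:%M}" for name, tz in ZONES.items())
    print(f"{s:%H:%M} UTC -> {local}")
```

Two hours divided into 30-minute slots gives the four presentations per day, starting at 10:00, 10:30, 11:00 and 11:30 UTC.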
Comments/suggestions? Volunteers to make presentations?
Our starting premise: Users want to say things of interest to them, as simply as possible, for them.
Note the focus on users. Not on description logic. Not on formal ontologies. Not on reasoning, artificial or otherwise. Not even on complex mappings between identifications. But on users.
All of those other things are worthwhile enterprises, some of them anyway, which you can pursue at your own leisure.
The question is how to empower users to say things about what interests them. And, if possible, how to do so without rewriting the WWW to deal with 303 clouds, etc.?
Our answer to those questions: PGS – Pretty Good Semantics. It asks very little of users yet can annotate any identifier on the WWW to say whatever a user likes.
It uses existing HTML techniques and works with existing web servers and search engines.