Archive for July, 2010

Interface Suggestion for Topic Maps

Saturday, July 31st, 2010

New York Times stories have a feature that could be an interesting presentation option for topic maps.

I can highlight arbitrary text and a question icon appears above it. Select the icon and more information appears about the highlighted text.

I like it because:

  • it offers help, when I ask for it
  • it doesn’t require inline markup

Library use case: I am reading a journal article online and when I select the title of a reference, I should be able to see a link to that article and any related resources. (As opposed to being 3, maybe 4 screens away from where I can see the cited article and then having to navigate back to the original article.)

As a user I won’t be exposed to topic/association/occurrence or proxies and legends. But if the point is to deliver the latest information across semantic barriers, that’s ok. Right?

******
PS: The New York Times has had this feature for some time. I hadn’t seen it because I usually read the New York Times on the weekends and in hard copy.

The Syntactic Web: Syntax and Semantics on the Web

Friday, July 30th, 2010

The Syntactic Web: Syntax and Semantics on the Web by Jonathan Robie describes the use of XQuery to query both RDF and topic maps.

I ran across it while I was getting a Markup Language journal set ready for auction at Balisage.

Given Jonathan’s depth of experience with query languages, it is something to help decide what the community wants from TMQL.

Neo4j 1.1 Released!

Friday, July 30th, 2010

Neo4j 1.1 has arrived!

From Peter Neubauer’s blog entry:

The Neo4j graph database release 1.1 has just arrived, so here’s some information on the new things that have been included. The main points are the additions of monitoring support, an event framework and a new traversal framework to the kernel. Then two useful components have been added to the default distribution (called “Apoc”): graph algorithms and online backup.

Peter’s post has pointers to other Neo4j resources.

TMRA 2010 Registration Is Open!

Friday, July 30th, 2010

TMRA (Topic Maps Research and Applications) 2010 registration is open!

Early bird registration until August 10, 2010! (save 30 EUR!)

As a reviewer I can’t name names but the agenda is strong and practical.

Watch this space for an ISO meeting announcement in connection with TMRA!

******
If you need more reasons to attend, the Central Station has great smoked chicken and other tasty delights!

Topic Maps Data Model (TMDM) in a nutshell

Thursday, July 29th, 2010

Topic Maps Data Model (TMDM) in a nutshell by Marcel Hoyer is a handy graphic representation of all the relationships in the TMDM.

Complete TMQL Tutorial!

Thursday, July 29th, 2010

Complete TMQL Tutorial!

A second plug for the TMQL tutorials from the Topic Maps Lab

Work through the tutorials and discuss what you like/don’t like on one of the topic map mailing lists!

Get in shape for the TMQL discussions at TMRA!

If you don’t speak up, others will have no opinions but their own about what the topic map community wants.

The more opinions we have, the richer the result for the community.

******
PS: Please send feedback to Sven Krosse and favorable feedback to his director, Dr. Lutz Maicher.

😉

Django and Neo4j – Domain modeling that kicks ass – Post

Wednesday, July 28th, 2010

Django and Neo4j – Domain modeling that kicks ass.

Derek Stainer covers some licensing and performance numbers for Neo4j before turning it over to a presentation by Tobias Ivarsson.

High marks as a great introduction to Neo4j!

Don’t Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources (1997)

Wednesday, July 28th, 2010

Don’t Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources (1997) by Mary Tork Roth isn’t the latest word on wrappers but is well written. (longer version, A Wrapper Architecture for Legacy Data Sources (1997) )

The wrapper idea is a good one, although Roth uses it in the context of a unified schema, which is then queried. With a topic map, you could query on the basis of any of the underlying schemas and get the data from all the underlying data sources.

That result is possible because a topic map has one representative for a subject and can have any number of sources for information about that single subject.
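A minimal sketch of that idea, assuming two hypothetical sources that keep their own field names and a hand-made mapping that says which field names identify the same subject. The source names, field names and values are invented for illustration, and this is not Roth’s wrapper architecture:

```python
# Two hypothetical legacy sources, each keeping its own schema (field names).
LIBRARY = [{"isbn": "X1", "title": "Moby-Dick"}]
VENDOR = [{"product_code": "X1", "price": 12.5}]
SOURCES = [LIBRARY, VENDOR]

# Assumed mapping: field names in the same group identify the same subject.
SAME_FIELD = [{"isbn", "product_code"}]


def equivalents(field_name):
    """Return every field name known to mean the same thing as field_name."""
    for group in SAME_FIELD:
        if field_name in group:
            return group
    return {field_name}


def query(field_name, value, sources):
    """Query with any familiar field name; collect matches from all sources."""
    names = equivalents(field_name)
    hits = []
    for source in sources:
        for record in source:
            if any(name in record and record[name] == value for name in names):
                hits.append(record)
    return hits


by_isbn = query("isbn", "X1", SOURCES)
by_code = query("product_code", "X1", SOURCES)
```

Either query returns the records from both sources, so a user can keep searching with the schema they already know.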

I haven’t done a user survey but suspect most users would prefer to search for/access data using familiar schemas rather than new “unified” schemas.

Nexxor offers fixed-price Topic Maps startup package – Post

Wednesday, July 28th, 2010

Topic Map Snippets posted Nexxor offers fixed-price Topic Maps startup package. Startup packages and specials sound like good signs for topic maps!

Federation and Business Intelligence Applications

Tuesday, July 27th, 2010

Federated Stream Processing Support for Real-Time Business Intelligence Applications by Irina Botan, Younggoo Cho, Roozbeh Derakhshan, Nihal Dindar, Laura Haas, Kihong Kim, and Nesime Tatbul, argues realtime BI has two critical requirements:

  1. reducing latency
  2. providing rich contextual data that is directly actionable

Topic maps enable you to (reliably) endow subjects in streams with rich contextual data that is directly actionable — across streams. (Not to mention that it will remain re-usable when your current IT department turns over.)

Selling topic maps means casting them in terms of fixing issues of interest to customers.

I think this is another opportunity that awaits some clever topic map company.

From Moby-Dick To Mashups: Thinking About Bibliographic Networks

Monday, July 26th, 2010

From Moby-Dick To Mashups: Thinking About Bibliographic Networks was reported by The FRBR Blog with the following summary:

Summary: Traditional and contemporary attempts to identify and describe simple and complex bibliographic resources have overlooked useful and powerful possibilities, due to the insufficient modeling of “bibliographic things of interest.” The presentation will introduce a resource description approach that remodels and strengthens FRBR by borrowing key concepts from Information Science and the History of Science. The presentation will reveal portions of a network of bibliographic (and other useful) relationships between printings of Melville’s novel dating from 1851-1975 into the present. In addition, structural similarities between the print publication network and the multimedia “mash-ups” seen on YouTube and other websites will be demonstrated and discussed.

Anyone creating a topic map for library resources needs to review these slides.

When Federated Search Bites (Jeff Jonas)

Monday, July 26th, 2010

When Federated Search Bites by Jeff Jonas is a bit of a rant but makes a number of telling points.

I think topic maps qualify as federated fetch to use Jonas’ terminology.

Not surprising since I think of topic maps as navigational overlays (where navigation includes subject sameness) and not as a data storage format.

But there is a lot of interest in topic map software that stores data locally.

Both approaches work and have different advantages. Has anyone outlined how you would choose between those two approaches?

OneSource

Monday, July 26th, 2010

OneSource describes itself as:

OneSource is an evolving data analysis and exploration tool used internally by the USAF Air Force Command and Control Integration Center (AFC2IC) Vocabulary Services Team, and provided at no additional cost to the greater Department of Defense (DoD) community. It empowers its users with a consistent view of syntactical, lexical, and semantic data vocabularies through a community-driven web environment, directly supporting the DoD Net-Centric Data Strategy of visible, understandable, and accessible data assets.

Video guides to the site:

OneSource includes 158 vocabularies of interest to the greater U.S. Department of Defense (DoD) community. (My first post to answer Lars Heuer’s question “…where is the money?”)

Following posts will explore OneSource and what we can learn from each other.

Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Recommendation

Monday, July 26th, 2010

Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Recommendation is a must see!

Set Theory Symbols, if you need help with the symbols.

Looking forward to seeing topic map operations illustrated on a graph with the Gremlin character. (see the slides)

By Marko A. Rodriguez.


Edited to add author’s name. Post did not appear in a search as expected.

Lost In Translation – Article

Sunday, July 25th, 2010

Lost In Translation is a summary of recent research on language and its impact on our thinking by Lera Boroditsky (Professor of psychology at Stanford University and editor in chief of Frontiers in Cultural Psychology).

Read the article for the details but concepts such as causality, space and others aren’t as fixed as you may have thought.

Another teaser:

It turns out that if you change how people talk, that changes how they think. If people learn another language, they inadvertently also learn a new way of looking at the world. When bilingual people switch from one language to another, they start thinking differently, too.

Topic maps show different ways to identify the same subject. Put enough alternative identifications together and you will learn to think in another language.

Question: Should topic maps come with the following warning?

Caution: Topic Map – You May Start Thinking Differently

Dependency and Subject Identity

Sunday, July 25th, 2010

Dependence language model for information retrieval by Jianfeng Gao, Jian-yun Nie, Guangyuan Wu, and Guihong Cao, is a good introduction to dependency analysis in information retrieval.

The theory is that terms (words) in a document depend upon other words and that those dependencies can be used to improve the results of information retrieval efforts.

Beyond its own merits, I find the analogy of dependency analysis to subject identification interesting. That any subject identification depends upon other subjects being identified, whether those identifications are explicit or not.

If not explicit, we have the traditional IR problem of trying to determine what subjects were meant. We can see the patterns of usage but the reasons for the patterns lie just beyond our reach.

Dependency analysis does not seek an explicit identification but identifies patterns that appear to be associated with a particular term. That improves our “guesses” to a degree.

Topic maps enable us to make explicit what subjects the identification of a particular subject depends upon. Or rather to make explicit our identifications of subjects upon which an identification depends.

Whether the same subject is being identified, even by use of the same dependent identifications, is a question best answered by a user.

Dangers of Renaming

Saturday, July 24th, 2010

Topic maps and the semantic web share problems and dangers in their rush to re-name things with IRIs.

The problems include the number of subjects, the propagation (enforcement?) of new names, the emergence of new subjects, and others.

Re-naming has a graver danger, identified by Michael Shara, curator of Astrophysics, American Museum of Natural History, when asked why the heaviest star in the universe, R136a**, doesn’t have a better name.* He responded:

…partly because it [R136a] refers back to the original catalog, and once you go back to the original catalog, you can find all the literature that refers to it, so naming it John’s star or Betty’s Bright Object, would take that away from us.

So would renaming it to an IRI.

Request of the topic maps and semantic web communities:

Please let us keep our identifiers (as identifiers) and our history.

*****
*Weekend Edition – 24 July 2010 – Biggest Star Still Managed To Hide Until Just Now

**Astronomers find a 300 mass star (Royal Astronomical Society)

Topic Maps, Health Care and Interoperability

Friday, July 23rd, 2010

/making the ehealth> connection* by W. Ed Hammond, Ph.D., is a good summary of interoperability issues that health care IT solutions must address.*

Interoperability issues in health care:

  • Semantic
  • Technical
  • Human/Computer
  • Communications
  • Functional
  • Data Transport
  • Decision Support Standards
  • EHR Functional Standards
  • Business
  • Security and Privacy
  • Legal, Ethical and Societal
  • Stakeholder
  • Environmental

Topic maps can address semantic interoperability but how does your application handle the other twelve (12) types of interoperability?

******

* I disagree with some of his comments on mapping solutions but I will save those for another post.

Queries and Linked Data

Thursday, July 22nd, 2010

Federated Data Management and Query Optimization for Linked Open Data by Olaf Görlitz and Steffen Staab and,

A Database Perspective on Consuming Linked Data on the Web by Olaf Hartig and Andreas Langegger,

are two recent publications on querying linked data that will repay close study as we prepare to discuss TMQL in Leipzig.

Linked data is a way of organizing subjects. A way that topic maps will encounter in the (still) heterogeneous world.

Introduction to Cassandra – Post

Thursday, July 22nd, 2010

Introduction to Cassandra showed up on myNoSQL today with a nice set of further reading links on Cassandra.

Would a listing of resources on graph query languages be helpful to anyone preparing to discuss TMQL in Leipzig?

Lily – the Scalable NoSQL Content Repository

Thursday, July 22nd, 2010

Lily – the Scalable NoSQL Content Repository

A product prior to customers. What a marketing concept!

Sarcasm to one side, this is a significant development for scalable content storage using NoSQL and for topic maps.

The more data stored in Lily the less findable it will be, particularly across vocabularies.

Traditional blind mappings will work but they will also remain impervious to reliable sharing/scaling.

Topic maps need not be embedded in data storage applications but that could be a key marketing point for some customers. Something to keep in mind while evaluating Lily.

Show Me The Money!

Wednesday, July 21st, 2010

Lars Heuer recently asked: Well, the whole world should use Topic Maps, but…. where is the money?

That’s a fair question.

To answer it I plan on blogging about an opportunity for the use of topic maps every week. Maybe a project, a software package, etc., but in all cases, an instance where topic maps would make a positive difference. Suggestions about opportunities that I should blog about are most welcome.

Watch this blog for my first “opportunity for topic maps” posting on 26 July 2010. The project in question is spending $millions on a non-topic map mapping solution and has been for years.

MARCXML to Topic Map – Sneak Preview

Wednesday, July 21st, 2010

Wandora – Sneak Preview offers support for converting MARCXML into a topic map. This link will go away when the official Wandora release supports this feature.

Aki Kivelä posted details at: [topicmapmail] MARCXML to Topic Maps implementation!

Aki also created an example if you don’t want to install Wandora to see this feature: Example MARCXML to topic map conversion.

As Aki would be the first to admit, this isn’t a finished solution. It is an important step on the way towards one possible solution.

Another important step is for members of this list to use, evaluate, and test the software and give constructive feedback. Feedback can be negative but try to offer a solution for any problem you uncover.

DARPA Funding for Topic Maps?

Tuesday, July 20th, 2010

Research Announcement DARPA-RA-10-76 July 2, 2010 seeks applications to the Computer Science Study Group.

Who is eligible?

An eligible participant must be a junior faculty member at a U.S. Institution of Higher Education. Participants should be no more than seven years beyond receiving a doctoral degree, pretenure junior faculty, with demonstrated exceptional potential for world-class contributions to the field of computer science. Each participant shall have intense research interest in a computer science topic of relevance to DoD and demonstrate novel ideas that lead to fundamental advances rather than incremental work in the field….

Topic maps fit the bill for being a fundamental rather than incremental advance in the area of semantic integration.

Ask yourself: “Do I want to propose another ‘…teach the world (agency, government, etc.) to sing in perfect harmony‘ proposal, or do I want to submit something truly different? Something that makes sense out of a cacophony of data streams, while preserving the cacophony for later review?”

For what it’s worth, I don’t think terrorists will use vocabularies designed by intelligence agencies so they can “sing in perfect harmony.”

Pass this along to junior faculty members at U.S. Institutions of Higher Education and urge them to propose research based on topic maps.

******
This announcement is US-centric but I am more than happy to post notices of funding opportunities from other governments or organizations that may be of interest to topic map researchers.

Subjects, Sets and Identifications

Tuesday, July 20th, 2010

There was a lively discussion on the topicmapmail discussion list about books and whether they have any universal identifiers. (Look in the archives for July, 2010 and messages with MARC in the subject line.)

There are known problems with ISBNs, such as publishers re-using them or assigning duplicate ISBNs to different books or simply making mistakes with the numbers themselves.

It was reported by one participant that Amazon uses its own unique identifier for books.

The United States Library of Congress has its own internal identifier for books in its collection.

Not to mention that other library systems have their own identifiers for their collections.

At a minimum, it is possible for a book, considered as a subject, to have an ISBN, an identifier from Amazon, another identifier at the Library of Congress and still others in other systems. Perhaps even a unique identifier from a book jobber that sells books to libraries.

If you think about that for a moment, it becomes clear that a book as a subject has a *set* of identifiers, all of which identify the same subject. Moreover, each of those identifiers works best in a particular context, dare we say the identifier has a scope?

If I had a representative (a topic) for this subject (book) that had a set of identifiers (ISBN, ASIN, LOC, etc.) and each of those identifiers had a scope, I could reliably import information from any source that used at least one of those identifiers.
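A minimal sketch of that representative, assuming identifiers are stored as (scope, value) pairs and that sharing any one pair is enough to treat two records as the same book. The class names and the identifier values are made up for illustration, and this is not the TMDM merging procedure:

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Topic:
    """One representative for one subject (here, a book)."""
    identifiers: set = field(default_factory=set)   # {(scope, value), ...}
    properties: dict = field(default_factory=dict)  # information gathered so far


class TopicMap:
    def __init__(self):
        self.topics = []

    def import_record(self, identifiers, properties):
        """Attach a record to the topic sharing at least one scoped identifier."""
        identifiers = set(identifiers)
        matches = [t for t in self.topics if t.identifiers & identifiers]
        if matches:
            target = matches[0]
            # A record may bridge topics that were separate until now: merge them.
            for other in matches[1:]:
                target.identifiers |= other.identifiers
                target.properties.update(other.properties)
                self.topics.remove(other)
        else:
            target = Topic()
            self.topics.append(target)
        target.identifiers |= identifiers
        target.properties.update(properties)
        return target


tm = TopicMap()
# Hypothetical identifier values; each source uses only the identifier it knows.
tm.import_record({("ISBN", "0-00-000000-0")}, {"title": "Moby-Dick"})
tm.import_record({("LOC", "loc-0001")}, {"shelf": "PS"})
# A record carrying several identifiers shows both topics are the same book.
book = tm.import_record(
    {("ISBN", "0-00-000000-0"), ("ASIN", "B000000000"), ("LOC", "loc-0001")},
    {"price": "9.99"},
)
```

After the third import there is a single topic holding all three identifiers and the information contributed under each of them.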

The originators of those identifiers can continue to use their identifiers and yet enjoy the benefits of information that was generated or collected using other identifiers.

Topic maps anyone?

myNoSQL

Monday, July 19th, 2010

myNoSQL, maintained by Alex Popescu, bills itself as

The Hello magazine of the NoSQL World

That may well be true. A wealth of useful resources and current news on NoSQL.

Top Secret America – Report

Monday, July 19th, 2010

Top Secret America (Part 1) by Dana Priest and William M. Arkin appeared in the Washington Post on Monday, 19 July 2010.

It’s early in the year for predictions but I think this is going to be my topic maps poster-child story for 2010.

I don’t doubt that with enough effort, a topic map could be perverted to reflect the lack of sharing and coordination that is reported in this story. But if the President were to assert real control, topic maps could be a part of the solution. (My suggestion would be no sharing = no paycheck/funding. These “patriots” won’t report for work without paychecks. “Pocketbook patriotism.”)

This story illustrates the need for topic maps in three ways:

First, they could help the Washington Post offer a drill down to the actual sources and public contract information that underlies their story. Not to mention knowing which representatives got donations from the same contractors who now have contracts for national security? Can you say “merging?”

Second, rather obviously topic maps could help eliminate the extreme duplication of information flow, which would allow analysts to concentrate on less, but higher quality information. And by eliminating the duplicate information flow, that should also trim down the middle and upper level management staffs, which would increase the amount of funding that could be spent on effective intelligence activities.

Third, and perhaps less obviously, intelligence operations of other governments and governments in waiting should take a lesson in how to not run an effective intelligence operation. If you don’t have $Billions to waste on duplicated and fragmented intelligence operations, perhaps you should consider the advantages that topic maps can bring to an intelligence operation.

Those advantages vary depending on what you want but typically it would result in elimination of duplication of content, enhanced sharing between intelligence agencies, tracking of information flow, integration of data from outside sources as well as offering multiple views of the data or multi-lingual presentation.

Those advantages are not automatic. No IT system, not even topic maps, can solve personnel management issues, greed, corruption, inter-agency rivalry, sheer stupidity, etc., but assuming you can manage those, topic maps can help make intelligence operations more effective.

This Means This, This Means That: A User’s Guide to Semiotics

Sunday, July 18th, 2010

This Means This, This Means That: A User’s Guide to Semiotics was “recommended” to me by Amazon.

From the product description:

Divided into 75 key semiotic concepts, each section of the book begins with a single image or sign, accompanied by a question that invites us to interpret what we are seeing. Turning the page, we can compare our response with the theory behind the sign. In this way, we actively engage in creative thinking. Read straight through or dipped into regularly, this book provides practical examples of how meaning is made in contemporary culture.

I probably have better stuff on Semiotics on my bookshelf but what interests me is the approach taken to explaining the concepts.

I don’t have a copy (yet) but would like to hear from anyone who has used it in a classroom setting.

Wondering if something similar would prove useful as an introduction to subject analysis in general or for some area in particular?

Perhaps showing documented cases where mistakes in subject identity lead to spectacular outcomes?

A “cost” of mis-interpretation to hook users into thinking about subject identity before they get to the hard part.

Graph Traversal Programming Pattern (Part 1) – Graph Structures – Post

Sunday, July 18th, 2010

Graph Traversal Programming Pattern (Part 1) – Graph Structures, by Derek Stainer is the start of a discussion of the Graph Traversal Programming Pattern presentation by Marko A. Rodriguez.

Starts with a primer on graphs. Next installment is on graph databases. Worth following.

You may also like the explanation of property graphs that is part of the Gremlin documentation.

Learning from the Web – Article

Saturday, July 17th, 2010

Learning from the Web will be five (5) years old this coming December.

Adam Bosworth (then VP of Engineering at Google) outlines eight (8) lessons from the Web.

In brief:

  1. Simple, relaxed, sloppily extensible text formats and protocols often work better than complex and efficient binary ones.
  2. It is worth making things simple enough that one can harness Moore’s law in parallel.
  3. It is acceptable to be stale much of the time.
  4. The wisdom of crowds works amazingly well.
  5. People understand a graph composed of tree-like documents (HTML) related by links (URLs).
  6. Pay attention to physics.
  7. Be as loosely coupled as possible.
  8. KISS. Keep it (the design) simple and stupid.

You will need to read the article to get the full flavor of the lessons.

His comments on how databases have failed to heed almost all the lessons of the web are interesting in light of the recent surge of NoSQL projects.

After you read the article, ask yourself how topic maps have or have not heeded the lessons of the web? If you think not, what would it take for topic maps to heed the lessons of the web?