Archive for the ‘Topic Map Systems’ Category

Should Topic Maps Gossip?

Wednesday, March 18th, 2015

Efficient Reconciliation and Flow Control for Anti-Entropy Protocols by Robbert van Renesse, Dan Dumitriu, Valient Gough and Chris Thomas.

Abstract:

The paper shows that anti-entropy protocols can process only a limited rate of updates, and proposes and evaluates a new state reconciliation mechanism as well as a flow control scheme for anti-entropy protocols.

Excuse the title, I needed a catchier line than the title of the original paper!

This is the Scuttlebutt paper that underlies Cassandra’s gossip protocol.

Rather than relying on an undefined notion of consistency, ask yourself: how much consistency does your application actually require?
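The flavor of the paper's state reconciliation can be seen in a toy sketch (class and method names are mine, not the paper's; the real protocol adds digest ordering and flow control on top of this):

```python
# Minimal sketch of Scuttlebutt-style anti-entropy: each node keeps
# key -> (value, version) and a gossip round exchanges only the entries
# the peer has not yet seen.

class Node:
    def __init__(self):
        self.state = {}  # key -> (value, version)

    def update(self, key, value, version):
        # Keep whichever entry carries the higher version number.
        cur = self.state.get(key)
        if cur is None or version > cur[1]:
            self.state[key] = (value, version)

    def digest(self):
        # What this node knows: key -> version.
        return {k: n for k, (v, n) in self.state.items()}

    def deltas_for(self, peer_digest):
        # Send only entries newer than what the peer reports knowing.
        return {k: (v, n) for k, (v, n) in self.state.items()
                if n > peer_digest.get(k, -1)}

def reconcile(a, b):
    # One push-pull gossip round: exchange digests, then deltas.
    for k, (v, n) in b.deltas_for(a.digest()).items():
        a.update(k, v, n)
    for k, (v, n) in a.deltas_for(b.digest()).items():
        b.update(k, v, n)

a, b = Node(), Node()
a.update("load", 0.7, version=3)
b.update("load", 0.5, version=1)
b.update("status", "up", version=2)
reconcile(a, b)
assert a.state == b.state == {"load": (0.7, 3), "status": ("up", 2)}
```

Note that the round moves only the entries each side is missing, which is exactly the property that lets the paper bound the update rate an anti-entropy protocol can sustain.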

I first saw this in a tweet by Jason Brown.

Topic Map Tool Chain

Tuesday, April 2nd, 2013

Belaboring the state of topic map tools won’t change this fact: It could use improvement.

Leaving the current state of topic map tools to one side, I have a suggestion about going forward.

What if we conceptualize topic map production as a tool chain?

A chain whose stages can exist as separate components or as combinations of components.

In the spirit of *nix tools, each component could be designed to do one task well.

The stages I see:

  1. Authoring
  2. Merging
  3. Conversion
  4. Query
  5. Display

The only odd looking stage is “conversion.”

By that I mean conversion from being held in a topic map data store or format to some other format for integration, query or display.

TaxMap, the oldest topic map on the WWW, is a conversion to HTML for delivery.

Converting a topic map into graph format enables the use of graph display or query mechanisms.
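As a sketch of that conversion stage (the topic map layout here is invented for illustration, not any standard serialization), turning associations into a plain edge list is enough for most graph tools to take over:

```python
# Hypothetical sketch: flatten a tiny topic map (topics + associations)
# into an edge list that a graph library or viewer could consume.

topic_map = {
    "topics": {
        "t1": {"name": "TaxMap"},
        "t2": {"name": "HTML"},
    },
    "associations": [
        {"type": "converted-to", "roles": {"source": "t1", "target": "t2"}},
    ],
}

def to_edge_list(tm):
    # Each association becomes one (source, label, target) edge.
    edges = []
    for assoc in tm["associations"]:
        roles = assoc["roles"]
        edges.append((roles["source"], assoc["type"], roles["target"]))
    return edges

assert to_edge_list(topic_map) == [("t1", "converted-to", "t2")]
```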

End-to-end solutions are possible but a tool chain perspective enables smaller projects with quicker returns.

Comments/Suggestions?

Scotty [Unrestricted Access to Your Topic Map]

Wednesday, November 14th, 2012

Scotty – We transfer what you can’t

From the website/features:

Scotty is a free opensource proxy software for bypassing filter and censorship systems. A free and unrestricted web is one of the most important values our society has. This software helps people who are victims of censorship of governments or private organizations.

  • lightweight & platform independent
  • open source and free
  • secure RSA based encryption
  • uses simple http (when you have internet access, scotty works)
  • tunneling through any firewalls, webwashers, web filter
  • no privacy issues: uses your own server
  • supports Google AppEngine

Secure communication is likely to be a part of any topic map “system.”

Having topic mapped data is interesting but not terribly useful unless others can reach it. Past censors and others. Even more useful if others can communicate to add data to the topic map.

Scotty enables both of those use cases free of censorship.

There are other such systems but this one caught my eye this morning. Other suggestions welcome.

I first saw this in a tweet by Johannes Schmidt.

Is it time to get rid of the Linux OS model in the cloud?

Sunday, January 22nd, 2012

Is it time to get rid of the Linux OS model in the cloud?

From the post:

You program in a dynamic language, that runs on a JVM, that runs on a OS designed 40 years ago for a completely different purpose, that runs on virtualized hardware. Does this make sense? We’ve talked about this idea before in Machine VM + Cloud API – Rewriting The Cloud From Scratch, where the vision is to treat cloud virtual hardware as a compiler target, and converting high-level language source code directly into kernels that run on it.

As new technologies evolve the friction created by our old tool chains and architecture models becomes ever more obvious. Take, for example, what a team at UCSD is releasing: a phase-change memory prototype – a solid state storage device that provides performance thousands of times faster than a conventional hard drive and up to seven times faster than current state-of-the-art solid-state drives (SSDs). However, PCM has access latencies several times slower than DRAM.

This technology has obvious mind blowing implications, but an interesting not so obvious implication is what it says about our current standard datacenter stack. Gary Athens has written an excellent article, Revamping storage performance, spelling it all out in more detail:

Computer scientists at UCSD argue that new technologies such as PCM will hardly be worth developing for storage systems unless the hidden bottlenecks and faulty optimizations inherent in storage systems are eliminated.

Moneta bypasses a number of functions in the operating system (OS) that typically slow the flow of data to and from storage. These functions were developed years ago to organize data on disk and manage input and output (I/O). The overhead introduced by them was so overshadowed by the inherent latency in a rotating disk that they seemed not to matter much. But with new technologies such as PCM, which are expected to approach dynamic random-access memory (DRAM) in speed, the delays stand in the way of the technologies’ reaching their full potential. Linux, for example, takes 20,000 instructions to perform a simple I/O request.

By redesigning the Linux I/O stack and by optimizing the hardware/software interface, researchers were able to reduce storage latency by 60% and increase bandwidth as much as 18 times.

The I/O scheduler in Linux performs various functions, such as assuring fair access to resources. Moneta bypasses the scheduler entirely, reducing overhead. Further gains come from removing all locks from the low-level driver, which block parallelism, by substituting more efficient mechanisms that do not.

Moneta performs I/O benchmarks 9.5 times faster than a RAID array of conventional disks, 2.8 times faster than a RAID array of flash-based solid-state drives (SSDs), and 2.2 times faster than Fusion-io’s high-end, flash-based SSD.

Read the rest of the post and then ask yourself what architecture do you envision for a topic map application?

What if, rather than moving data from one data structure to another, the data structure addressed is identified by the data? If you wish to “see” the data as a table, it reports its location by table/column/row. If you wish to “see” the data as a matrix, it reports its matrix position. If you wish to “see” the data as a linked list, it can report its value, plus those ahead and behind.

It isn’t that difficult to imagine that data reports its location on a graph as the result of an operation. Perhaps storing its graph location for every graphing operation that is “run” using that data point.
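The idea is speculative, but a toy sketch makes it concrete (everything here, class and view names included, is invented for illustration):

```python
# Speculative sketch: a datum that reports its own location under
# whatever view ("table", "matrix", "list", "graph") you ask for.

class Datum:
    def __init__(self, value):
        self.value = value
        self.locations = {}  # view name -> position in that view

    def place(self, view, position):
        # Record where this datum sits in a given view, e.g. after a
        # graphing operation is "run" using this data point.
        self.locations[view] = position

    def locate(self, view):
        return self.locations.get(view)

d = Datum(42)
d.place("table", ("orders", "amount", 7))   # table/column/row
d.place("matrix", (3, 5))                   # matrix position
d.place("list", {"prev": 41, "next": 43})   # neighbours in a linked list
assert d.locate("matrix") == (3, 5)
assert d.locate("table") == ("orders", "amount", 7)
```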

True enough we need to create topic maps that run on conventional hardware/software but that isn’t an excuse to ignore possible futures.

Reminds me of a “grook” that I read years ago: “You will conquer the present suspiciously fast – if you smell of the future and stink of the past.” (Piet Hein but I don’t remember which book.)

SpiderDuck: Twitter’s Real-time URL Fetcher

Friday, November 25th, 2011

SpiderDuck: Twitter’s Real-time URL Fetcher

A bit of a walk on the engineering side but in order to be relevant, topic maps do have to be written and topic map software implemented.

This is a very interesting write-up of how Twitter relied mostly on open source tools to create a system that could be very relevant to topic map implementations.

For example, the fetch/no-fetch decision for URLs is based on a comparison to URLs fetched within X days. Hmmm, comparison of URLs, oh, those things that occur in subjectIdentifier and subjectLocator properties of topics. Do you smell relevance?
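A minimal sketch of that fetch/no-fetch decision, applied to identifier URLs (the seven-day window and function names are my assumptions, not SpiderDuck's actual values):

```python
# Sketch of SpiderDuck's fetch/no-fetch idea applied to topic map
# identifiers: skip a URL if it was fetched within the last N days.

from datetime import datetime, timedelta

FETCH_WINDOW = timedelta(days=7)  # assumed stand-in for "X days"

def should_fetch(url, fetch_log, now):
    """fetch_log maps url -> datetime of last successful fetch."""
    last = fetch_log.get(url)
    return last is None or now - last > FETCH_WINDOW

log = {"http://example.org/subject": datetime(2011, 11, 20)}
now = datetime(2011, 11, 25)
assert not should_fetch("http://example.org/subject", log, now)  # 5 days ago
assert should_fetch("http://example.org/other", log, now)        # never seen
```

The same check works whether the URL came from a tweet or from a topic's subjectIdentifier property.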

And there is harvesting of information from web pages, one assumes that could be done on “information items” from a topic map as well, except there it would be properties, etc. Even more relevance.

What parts of SpiderDuck do you find most relevant to a topic map implementation?

Lessons learned from bountying bugs

Monday, October 31st, 2011

Lessons learned from bountying bugs

From the post:

A bit over a week ago, I wrote here about the $1265 of Tarsnap bugs fixed as a result of the Tarsnap bug bounty which I’ve been running since April. I was impressed by the amount of traffic that post received — over 82000 hits so far — as well as the number of people who said they were considering following my example. For them and anyone else interested in “crowdsourcing” bug hunting, here’s some of the lessons I learned over the past few months.

I suppose next to writing documentation, debugging is the least attractive job of all.

Curious if anyone has used a bountying bugs approach with their topic maps?

Seems like it would be a useful technique, particularly with large topic maps spread across organizations, where users could get some sort of reward for reporting errors.

I suppose it isn’t a really big step to giving at least some users the ability to suggest new topics or merges.

I think it is important to remember that our users are people first and users of topic maps and/or software second. We all like to get rewards, even small ones.

It occurs to me that if a topic map is only one of many resources a user could consult, and the plan is to move users towards relying on the topic map, then offering incentives for using it would be an extension of the reward idea.

Solr Spellchecker internals (now with tests!)

Friday, May 13th, 2011

Solr Spellchecker internals (now with tests!)

Emmanuel Espina says:

But today I’m going to talk about Solr SpellChecker. In contrast with Google, the Solr spellchecker isn’t much more than a pattern similarity algorithm. You give it a word and it will find similar words. But what is interpreted as “similar” by Solr? The words are interpreted just as an array of characters, so two words are similar if they have many coincidences in their character sequences. That may sound obvious, but in natural languages the bytes (letters) have little meaning. It is the entire word that has a meaning. So, Solr algorithms won’t even know that you are giving them words. Those byte sequences could be sequences of numbers, or sequences of colors. Solr will find the sequences of numbers that have small differences with the input, or the sequences of colors, etc. By the way, this is not the approach that Google follows. Google knows the frequent words, the frequent misspelled words, and the frequent ways humans make mistakes. It is my intention to talk about these interesting topics in a future post, but for now let’s study how the Solr spellchecker works in detail, and then make some tests.
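The "array of characters" notion of similarity Espina describes can be sketched as plain edit distance (this is an illustration of the general idea, not Solr's actual implementation, which uses its own string-distance classes):

```python
# Classic dynamic-programming Levenshtein edit distance: the inputs are
# just character sequences, with no linguistic knowledge whatsoever.

def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

assert edit_distance("solr", "sold") == 1
assert edit_distance("color", "colour") == 1
# Works equally well on "sequences of numbers" -- no meaning required:
assert edit_distance("1234", "1243") == 2
```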

Looks like a good series on the details of spellcheckers.

Useful if you want to incorporate spell-check in a topic map application.

And for a deep glimpse into how computers are different from us.

Ender’s Topic Map

Thursday, February 24th, 2011

Warning: Spoiler for Ender’s game by Orson Scott Card.*

After posting my comments on the Maiana interface, in my posting Maiana February Release, I fully intended to post a suggested alternative interface.

But, comparing end results to end results isn’t going to get us much further than: “I like mine better than yours,” sort of reasoning.

It has been my experience in the topic maps community that isn’t terribly helpful or productive.

I want to use Ender’s Game to explore criteria for a successful topic map interface.

I think discussing principles of interfaces, which could be expressed any number of ways, is a useful step before simply dashing off interfaces.

Have all the children or anyone who needs to read Ender’s Game left at this point?

Good.

I certainly wasn’t a child or even young adult when I first read Ender’s Game but it was a deeply impressive piece of work.

Last warning: Spoiler immediately follows!

As you may have surmised by this point, the main character in the story is named Ender. No real surprise there.

The plot line is a familiar one, Earth is threatened by evil aliens (are there any other kind?) and is fighting a desperate war to avoid utter destruction.

Ender is selected for training at Battle School as are a number of other, very bright children. A succession of extreme situations follow, all of which Ender eventually wins, due in part to his tactical genius.

What is unknown to the reader, and to Ender until after the final battle, is that Ender’s skills and tactics have been simultaneously used to fight real space battles.

Ender has been used to exterminate the alien race.

That’s what I would call a successful interface on a number of levels.

Ender’s environment wasn’t designed (at least from his view) as an actual war command center.

That is to say that it didn’t have gauges, switches, tactical displays, etc. Or at least the same information was being given to Ender, in analogous forms.

Forms that a child could understand.

First principle for topic map interfaces: Information must be delivered in a form the user will understand.

You or I may be comfortable with all the topic map machinery talk-talk but I suspect that most users aren’t.

Here’s a test of that suspicion. Go up to anyone outside of your IT department and ask them to explain how Facebook works. Just in general terms, not the details. I’ll wait. 😉

OK, now are you satisfied that most users aren’t likely to be comfortable with topic map machinery talk-talk?

Second principle for topic map interfaces: Do not present information to all users the same way.

The military types and Ender were presented the same information in completely different ways.

Now, you may object that is just a story but I suggest that you turn on the evening news and listen to 30 minutes of Fox News and then 30 minutes of National Public Radio (A US specific example but I don’t know the nut case media in Europe.).

Same stories, one assumes the same basic facts, but you would think one or both of them had overheard an emu speaking in whispered Urdu in a crowded bus terminal.

It isn’t enough to simply avoid topic map lingo but a successful topic map interface will be designed for particular user communities.

In that regard, I think we have been misled by the success, or at least non-failure, of interfaces for word processors, spreadsheets, etc.

The range of those applications is so limited and the utility of them for narrow purposes is so great, that they have succeeded in spite of their poor design.

So, at this point I have two principles for topic map interface design:

  • Information must be delivered in a form the user will understand.
  • Do not present information to all users the same way.

I know, Benjamin Bock, among others, is going to say this is all too theoretical, blah, blah.

Well, it is theoretical, but then so is math, and banking, which is fairly “practical,” would break down without math.

😉

Actually I have an idea for an interface design that at least touches on these two principles for a topic map interface.

Set your watches for 12:00 (Eastern Time US) 28 February 2011 for a mockup of such an interface.

*****
*(Wikipedia has a somewhat longer summary, Ender’s Game.)

PS: More posts on principles of topic map interfaces to follow. Along with more mockups, etc. of interfaces.

How useful any of the mockups prove to be, I leave to your judgment.

Programming in Topincs – Post

Wednesday, February 16th, 2011

Programming in Topincs

The main goal of programming in Topincs is to provide services that aggregate, transform, and manipulate the data in a Topincs store.

Robert Cerny walks through an invoicing system for a software company using the Topincs store.

Very good introduction to Topincs!

Thanks Robert!

About Version Vectors (a.k.a. Vector Clocks)

Tuesday, February 1st, 2011

About Version Vectors (a.k.a. Vector Clocks) by Kresten Krab Thorup.

Using spreadsheets as an example, Kresten explains how version vectors can solve a large class of versioning issues but not all.

Assuming you are interested in distributed topic map systems, versioning that leads to acceptable (not perfect) results will be of interest to you.
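The core comparison Kresten describes can be sketched in a few lines (the outcome labels are mine; the point is that two vectors can be ordered, equal, or concurrent):

```python
# Minimal version-vector comparison: each replica keeps a counter per
# node; comparing two vectors yields an order or detects a conflict.

def compare(v1, v2):
    nodes = set(v1) | set(v2)
    ge = all(v1.get(n, 0) >= v2.get(n, 0) for n in nodes)
    le = all(v1.get(n, 0) <= v2.get(n, 0) for n in nodes)
    if ge and le:
        return "equal"
    if ge:
        return "descends"   # v1 has seen everything v2 has
    if le:
        return "precedes"
    return "conflict"       # concurrent edits: must be reconciled

assert compare({"a": 2, "b": 1}, {"a": 1, "b": 1}) == "descends"
assert compare({"a": 1}, {"a": 1, "b": 3}) == "precedes"
assert compare({"a": 2}, {"b": 2}) == "conflict"
```

The "conflict" case is exactly the class of versioning issues Kresten's spreadsheet example shows version vectors cannot resolve on their own.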

This is going to become more important as topic maps develop into distributed systems.

How to Design Programs

Thursday, December 30th, 2010

How to Design Programs: An Introduction to Computing and Programming Authors: Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, Shriram Krishnamurthi (2003 version)

Update: see How to Design Programs, Second Edition.

Website includes the complete text.

The Amazon product description reads:

This introduction to programming places computer science in the core of a liberal arts education. Unlike other introductory books, it focuses on the program design process. This approach fosters a variety of skills–critical reading, analytical thinking, creative synthesis, and attention to detail–that are important for everyone, not just future computer programmers. The book exposes readers to two fundamentally new ideas. First, it presents program design guidelines that show the reader how to analyze a problem statement; how to formulate concise goals; how to make up examples; how to develop an outline of the solution, based on the analysis; how to finish the program; and how to test. Each step produces a well-defined intermediate product. Second, the book comes with a novel programming environment, the first one explicitly designed for beginners. The environment grows with the readers as they master the material in the book until it supports a full-fledged language for the whole spectrum of programming tasks. All the book’s support materials are available for free on the Web. The Web site includes the environment, teacher guides, exercises for all levels, solutions, and additional projects.

If we are going to get around to solving the hard subject identity problems in addition to those that are computationally convenient, there will need to be more collaboration across the liberal arts.

The Amazon page, How to Design Programs, is in error. I checked the ISBN numbers at: http://www.books-by-isbn.com/ The ISBN-13 works but the French, German and UK details point back to the 2001 printing. Bottom line: There is no 2008 edition of this work.

If you are interested, Matthias Felleisen, along with Robert Bruce Findler and Matthew Flatt, has authored Semantics Engineering with PLT Redex in 2009. Sounds interesting but the only review I saw was on Amazon.

Semantically Equivalent Facets

Friday, December 10th, 2010

I failed to mention semantically equivalent facets in either Identifying Subjects With Facets or Facets and “Undoable” Merges.

Sorry! I assumed it was too obvious to mention.

That is, if you are using facet-based navigation with a topic map, it will return/navigate the facet you ask for, and also return/navigate any semantically equivalent facet.

One of the advantages of using a topic map to underlie a facet system is that users get the benefit of something familiar, a set of facet axes they recognize, while at the same time getting the benefit of navigating semantically equivalent facets without knowing about it.

I suppose I should say that declared semantically equivalent facets are included in navigation.
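A small sketch of what "declared" means here (the facet names and the equivalence table are invented for illustration):

```python
# Hypothetical sketch: expanding a facet query with declared equivalences,
# so a user navigating "author" also sees items filed under "creator".

EQUIVALENT = {                      # declared by hand, not discovered
    "author": {"creator", "dc:creator"},
}

def expand(facet):
    return {facet} | EQUIVALENT.get(facet, set())

def select(items, facet, value):
    wanted = expand(facet)
    return [i for i in items if any(i.get(f) == value for f in wanted)]

items = [
    {"title": "Paper A", "author": "Smith"},
    {"title": "Paper B", "creator": "Smith"},   # different vocabulary
    {"title": "Paper C", "author": "Jones"},
]
# The user asks for author=Smith and gets both records, never needing to
# know that "creator" exists.
found = [i["title"] for i in select(items, "author", "Smith")]
assert found == ["Paper A", "Paper B"]
```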

Declared semantic equivalence doesn’t just happen, nor is it free.

Keeping that in mind will help you ask questions when sales or project proposals gloss over the hard questions of what return you will derive from an investment in semantic technologies? And when?

Facets and “Undoable” Merges

Friday, December 10th, 2010

After writing Identifying Subjects with Facets, I started thinking about the merge of the subjects matching a set of facets. So the user could observe all the associations where the members of that subject participated.

If merger is a matter of presentation to the user, then the user should be able to remove one of the members that makes up a subject from the merge. Which results in the removal of associations where that member of the subject participated.

No more or less difficult than the inclusion/exclusion based on the facets, except this time it involves removal on the basis of roles in associations. That is, playing a role, being a role, etc., are treated as facets of a subject.

Well, except that an individual member of a collective subject is being manipulated.

This capability would enable a user to manipulate what members of a subject are represented in a merge. Not to mention being able to unravel a merge one member of a subject at a time.

An effective visual representation of such a capability could be quite stunning.

Identifying Subjects With Facets

Friday, December 10th, 2010

If facets are aspects of subjects, then for every group of facets, I am identifying the subject that has those facets.

If I have the facets, height, weight, sex, age, street address, city, state, country, email address, then at the outset, my subject is the subject that has all those characteristics, with whatever value.

We could call that subject: people.

Not the way I usually think about it but follow the thought out a bit further.

For each facet where I specify a value, the subject identified by the resulting value set is both different from the starting subject and, more importantly, has a smaller set of members in the data set.

Members that make up the collective that is the subject we have identified.

Assume we have narrowed the set of people down to a group subject that has ten members.

Then, we select merge from our application and it merges these ten members.

Sounds damned odd, to merge what we know are different subjects?

What if by merging those different members we can now find these different individuals have a parent association with the same children?

Or have a contact relationship with a phone number associated with an individual or group of interest?
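The narrowing-then-merging move can be sketched with invented data (the people and facet names here are purely illustrative):

```python
# Sketch: narrow a set of people by facet values, then "merge" the
# survivors to pool one facet across them -- shared values stand out.

people = [
    {"id": "p1", "city": "Oslo", "age": 34, "phone": "555-0101"},
    {"id": "p2", "city": "Oslo", "age": 34, "phone": "555-0101"},
    {"id": "p3", "city": "Bergen", "age": 51, "phone": "555-0199"},
]

def narrow(members, **facets):
    # Keep only members matching every specified facet value.
    return [m for m in members
            if all(m.get(k) == v for k, v in facets.items())]

def merge(members, facet):
    # Pool one facet across the merged members.
    return {m[facet] for m in members}

group = narrow(people, city="Oslo", age=34)
assert [m["id"] for m in group] == ["p1", "p2"]
# Merging reveals the two "different" people share a phone number.
assert merge(group, "phone") == {"555-0101"}
```

Removing one member from `group` and re-pooling is the "undoable merge" of the previous post: the merge is presentation, not a destructive operation on the map.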

Robust topic map applications will offer users the ability to navigate and explore subject identities.

Subject identities that may not always be the ones you expect.

We don’t live in a canned world. Does your semantic software?

Kafka : A high-throughput distributed messaging system – Post

Saturday, December 4th, 2010

Kafka : A high-throughput distributed messaging system

Caught my eye:

Kafka is a distributed publish-subscribe messaging system. It is designed to support the following

  • Persistent messaging with O(1) disk structures that provide constant time performance even with many TB of stored messages.
  • High-throughput: even with very modest hardware Kafka can support hundreds of thousands of messages per second.
  • Explicit support for partitioning messages over Kafka servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
  • Support for parallel data load into Hadoop.

Depending on your message passing requirements for your topic map application, this could be of interest. Better to concentrate on the semantic heavy lifting than re-inventing message passing when solutions like this exist.
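The per-partition ordering guarantee in the feature list is the interesting one for topic map merging pipelines; it can be sketched like this (not Kafka's actual code, just the hash-key-to-partition idea):

```python
# Sketch of per-partition ordering: hash each message key to a partition,
# so all of one key's messages land in one partition, in publish order.

from zlib import crc32

NUM_PARTITIONS = 4

def partition_for(key):
    return crc32(key.encode()) % NUM_PARTITIONS

partitions = {p: [] for p in range(NUM_PARTITIONS)}

def publish(key, message):
    partitions[partition_for(key)].append(message)

for n in range(3):
    publish("user-42", f"event-{n}")

p = partition_for("user-42")
# One key's messages stay together and stay ordered.
assert partitions[p] == ["event-0", "event-1", "event-2"]
```

For a topic map, using the subject identifier as the key would keep all updates to one subject in order while still spreading load across partitions.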