## Archive for the ‘Topic Map Software’ Category

### Topic Map Tool Chain

Tuesday, April 2nd, 2013

Belaboring the state of topic map tools won’t change this fact: It could use improvement.

Leaving the current state of topic map tools to one side, I have a suggestion about going forward.

What if we conceptualize topic map production as a tool chain?

A chain that can exist as separate components or with combinations of components.

Thinking like *nix tools, each one could be designed to do one task well.

The stages I see:

1. Authoring
2. Merging
3. Conversion
4. Query
5. Display

The only odd looking stage is “conversion.”

By that I mean conversion from being held in a topic map data store or format to some other format for integration, query or display.

TaxMap, the oldest topic map on the WWW, is a conversion to HTML for delivery.

Converting a topic map into graph format enables the use of graph display or query mechanisms.
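As a rough illustration of the conversion stage, here is a minimal Python sketch that writes associations out in DOT graph format. The `(type, topic, topic)` triple layout and the sample data are assumptions made for the example, not the export format of any particular topic map engine.

```python
def to_dot(associations):
    """Render (type, topic_a, topic_b) triples as an undirected DOT graph.

    Assumes topic names contain no double quotes; a real converter would escape them.
    """
    lines = ["graph topicmap {"]
    for assoc_type, topic_a, topic_b in associations:
        lines.append(f'  "{topic_a}" -- "{topic_b}" [label="{assoc_type}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample associations.
associations = [
    ("created-by", "Topincs", "Robert Cerny"),
    ("category", "Topincs", "Topic Map Software"),
]
dot_text = to_dot(associations)
print(dot_text)
```

The output can be handed to any tool that reads DOT, which is the point of treating conversion as its own stage.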

End-to-end solutions are possible but a tool chain perspective enables smaller projects with quicker returns.

### Current Topic Map Software?

Monday, April 1st, 2013

First, it took me a long time to understand what tools are out there, what their capabilities are, and which ones are still maintained. (As an aside, you would think the topic map community would have a central topic map based repository/wiki to make it easy for new developers to get started.)

A valid criticism.

I could not name off hand all the currently maintained topic map projects.

Can you?

Moreover, shouldn’t there be more topic map tools?

Adoption of computer technologies in the absence of computer-based tools tends to be low.

Yes?

### Critical Ruby On Rails Issue Threatens 240,000 Websites [Ruby TMs Beware]

Friday, January 11th, 2013

Critical Ruby On Rails Issue Threatens 240,000 Websites by Mathew J. Schwartz.

From the post:

All versions of the open source Ruby on Rails Web application framework released in the past six years have a critical vulnerability that an attacker could exploit to execute arbitrary code, steal information from databases and crash servers. As a result, all Ruby users should immediately upgrade to a newly released, patched version of the software.

That warning was sounded Tuesday in a Google Groups post made by Aaron Patterson, a key Ruby programmer. “Due to the critical nature of this vulnerability, and the fact that portions of it have been disclosed publicly, all users running an affected release should either upgrade or use one of the work arounds immediately,” he wrote. The patched versions of Ruby on Rails (RoR) are 3.2.11, 3.1.10, 3.0.19 and 2.3.15.

As a result, more than 240,000 websites that use Ruby on Rails Web applications are at risk of being exploited by attackers. High-profile websites that employ the software include Basecamp, Github, Hulu, Pitchfork, Scribd and Twitter.

Ruby developers will already be aware of this issue but if you have Ruby-based topic map software, you may not have an in-house Ruby developer.

The major players in the Ruby community are concerned so it’s time to ask someone to look at any Ruby software, topic maps or not, that you are running.

If you are interested in the details, see: Analysis of Rails XML Parameter Parsing Vulnerability.

At its heart, a subject identity issue.

If symbol and yaml types had defined properties/values (or value ranges) as part of their “identity,” then other routines could reject instances that do not meet a “safe” identity test.

But because instances are treated as having primitive identities, what gets injected is what you get (WGIIWY).
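To make the "safe identity test" idea concrete, here is a minimal sketch in Python (not Ruby, and not the Rails patch). It assumes parsed parameters should consist only of plain strings, numbers, booleans, nulls, lists and string-keyed dictionaries; the `Symbol` class is a hypothetical stand-in for any richer type a permissive parser might construct.

```python
SAFE_SCALARS = (str, int, float, bool, type(None))


def is_safe(value):
    """Return True only if value is built entirely from allow-listed types.

    Exact type checks are used so that subclasses do not slip through.
    """
    if type(value) in SAFE_SCALARS:
        return True
    if type(value) is list:
        return all(is_safe(item) for item in value)
    if type(value) is dict:
        return all(type(k) is str and is_safe(v) for k, v in value.items())
    return False


class Symbol:
    """Hypothetical stand-in for a type a permissive parser might build."""

    def __init__(self, name):
        self.name = name


good = {"user": {"name": "alice", "age": 30}}
bad = {"user": {"name": Symbol("admin")}}
print(is_safe(good), is_safe(bad))
```

A routine that receives `bad` can reject it before acting on it, because the instance fails the declared identity test rather than being accepted as whatever was injected.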

### Topincs 6.4.0

Wednesday, November 21st, 2012

Topincs 6.4.0 by Robert Cerny.

Robert details the new features and enhancements to Topincs.

### The Impedance Mismatch is Our Fault

Friday, November 2nd, 2012

The Impedance Mismatch is Our Fault by Stuart Halloway.

From the summary:

Stuart Dabbs Halloway explains what the impedance mismatch is and what can be done to solve it in the context of RDBMS, OOP, and NoSQL.

If you haven’t seen one of Stuart’s presentations, you need to treat yourself to this one.

Two points, among many others, to consider:

In “reality,”

• Records are immutable.
• Reality is cumulative.

How does your topic map application compare on those two points?
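One way to test an application against those two points is to ask whether it could be built on something like the following minimal sketch: immutable facts in an append-only log, with the "current" value derived by reading the log as of some point. The class and field names are assumptions for the example and are not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """An immutable record: once asserted, it is never changed."""
    subject: str
    prop: str
    value: str
    tx: int  # position in the log at which the fact was asserted


class Log:
    """A cumulative store: facts are only ever appended."""

    def __init__(self):
        self._facts = []

    def assert_fact(self, subject, prop, value):
        fact = Fact(subject, prop, value, tx=len(self._facts) + 1)
        self._facts.append(fact)
        return fact

    def value_as_of(self, subject, prop, tx):
        """Latest value for subject/prop among facts asserted at or before tx."""
        current = None
        for fact in self._facts:
            if fact.tx <= tx and fact.subject == subject and fact.prop == prop:
                current = fact.value
        return current


log = Log()
log.assert_fact("topincs", "version", "6.1.0")
log.assert_fact("topincs", "version", "6.4.0")
print(log.value_as_of("topincs", "version", 1))
print(log.value_as_of("topincs", "version", 2))
```

The later assertion does not overwrite the earlier one; both remain available, and a query chooses the point in the log it wants to see.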

### “IBM® Compatible” On The Outside?

Thursday, October 11th, 2012

I ran across Customizing and Extending IBM Content Navigator today.

Abstract:

IBM® Content Navigator is a ready-to-use, modern, standards-based user interface that supports Enterprise Content Management (ECM) use cases, including collaborative document management, production imaging, and report management. It is also a flexible and powerful user platform for building custom ECM applications using open web-based standards.

This IBM Redbooks® publication has an overview of the functions and features that IBM Content Navigator offers, and describes how you can configure and customize the user interface with the administration tools that are provided. This book also describes the extension points and customization options of IBM Content Navigator and how you can customize and extend it with sample code. Specifically, the book shows you how to set up a development environment, and develop plug-ins that add new actions and provide special production imaging layout to the user interface. Other customization topics include working with external data services, using IBM Content Navigator widgets externally in other applications, and wrapping the widgets as iWidgets to be used in other applications. In addition, this book describes how to reuse IBM Content Navigator components in mobile development, and how to work with existing viewer or incorporate third-party viewer into IBM Content Navigator.

This book is intended for IT architects, and application designers and developers. It offers both a high-level description of how to extend and customize IBM Content Navigator and also more technical details of how to do implementation with sample code.

IBM Content Navigator has all the hooks and features you expect in a content navigation system.

Except for explicit subject identity and merging out of the box. Like you would have with a topic map based solution.

Skimming through the table of contents, it occurred to me that IBM has done most of the work necessary for a topic map based content management system.

Subject identity and merging doctrines are domain specific so entirely appropriate to handle as extensions to the IBM Content Navigator.

Think about it. Given IBM’s marketing budget and name recognition, is saying:

### Tweet Feeds For Topic Maps?

Friday, September 14th, 2012

The Twitter Trend lecture will leave you with a number of ideas about tracking tweets.

It occurred to me watching the video that a Twitter stream could be used as a feed into a topic map.

Not the same as converting a tweet feed into a topic map, where you accept all tweets on some specified condition.

No, more along the lines that the topic map application watches for tweets from particular users or from particular users with specified hash tags, and when observed, adds information to a topic map.

Thinking such a feed mechanism could have templates that are invoked based upon hash tags for the treatment of tweet content or to marshal other information to be included in the map.

For example, I tweet: doi:10.3789/isqv24n2-3.2012 #tmbib .

A TM application recognizes the #tmbib, invokes a topic map bibliography template, uses the DOI to harvest the title, author, and abstract, and creates the appropriate topics. (Or whatever your template is designed to do.)
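A minimal sketch of that dispatch step, in Python. It assumes tweets arrive as a user name plus plain text; the user name, the template registry and the shape of the returned topic are all hypothetical, and the actual DOI lookup is left out because it would need a network call.

```python
import re

DOI_PATTERN = re.compile(r"doi:(10\.\d{4,9}/\S+)")
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def bibliography_template(tweet_text):
    """Build a stub bibliography topic from the first DOI in the tweet."""
    match = DOI_PATTERN.search(tweet_text)
    if match is None:
        return None
    # A real template would resolve the DOI to harvest title, author, abstract.
    return {"type": "bibliography-entry", "doi": match.group(1)}


# Hash tag -> template function (hypothetical registry).
TEMPLATES = {"tmbib": bibliography_template}


def handle_tweet(user, tweet_text, watched_users):
    """Apply every registered template whose hash tag appears in a watched user's tweet."""
    if user not in watched_users:
        return []
    topics = []
    for tag in HASHTAG_PATTERN.findall(tweet_text):
        template = TEMPLATES.get(tag)
        if template is not None:
            topic = template(tweet_text)
            if topic is not None:
                topics.append(topic)
    return topics


topics = handle_tweet("example_user", "doi:10.3789/isqv24n2-3.2012 #tmbib .", {"example_user"})
print(topics)
```

Tweets from unwatched users, or with unregistered hash tags, produce nothing, which is the difference from converting a whole tweet feed into a topic map.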

Advantage: I don’t have to create and evangelize a new protocol for communication with my topic maps.

Advantage: Someone else is maintaining the pipe. (Not to be underestimated.)

Advantage: Tweet software is nearly ubiquitous.

Do you see a downside to this approach?

### Wandora – Release – 2012-08-31

Sunday, September 9th, 2012

Wandora has a new release as of 2012-08-31.

Overview of the new features reports:

New Wandora application release (2012-08-31) is available. New features include a Guardian open platform extractor, a Freebase extractor, a DOT graph format export and a topic map layer visualization based on D3. Moreover, graph panel views now occurrences and graph panel filter management is easier. Release contains fixes on LTM import and disables some deprecated extractors.

Your experiences with the extractor modules in Wandora appreciated.

### ContextNote [Topic Map-based, semantic note taking]

Sunday, September 9th, 2012

ContextNote

From the website:

Note taking and Personal Knowledge Management (PKM)

ContextNote is a multi-platform (Web, Android, and iOS), topic map-based, semantic note taking application. Click the following link to see screen shots of ContextNote for Web in action.

Objective

To make the management of personal knowledge both simple and intuitive while at the same time being able to take full advantage of the expressive modelling capabilities of topic maps.

Well, that’s the trick isn’t it?

To make “…management of personal knowledge both simple and intuitive…” with “…full advantage of the expressive modelling capabilities of topic maps.”

Looking forward to more news about how ContextNote balances those goals.

### Topic Map Based Publishing

Monday, August 20th, 2012

After asking for ideas on publishing cheat sheets this morning, I have one to offer as well.

One problem with traditional cheat sheets: what does any particular user want in a cheat sheet?

Another problem: how do you expand the content of a cheat sheet?

And what if you want to sell the content? How does that work?

I don’t have a working version (yet) but here is my thinking on how topic maps could power a “cheat sheet” that meets all those requirements.

Solving the problem of what content to include seems critical to me. It is the make or break point in terms of attracting paying customers for a cheat sheet.

Content of no interest is as deadly as poor quality content. Either way, paying customers will vote with their feet.

The first step is to allow customers to “build” their own cheat sheet from some list of content. In topic map terminology, they specify an association between themselves and a set of topics to appear in “their” cheat sheet.
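In code, that first step could look something like the sketch below. The topic ids, the sample entries and the flat `(user, topic)` representation of the association are assumptions made for illustration.

```python
# Available content, keyed by a hypothetical topic id.
TOPICS = {
    "regex-anchors": "^ matches start of line, $ matches end of line",
    "troff-bold": r"\fBtext\fP sets text in bold",
    "sed-substitute": "s/old/new/g replaces every match on a line",
}

# Each pair stands for one association between a user and a topic
# the user wants to appear in "their" cheat sheet.
associations = [
    ("alice", "regex-anchors"),
    ("alice", "sed-substitute"),
    ("bob", "troff-bold"),
]


def cheat_sheet(user):
    """Collect the content of every topic associated with the user."""
    wanted = [topic for owner, topic in associations if owner == user]
    return {topic: TOPICS[topic] for topic in wanted}


print(cheat_sheet("alice"))
```

Adding or removing an association changes the sheet; the content itself is stored once and shared by every user who selects it.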

Most of the cheat sheets that I have seen (and printed out more than a few) are static artifacts. WYSIWYG artifacts. What there is and there ain’t no more.

Works for some things but what if what you need to know lies just beyond the edge of the cheat sheet? That’s the bad thing about static artifacts: they have edges.

Beyond letting users build their own cheat sheet, a topic map based cheat sheet has no limits other than those imposed by lack of payment or interest.

You may not need troff syntax examples on a daily basis but there are times when they could come in quite handy. (Don’t laugh. Liam Quin got hired on the basis of the troff typesetting of his resume.)

The second step is to have a cheat sheet that can expand or contract based on the immediate needs of the user. Sometimes more or less content, depending on their need. Think of an expandable “nutshell” reference.

A WYWIWYG (What You Want Is What You Get) approach as opposed to WWWTSYIWYG (What We Want To Sell You Is What You Get) (any publishers come to mind?).

Finally, how to “sell” the content? The value-add?

Here’s one model: The user buys a version of the cheat sheet, which has embedded links to additional content. When the user authenticates to a server, those links are treated as subject identifiers: they cause merging to occur with topics on the server, which delivers the additional content. Each user’s subject identifier can be auto-generated on purchase, so it is uniquely tied to a particular login.
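A minimal sketch of that model, with merging reduced to a lookup for brevity. The base URL, the `Server` class and its method names are all hypothetical.

```python
import uuid

BASE = "https://example.com/cheatsheet/purchase/"  # hypothetical server address


class Server:
    def __init__(self):
        self.purchases = {}  # subject identifier -> login that bought it
        self.extra = {}      # topic id -> additional content held on the server

    def sell(self, login):
        """Generate a subject identifier that is unique to this purchase."""
        identifier = BASE + uuid.uuid4().hex
        self.purchases[identifier] = login
        return identifier

    def additional_content(self, login, identifier, topic_id):
        """Deliver extra content only when the identifier belongs to this login."""
        if self.purchases.get(identifier) != login:
            return None
        return self.extra.get(topic_id)


server = Server()
server.extra["troff-bold"] = "Longer troff font-switching examples"
alice_id = server.sell("alice")
print(server.additional_content("alice", alice_id, "troff-bold"))
print(server.additional_content("bob", alice_id, "troff-bold"))
```

A copy of the cheat sheet passed to another user still carries the original purchaser's identifier, so the second call returns nothing until that user makes a purchase of their own.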

The user can freely distribute the version of the cheat sheet they purchased, free advertising for you. But the additional content requires a separate purchase by the new user.

What blind alleys, pot holes and other hazards/dangers am I failing to account for in this scenario?

### What’s the Difference? Efficient Set Reconciliation without Prior Context

Monday, August 6th, 2012

What’s the Difference? Efficient Set Reconciliation without Prior Context by David Eppstein, Michael T. Goodrich, Frank Uyeda, and George Varghese.

Abstract:

We describe a synopsis structure, the Difference Digest, that allows two nodes to compute the elements belonging to the set difference in a single round with communication overhead proportional to the size of the difference times the logarithm of the keyspace. While set reconciliation can be done efficiently using logs, logs require overhead for every update and scale poorly when multiple users are to be reconciled. By contrast, our abstraction assumes no prior context and is useful in networking and distributed systems applications such as trading blocks in a peer-to-peer network, and synchronizing link-state databases after a partition.

Our basic set-reconciliation method has a similarity with the peeling algorithm used in Tornado codes [6], which is not surprising, as there is an intimate connection between set difference and coding. Beyond set reconciliation, an essential component in our Difference Digest is a new estimator for the size of the set difference that outperforms min-wise sketches [3] for small set differences.

Our experiments show that the Difference Digest is more efficient than prior approaches such as Approximate Reconciliation Trees [5] and Characteristic Polynomial Interpolation [17]. We use Difference Digests to implement a generic KeyDiff service in Linux that runs over TCP and returns the sets of keys that differ between machines.

Distributed topic maps anyone?
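To get a feel for the idea, here is a minimal Python sketch of an invertible Bloom filter, the kind of structure the Difference Digest is built around as I understand the paper. It is not the authors' implementation: the estimator for the size of the difference is omitted, the cell count is simply chosen generously by hand, and keys are plain non-negative integers (subject identifiers would first have to be hashed to integers).

```python
import hashlib


def _h(key, salt):
    """Deterministic 64-bit hash of an integer key under a given salt."""
    data = f"{salt}:{key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class IBF:
    def __init__(self, cells_per_hash=40, k=3):
        self.m, self.k = cells_per_hash, k
        size = k * cells_per_hash
        self.count = [0] * size
        self.key_sum = [0] * size
        self.check_sum = [0] * size

    def _cells(self, key):
        # One cell in each of k partitions, so a key never hits a cell twice.
        return [i * self.m + _h(key, i) % self.m for i in range(self.k)]

    def _update(self, key, delta):
        for c in self._cells(key):
            self.count[c] += delta
            self.key_sum[c] ^= key
            self.check_sum[c] ^= _h(key, "check")

    def insert(self, key):
        self._update(key, 1)

    def subtract(self, other):
        """Cell-wise difference; keys present in both filters cancel out."""
        out = IBF(self.m, self.k)
        for c in range(len(self.count)):
            out.count[c] = self.count[c] - other.count[c]
            out.key_sum[c] = self.key_sum[c] ^ other.key_sum[c]
            out.check_sum[c] = self.check_sum[c] ^ other.check_sum[c]
        return out

    def decode(self):
        """Peel cells holding exactly one key until nothing more can be removed."""
        only_self, only_other = set(), set()
        progress = True
        while progress:
            progress = False
            for c in range(len(self.count)):
                pure = self.count[c] in (1, -1) and _h(self.key_sum[c], "check") == self.check_sum[c]
                if pure:
                    key, sign = self.key_sum[c], self.count[c]
                    (only_self if sign == 1 else only_other).add(key)
                    self._update(key, -sign)
                    progress = True
        complete = not any(self.count) and not any(self.key_sum)
        return complete, only_self, only_other


set_a = set(range(1000, 1500))
set_b = (set_a - {1001, 1250}) | {9000}
filter_a, filter_b = IBF(), IBF()
for key in set_a:
    filter_a.insert(key)
for key in set_b:
    filter_b.insert(key)
complete, only_a, only_b = filter_a.subtract(filter_b).decode()
print(complete, sorted(only_a), sorted(only_b))
```

Each node only has to ship its fixed-size filter, not its 500 keys. Decoding is probabilistic: if the filter is too small for the actual difference, `complete` comes back false and a larger filter is needed.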

### Efficient Core Maintenance in Large Dynamic Graphs

Saturday, July 21st, 2012

Efficient Core Maintenance in Large Dynamic Graphs by Rong-Hua Li and Jeffrey Xu Yu.

Abstract:

The $k$-core decomposition in a graph is a fundamental problem for social network analysis. The problem of $k$-core decomposition is to calculate the core number for every node in a graph. Previous studies mainly focus on $k$-core decomposition in a static graph. There exists a linear time algorithm for $k$-core decomposition in a static graph. However, in many real-world applications such as online social networks and the Internet, the graph typically evolves over time. Under such applications, a key issue is to maintain the core number of nodes given the graph changes over time. A simple implementation is to perform the linear time algorithm to recompute the core number for every node after the graph is updated. Such simple implementation is expensive when the graph is very large. In this paper, we propose a new efficient algorithm to maintain the core number for every node in a dynamic graph. Our main result is that only certain nodes need to update their core number given the graph is changed by inserting/deleting an edge. We devise an efficient algorithm to identify and recompute the core number of such nodes. The complexity of our algorithm is independent of the graph size. In addition, to further accelerate the algorithm, we develop two pruning strategies by exploiting the lower and upper bounds of the core number. Finally, we conduct extensive experiments over both real-world and synthetic datasets, and the results demonstrate the efficiency of the proposed algorithm.

Maintenance of topic maps in the face of incoming information is an important issue.

I am intrigued by the idea of only certain nodes requiring updating based on addition/deletion of edges to a graph. That is certainly true of topic maps, and my question is whether this work can be adapted to updates other than core numbers.

Or perhaps more clearly, can it be adapted to work with a basis for merging topic maps? Or should core numbers be adapted for processing topic maps?
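For readers who have not met core numbers before, here is a minimal Python sketch of the static computation that the paper treats as its baseline. It is a simple quadratic peeling loop, neither the linear time algorithm the abstract mentions nor the paper's incremental one, and the adjacency-dictionary input is an assumption of the example.

```python
def core_numbers(adjacency):
    """Compute the core number of every node by repeatedly removing a minimum-degree node.

    adjacency maps each node to the set of its neighbours (undirected graph).
    """
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])  # core numbers never decrease as peeling proceeds
        core[node] = k
        remaining.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


# A triangle (a, b, c) with one pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cores = core_numbers(graph)
print(cores)
```

Rerunning this over the whole graph after every inserted or deleted edge is the "simple implementation" the abstract calls expensive; the paper's contribution is finding the few nodes whose number can change.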

Questions I would be exploring if I had a topic maps lab. Maybe I should work up a proposal for one at an investment site.

### LSU Researchers Create Topic Map of Oil Spill Disaster

Thursday, July 12th, 2012

LSU Researchers Create Topic Map of Oil Spill Disaster

From the post:

The Gulf of Mexico Deepwater Horizon Oil Spill incident has impacted many aspects of the coastal environment and inhabitants of surrounding states. However, government officials, Gulf-based researchers, journalists and members of the general public who want a big picture of the impact on local ecosystems and communities are currently limited by discipline-specific and fractured information on the various aspects of the incident and its impacts.

To solve this problem, Assistant Professor in the School of Library and Information Science Yejun Wu is leading the way in information convergence on oil spill events. Wu’s lab has created a first edition of an online topic map, available at http://topicmap.lsu.edu/, that brings together information from a wide range of research fields including biological science, chemistry, coastal and environmental science, engineering, political science, mass communication studies and many other disciplines in order to promote collaboration and big picture understanding of technological disasters.

“Researchers, journalists, politicians and even school teachers wanted to know the impacts of the Deepwater Horizon oil spill incident,” Wu said. “I felt this was an opportunity to develop a tool for supporting learning and knowledge discovery. Our topic map tool can help people learn from historical events to better prepare for the future.”

Wu started the project with a firm belief in the need for an oil spill information hub.

“There is a whole list of historical oil spill events that we probably neglected – we did not learn enough from history,” Wu said.

He first looked to domain experts from various disciplines to share their own views of the impacts of the Deepwater Horizon oil spill. From there, Wu and his research associate and graduate students manually collected more than 7,000 concepts and 4,000 concept associations related to oil spill incidents worldwide from peer-reviewed journal articles and authoritative government websites, loading the information into an organizational topic map software program. Prior to these efforts by Wu’s lab, no comprehensive oil spill topic map or taxonomy existed.

“Domain experts typically focus on oil spill research in their own area, such as chemistry or political communication, but an oil spill is a comprehensive problem, and studies should be interdisciplinary,” Wu said. “Experts in different fields that usually don’t talk to each other can benefit from a tool that brings together and organizes information concepts across many disciplines.”

Wikipedia calls it: Deepwater Horizon oil spill. I think BP Oil Spill is a better name.

Just thinking of environmental disasters, which ones would you suggest for topic maps?

### Workshops Semantic knowledge solutions

Thursday, May 10th, 2012

Workshops Semantic knowledge solutions by Fiemke Griffioen.

From the post:

Morpheus is organizing a number of one-day workshops Semantic knowledge solutions about how knowledge applications can be developed within your organization. We show what the advantages are of gaining insight into your knowledge and sharing knowledge.

In the workshops our Kamala webapplication is used to model knowledge. Kamala is a web application for efficiently developing and sharing semantic knowledge and is based on the open source Topic Maps-engine Ontopia. Kamala is similar to the editor of Ontopia, Ontopoly, but more interactive and flexible because users require less knowledge of the Topic Maps data model in advance.

Since I haven’t covered Kamala before:

Kamala includes the following features:

• Availability of the complete data model of Topic Maps standard
• Navigation based on ontological structures
• Search topics based on naming
• Sharing topic maps with other users (optionally read-only)
• Importing and exporting topic maps to the standard formats XTM, TMXML, LTM, CXTM, etc.
• Querying topic maps with the TOLOG or TMQL query languages
• Storing queries for simple repetition of the query
• Validation of topic maps, so that ‘gaps’ in the knowledge model can be traced
• Generating statistics

The following modules are available to expand Kamala’s core functionality:

• Geo-module, so topics with a geotag can be placed on a map
• Facet indexation for effective navigation based on classification

The workshops are on Landgoed Maarsbergen (That’s what I said, so I included the contact link, which has a map.)

### Scalability of Topic Map Systems

Saturday, April 28th, 2012

Scalability of Topic Map Systems, thesis by Marcel Hoyer.

Abstract:

The purpose of this thesis was to find approaches solving major performance and scalability issues for Topic Maps-related data access and the merging process. Especially regarding the management of multiple, heterogeneous topic maps with different sizes and structures. Hence the scope of the research was mainly focused on the Maiana web application with its underlying MaJorToM and TMQL4J back-end.

In the first instance the actual problems were determined by profiling the application runtime, creating benchmarks and discussing the current architecture of the Maiana stack. By presenting different distribution technologies afterwards the issues around a single-process instance, slow data access and concurrent request handling were investigated to determine possible solutions. Next to technological aspects (i. e. frameworks or applications) this discussion included fundamental reflection of design patterns for distributed environments that indicated requirements for changes in the use of the Topic Maps API and data flow between components. With the development of the JSON Topic Maps Query Result format and simple query-focused interfaces the essential concept for an prototypical implementation was established. To concentrate on scalability for query processing basic principles and benefits of message-oriented middleware were presented. Those were used in combination with previous results to create a distributed Topic Maps query service and to present ideas about optimizing virtual merging of topic maps.

Finally this work gave multiple insights to improve the architecture and performance of Topic Maps-related applications by depicting concrete bottlenecks and providing prototypical implementations that show the feasibility of the approaches. But it also pointed out remaining performance issues in the persisting data layer.

I have just started reading Marcel’s thesis but I am already impressed by the evaluation of Maiana. I am sure this work will be useful in planning options for future topic map stacks.

Commend it to you for reading and discussion, perhaps on the relatively quiet topic map discussion lists?

### Building Highly Available Systems in Erlang

Saturday, April 21st, 2012

Building Highly Available Systems in Erlang

From the description:

Summary

Joe Armstrong discusses highly available (HA) systems, introducing different types of HA systems and data, HA architecture and algorithms, 6 rules of HA, and how HA is done with Erlang.

Bio

Joe Armstrong is the principal inventor of Erlang and coined the term “Concurrency Oriented Programming”. At Ericsson he developed Erlang and was chief architect of the Erlang/OTP system. In 1998 he formed Bluetail, which developed all its products in Erlang. In 2003 he obtained his PhD from the Royal Institute of Technology, Stockholm. He is author of the book “Software for a concurrent world”.

Gives the six (6) rules for highly available systems and how Erlang meets those six (6) rules.

• Isolation rule: Operations must be isolated
• Concurrency: The world is concurrent
• Must detect failures: If can’t detect, can’t fix
• Fault Identification: Enough detail to do something.
• Live Code Upgrade: Upgrade software while it is running.
• Stable Storage: Must survive universal power failure.

Quotes: Why Computers Stop and What Can Be Done About It, Jim Gray, Technical Report 85.7, Tandem Computers 1985, for example.

Highly entertaining and informative.

What do you think of the notion of an evolving software system?

How would you apply that to a topic map system?

### Topincs 6.1.0 – (Works for > 97% of all U.S. Businesses)

Monday, April 9th, 2012

Topincs 6.1.0

From the release notes:

This release has shown good performance under commercial conditions with:

• 30 concurrent users
• 60.000 topics
• 200.000 associations
• 200.000 occurrences
• 7.000 files
• + 3 smaller stores

If that sounds like a small number of concurrent users, consider the following statistics from 2008 on businesses in the United States:

| Category | Number of businesses |
|---|---|
| Total businesses | 27,757,676 |
| Nonemployers (no payroll) | 21,708,021 |
| Firms with 1 to 4 employees | 3,617,764 |
| Firms with 5 to 9 employees | 1,044,065 |
| Firms with 10 to 20 employees | 633,141 |

The next break is 20 to 99 employees.

With support for 30 concurrent users, Topincs can serve every employee at once in more than 97% of all U.S. businesses.
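The 97% figure can be checked from the statistics quoted above. A minimal sketch, assuming the four size classes listed are the only ones below the 20-employee break; totals derived this way depend on which rows are counted, so they may differ from figures worked out from the full census table.

```python
total_businesses = 27_757_676

# Size classes below the 20-employee break, as quoted above.
under_break = {
    "Nonemployers (no payroll)": 21_708_021,
    "Firms with 1 to 4 employees": 3_617_764,
    "Firms with 5 to 9 employees": 1_044_065,
    "Firms with 10 to 20 employees": 633_141,
}

small_businesses = sum(under_break.values())
share = small_businesses / total_businesses
print(f"{small_businesses:,} of {total_businesses:,} businesses ({share:.1%})")
```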

Different way to think about marketing a product.

For < 20 employees, there are 26,646,290 potential purchasers. For > 20 employees there are 655,587 potential purchasers.

Which target sounds larger to you?

Anyone care to supply the numbers for other geographic areas?

### Annotator (and AnnotateIt)

Friday, April 6th, 2012

Annotator

From the webpage:

The Annotator is an open-source JavaScript library and tool that can be added to any webpage to make it annotatable.

Annotations can have comments, tags, users and more. Moreover, the Annotator is designed for easy extensibility so it’s a cinch to add a new feature or behaviour.

AnnotateIt is a bookmarklet that claims to allow annotation of arbitrary webpages.

Not what I think anyone was expecting when XLink/XPointer were young but perhaps sufficient unto the day.

I am going to look rather hard at this and it may appear as part of this blog in the near future.

What other features do you think would make this a better topic mapping tool?

### Ontopia

Friday, April 6th, 2012

Ontopia

Tutorial from TMRA 2010 by Lars Marius Garshol and Geir Ove Grønmo on the Ontopia software suite.

200+ slides so it is rather complete.

### Functional thinking: Functional design patterns, Part 1

Friday, March 9th, 2012

Functional thinking: Functional design patterns, Part 1 – How patterns manifest in the functional world

Summary

Contrary to popular belief, design patterns exist in functional programming — but they sometimes differ from their object-oriented counterparts in appearance and behavior. In this installment of Functional thinking, Neal Ford looks at ways in which patterns manifest in the functional paradigm, illustrating how the solutions differ.

From the article:

Some contingents in the functional world claim that the concept of the design pattern is flawed and isn’t needed in functional programming. A case can be made for that view under a narrow definition of pattern — but that’s an argument more about semantics than use. The concept of a design pattern — a named, cataloged solution to a common problem — is alive and well. However, patterns sometimes take different guises under different paradigms. Because the building blocks and approaches to problems are different in the functional world, some of the traditional Gang of Four patterns (see Resources) disappear, while others preserve the problem but solve it radically differently. This installment and the next investigate some traditional design patterns and rethink them in a functional way.

A functional approach to topic maps lends a certain elegance to the management of merging questions.
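A minimal sketch of what that can look like in Python: topics are immutable values, merging two topics is a pure function, and merging a whole collection is a fold. The two-field `Topic` and the rule "merge when subject identifiers overlap" are simplifying assumptions, not the full merging rules of the Topic Maps data model.

```python
from functools import reduce
from typing import FrozenSet, NamedTuple


class Topic(NamedTuple):
    identifiers: FrozenSet[str]
    names: FrozenSet[str]


def merge(a, b):
    """Pure merge: returns a new topic and leaves both inputs untouched."""
    return Topic(a.identifiers | b.identifiers, a.names | b.names)


def merge_all(topics):
    """Fold topics into a list in which no two topics share an identifier."""
    def step(merged, topic):
        same = [t for t in merged if t.identifiers & topic.identifiers]
        rest = [t for t in merged if not (t.identifiers & topic.identifiers)]
        return rest + [reduce(merge, same, topic)]
    return reduce(step, topics, [])


topics = [
    Topic(frozenset({"http://example.org/id/a"}), frozenset({"Topic Maps"})),
    Topic(frozenset({"http://example.org/id/b"}), frozenset({"ISO 13250"})),
    Topic(frozenset({"http://example.org/id/a", "http://example.org/id/b"}), frozenset({"TM"})),
    Topic(frozenset({"http://example.org/id/c"}), frozenset({"Ontopia"})),
]
result = merge_all(topics)
print(len(result))
```

Because nothing is mutated, the original topics remain available after merging, and the same inputs always yield the same merged result.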

### Ontopia 5.2.0

Thursday, February 2nd, 2012

Ontopia 5.2.0

A new release from the Ontopia project has hit the street! Ontopia 5.2.0!

From the “What’s New” document in the distribution:

This is the first release in the new Maven structure. It includes the modularization of Ontopia along with bug fixes along with some new functionality.

The following changes have been made:

• Ontopia is now divided into Maven modules based functionality. For developers working with Ontopia as a dependency this means that there is a more controlled way of including parts of Ontopia as a dependency. This change does not affect Ontopia distribution users.
• The distribution has been updated to include Tomcat version 6.
• The DB2TM functionality has been extended and improved.
• Ontopoly had several outstanding bugs. Support for exporting TM/XML and schema without data was added.
• Tolog now supports negative integer values and some basic numeric operations through the numbers module.
• Ontopia now uses Lucene 2.9.4 (up from 2.2.0).

Thirty-seven (37) bugs were squashed but you will need to consult the “What’s New” file for the details.

Please send notes of congratulation to the team for the new release. They know you are grateful but a little active encouragement can go a long way.

### Is it time to get rid of the Linux OS model in the cloud?

Sunday, January 22nd, 2012

Is it time to get rid of the Linux OS model in the cloud?

From the post:

You program in a dynamic language, that runs on a JVM, that runs on a OS designed 40 years ago for a completely different purpose, that runs on virtualized hardware. Does this make sense? We’ve talked about this idea before in Machine VM + Cloud API – Rewriting The Cloud From Scratch, where the vision is to treat cloud virtual hardware as a compiler target, and converting high-level language source code directly into kernels that run on it.

As new technologies evolve the friction created by our old tool chains and architecture models becomes ever more obvious. Take, for example, what a team at UCSD is releasing: a phase-change memory prototype - a solid state storage device that provides performance thousands of times faster than a conventional hard drive and up to seven times faster than current state-of-the-art solid-state drives (SSDs). However, PCM has access latencies several times slower than DRAM.

This technology has obvious mind blowing implications, but an interesting not so obvious implication is what it says about our current standard datacenter stack. Gary Athens has written an excellent article, Revamping storage performance, spelling it all out in more detail:

Computer scientists at UCSD argue that new technologies such as PCM will hardly be worth developing for storage systems unless the hidden bottlenecks and faulty optimizations inherent in storage systems are eliminated.

Moneta, bypasses a number of functions in the operating system (OS) that typically slow the flow of data to and from storage. These functions were developed years ago to organize data on disk and manage input and output (I/O). The overhead introduced by them was so overshadowed by the inherent latency in a rotating disk that they seemed not to matter much. But with new technologies such as PCM, which are expected to approach dynamic random-access memory (DRAM) in speed, the delays stand in the way of the technologies’ reaching their full potential. Linux, for example, takes 20,000 instructions to perform a simple I/O request.

By redesigning the Linux I/O stack and by optimizing the hardware/software interface, researchers were able to reduce storage latency by 60% and increase bandwidth as much as 18 times.

The I/O scheduler in Linux performs various functions, such as assuring fair access to resources. Moneta bypasses the scheduler entirely, reducing overhead. Further gains come from removing all locks from the low-level driver, which block parallelism, by substituting more efficient mechanisms that do not.

Moneta performs I/O benchmarks 9.5 times faster than a RAID array of conventional disks, 2.8 times faster than a RAID array of flash-based solid-state drives (SSDs), and 2.2 times faster than fusion-io’s high-end, flash-based SSD.

Read the rest of the post and then ask yourself: what architecture do you envision for a topic map application?

What if, rather than moving data from one data structure to another, the data structure addressed is identified by the data? If you wish to “see” the data as a table, it reports its location by table/column/row. If you wish to “see” the data as a matrix, it reports its matrix position. If you wish to “see” the data as a linked list, it can report its value, plus those ahead and behind.

It isn’t that difficult to imagine that data reports its location on a graph as the result of an operation. Perhaps storing its graph location for every graphing operation that is “run” using that data point.
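To make the idea a little more concrete, here is a minimal sketch in Python. Nothing here comes from Moneta or from any topic map engine: the `DataPoint` class, its `register` and `where` methods, and the sample addresses are all hypothetical names invented for illustration. The point is only that the data carries its own addresses, one per “view,” instead of being copied from one structure to another.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    """A value that remembers where it sits in each view (hypothetical design)."""
    value: object
    # Maps a view name ("table", "matrix", "graph", ...) to this point's
    # address in that view. The address format is up to the view.
    locations: dict = field(default_factory=dict)

    def register(self, view, address):
        """Store the address computed by some operation on the given view."""
        self.locations[view] = address

    def where(self, view):
        """Report this point's address in the view, or None if never placed there."""
        return self.locations.get(view)


# Illustrative values only.
point = DataPoint("Puccini")
point.register("table", ("composers", "name", 3))   # table/column/row
point.register("matrix", (3, 0))                     # matrix position
point.register("graph", "node:composer/puccini")     # stored after a graph operation

print(point.where("table"))
print(point.where("linked_list"))  # never registered, so None
```

Whether the addresses live with the data, as here, or in an index beside it is exactly the kind of architecture question the new hardware reopens.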

True enough, we need to create topic maps that run on conventional hardware/software, but that isn’t an excuse to ignore possible futures.

Reminds me of a “grook” that I read years ago: “You will conquer the present suspiciously fast – if you smell of the future and stink of the past.” (Piet Hein but I don’t remember which book.)

### structr – update

Monday, January 16th, 2012

structr

One of the real pleasures of going over my older posts is checking up on projects I have mentioned in the past. Particularly when they show significant progress since the last time I looked.

Structr is one of those projects.

A lot of progress and I saw today that the homepage advertises:

With structr, you can build web sites or portals, but also interactive web applications.

And if you like, you can add topic maps or ontologies to the content graph. (emphasis added)

Guess I need to throw a copy on my “big box” and see what happens!

### Wandora – New Release 2011-12-07

Sunday, December 18th, 2011

Wandora – New Release 2011-12-07

A new release of Wandora is out!

I haven’t tested the new features but I am sure the project would appreciate any comments you have.

Some early remarks:

The “version” of Wandora should appear in the file name, which would be helpful for those of us with multiple versions on our hard drives.

There should be more detailed release notes, for bugs as well as new features.

I may be overlooking it but if a formal bug/feature tracking system is being used (other than the forum), it would be useful to have at least a read-only link to that tracking system.

### Topincs 5.7.0

Thursday, December 15th, 2011

Topincs 5.7.0

From the webpage:

Description

This version offers a bundle of new features to make it easy for the developer to create tailored views for users with minimal coding effort:

• Up to this Topincs version all statements made in a form were validated independent from each other. With compound constraints this has an end. By using tiny JavaScript snippets arbitrary validation rules can be formulated.
• Customizable context menus on topic pages offer tailored actions that mean more to the user than the generic edit button. The context menu is by default to the left of the page on the opposite side of all generic functions. Forms can be entered with bound values inferred from the context (time, subject, …). This new feature bridges the gap from the generic web database to web application.
• It is now possible to freeze topics in the user interface and the API.

Apart from these core features a number of smaller improvements and changes were made, most notably the support for SSL was verified.

You would think software authors would not depend upon ragged bloggers to supply the download links for their software.

That it would be the first thing out of their mouths: Download Topincs HERE! or something like that.

Maybe it is just me. With every release I have to think about how to get back to the downloads page.

Do take a look!

### SpiderDuck: Twitter’s Real-time URL Fetcher

Friday, November 25th, 2011

A bit of a walk on the engineering side but in order to be relevant, topic maps do have to be written and topic map software implemented.

This is a very interesting write-up of how Twitter relied mostly on open source tools to create a system that could be very relevant to topic map implementations.

For example, the fetch/no-fetch decision for URLs is based on a comparison to URLs fetched within X days. Hmmm, comparison of URLs, oh, those things that occur in subjectIdentifier and subjectLocator properties of topics. Do you smell relevance?
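A rough sketch of that kind of fetch/no-fetch check, in Python. This is not SpiderDuck's code: the `FetchLog` class, the `normalize` function and the seven-day window are my own assumptions, there only to show the shape of the comparison. Note that the normalization drops the URL fragment, which is reasonable for deciding whether to fetch but may be wrong for subject identifiers, where the fragment can be significant.

```python
from datetime import datetime, timedelta
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Reduce a URL to a comparison key (an assumed, deliberately simple rule)."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    # Lower-case scheme and host, keep path and query, drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class FetchLog:
    """Remembers when each normalized URL was last fetched."""

    def __init__(self, max_age_days):
        self.max_age = timedelta(days=max_age_days)
        self.last_fetched = {}

    def should_fetch(self, url, now):
        """Fetch if the URL is unseen or its last fetch is older than the window."""
        seen = self.last_fetched.get(normalize(url))
        return seen is None or now - seen > self.max_age

    def record(self, url, now):
        self.last_fetched[normalize(url)] = now


log = FetchLog(max_age_days=7)  # "X days" in the write-up; 7 is arbitrary
t0 = datetime(2011, 11, 25)
first = log.should_fetch("http://example.org/a", t0)
log.record("HTTP://Example.org/a", t0)
again_soon = log.should_fetch("http://example.org/a#x", t0 + timedelta(days=3))
again_later = log.should_fetch("http://example.org/a", t0 + timedelta(days=8))
```

Swap “fetched within X days” for “already present as an identifier” and the same lookup is the first step of a merge test.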

And there is harvesting of information from web pages, one assumes that could be done on “information items” from a topic map as well, except there it would be properties, etc. Even more relevance.

What parts of SpiderDuck do you find most relevant to a topic map implementation?

### Triggers are coming! Triggers are coming!

Sunday, November 20th, 2011

Triggers are coming! Triggers are coming!

Triggers are going to appear in Topincs 5.6.0 (to be released).

From the webpage:

Issue description
It should be possible to define triggers (or event handlers?). Code should be held in a directory triggers next to the directories domain and services.

Comment
Triggers are created on the command line with the new create-trigger command. This creates a topic of type Topincs trigger with id ID and writes a file into STORE_DIR/php/triggers/ID.php. The user has to 1) specify when the trigger is run and 2) code what the trigger should do.

This looks very useful, whether you have streaming input or not.

Topincs homepage

### Next Generation Cluster Computing on Amazon EC2 – The CC2 Instance Type

Thursday, November 17th, 2011

Next Generation Cluster Computing on Amazon EC2 – The CC2 Instance Type

From the post:

Today we are introducing a new member of the Cluster Compute Family, the Cluster Compute Eight Extra Large. The API name of this instance is cc2.8xlarge so we’ve taken to calling it the CC2 for short. This instance features some incredible specifications at a remarkably low price. Let’s take a look at the specs:

Processing – The CC2 instance type includes 2 Intel Xeon processors, each with 8 hardware cores. We’ve enabled Hyper-Threading, allowing each core to process a pair of instruction streams in parallel. Net-net, there are 32 hardware execution threads and you can expect 88 EC2 Compute Units (ECU’s) from this 64-bit instance type. That’s nearly 90x the rating of the original EC2 small instance, and almost 3x the rating of the first-generation Cluster Compute instance.

Storage – On the storage front, the CC2 instance type is packed with 60.5 GB of RAM and 3.37 TB of instance storage.

Networking – As a member of our Cluster Compute family, this instance is connected to a 10 Gigabit network and offers low latency connectivity with full bisection bandwidth to other CC2 instances within a Placement Group. You can create a Placement Group using the AWS Management Console:

Pricing – You can launch an On-Demand CC2 instance for just $2.40 per hour. You can buy Reserved Instances, and you can also bid for CC2 time on the EC2 Spot Market. We have also lowered the price of the existing CC1 instances to $1.30 per hour.

You have the flexibility to choose the pricing model that works for you based on your application, your budget, your deadlines, and your ability to utilize the instances. We believe that the price-performance of this new instance type, combined with the number of ways that you can choose to acquire it, will result in a compelling value for scientists, engineers, and researchers.

Seems like it was only yesterday that I posted a note that NuvolaBase.com was running a free cloud beta. Hey! That was only yesterday!

Still a ways off from unmetered computing resources but moving in that direction.

If you have some experience with one of the cloud services, consider writing up a pricing example for experimenting with topic maps. I suspect that would help a lot of people (including me) get their feet wet with topic maps and cloud computing.
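As a starting point, the arithmetic for On-Demand time is simple enough to sketch in Python. The two hourly rates are the ones quoted above; everything else (the function name, the four-instance, eight-hour experiment) is made up for illustration. The sketch assumes whole instance-hours and ignores storage, data transfer, Reserved and Spot pricing, so treat the result as a floor, not a quote.

```python
from decimal import Decimal

# Hourly On-Demand rates as quoted in the post (USD).
ON_DEMAND_RATES = {
    "CC2": Decimal("2.40"),
    "CC1": Decimal("1.30"),
}


def on_demand_cost(instance_type, instances, hours):
    """Cost of running `instances` machines for `hours` whole hours each."""
    return ON_DEMAND_RATES[instance_type] * instances * hours


# Hypothetical experiment: four instances for one eight-hour working day.
cc2_day = on_demand_cost("CC2", 4, 8)
cc1_day = on_demand_cost("CC1", 4, 8)
print(f"CC2: ${cc2_day}  CC1: ${cc1_day}")
```

A real write-up would add the time to load the topic maps and whatever storage is needed to keep them around between runs.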

### How Common Is Merging?

Wednesday, November 9th, 2011

I started wondering about how common merging is in topic maps because I discovered a lack I have not seen before. There aren’t any large test collections of topic maps for CS types to break their clusters against. The sort of thing that challenges their algorithms and hardware.

But test collections should have some resemblance to actual data sets, at least if that is known with any degree of certainty. Or at least be one of the available data sets.

As a first step towards exploring this issue, I grepped for topics in the Opera and CIA Fact Book and got:

• Opera topic map: 29,738
• CIA Fact Book: 111,154

for a total of 140,892 topic elements. After merging the two maps, there were 126,204 topic elements. So I count that as merging 14,688 topic elements.

Approximately 10% of the topics in the two sets.
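For anyone who wants to repeat the calculation with other maps, the arithmetic is sketched below in Python. The counts are the ones given above; the function name `merge_stats` is just a label I picked.

```python
def merge_stats(counts_before, count_after):
    """Return (total before merging, elements merged away, merged share of total)."""
    total = sum(counts_before)
    merged_away = total - count_after
    return total, merged_away, merged_away / total


# Topic element counts from the grep, and the count after merging.
total, merged_away, share = merge_stats([29_738, 111_154], 126_204)
print(total, merged_away, f"{share:.1%}")
```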

A very crude way to go about this but I was looking for rough numbers that may provoke some discussion and more refined measurements.

I mention that because one thought I had was to simply “cat” the various topic maps at topicmapslab.de in CTM format together into one file and to “cat” that file until I have 1 million, 10 million and 100 million topic sets (approximately). Just a starter set to see what works/doesn’t work before scaling up the data sets.

Creating the files in this manner is going to result in a “merge heavy” topic map due to the duplication of content. That may not be a serious issue and perhaps better that it be that way in order to stress algorithms, etc. It would have the advantage that we could merge the original set and then project the number of merges that should be found in the various sets.
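The projection can be sketched in a few lines of Python. Two assumptions, both mine: every topic in a duplicated copy merges with its original, so any number of copies merges down to the same topics as one copy, and the Opera plus CIA Fact Book counts stand in for the base set, since I have not counted the topicmapslab.de maps. The function name `project` is hypothetical.

```python
import math


def project(base_elements, base_merged, target_elements):
    """Project a test set built by concatenating whole copies of a base set.

    base_elements: topic elements in one copy, before merging
    base_merged:   topic elements left after merging one copy
    Returns (copies needed, elements in the file, merges expected).
    """
    copies = math.ceil(target_elements / base_elements)
    elements = copies * base_elements
    # Assumption: duplicates merge completely, so the merged result of
    # any number of copies is the same size as the merged base set.
    return copies, elements, elements - base_merged


for target in (1_000_000, 10_000_000, 100_000_000):
    copies, elements, merges = project(140_892, 126_204, target)
    print(f"{target:>11,}: {copies} copies, {elements:,} elements, {merges:,} merges")
```

If an engine reports fewer merges than projected on such a file, either the assumption fails for that data or the engine has missed some.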