Another Word For It – Patrick Durusau on Topic Maps and Semantic Diversity

January 27, 2011

Big Data: Millionfold Mashups and the Shape of Data

Filed under: BigData,Mashups — Patrick Durusau @ 8:00 am

Big Data: Millionfold Mashups and the Shape of Data

Philip (flip) Kromer (infochimps.com) talks about data, including the nine fold path to data enlightenment.

Hard to pick the most interesting part of the presentation.

Whether it was when Philip said that human experts would need to do the heavy lifting to reach the semantic level, or when he said Infochimps is working on an “everything about” API.

I don’t think those are entirely consistent but it was an impressive presentation! Definitely worth the time to watch.

Found at MyNoSQL by Alex Popescu.

How Sharding Works – Presentation – 4 Feb. 2010

Filed under: NoSQL,Sharding,Topic Maps — Patrick Durusau @ 7:54 am

How Sharding Works, a presentation by Kristina Chodorow, author of MongoDB: The Definitive Guide.

Date: 4 Feb. 2010

Register

Of interest to anyone partitioning the topics of a topic map.

I thought last night after I wrote a draft of this post that sharding would interfere with arbitrary merging of any topic with any other topic.

OK, so what was the question?

True, sharding will make merging of arbitrary topics in a topic map more costly (if possible at all) but how often is completely unconstrained merging an actual requirement?

I suspect that most topic map projects, other than theoretical ones, already know what merging they are interested in and how those subjects are going to be identified.

Allowances for additional identifications of subjects should be made but that is a matter of careful design of your topic map.

Suggestion: Have merging specified just like any other requirement. What is expected? What are the criteria for success? What allowances need to be made for future expansion?
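To make that concrete, here is a minimal sketch (hypothetical names, no particular topic map engine) of sharding topics by their subject identifiers, so the merging you specified as a requirement never has to cross shard boundaries:

```python
# Minimal sketch: route topics to shards by subject identifier, so topics
# that should merge (same identifier) always land on the same shard.
# All names here are hypothetical, not from any topic map engine.
import hashlib

NUM_SHARDS = 4

def shard_for(subject_identifier):
    """Hash the subject identifier to pick a shard deterministically."""
    digest = hashlib.sha1(subject_identifier.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

shards = [dict() for _ in range(NUM_SHARDS)]

def add_topic(subject_identifier, properties):
    """Insert or merge a topic; merging never crosses shard boundaries."""
    shard = shards[shard_for(subject_identifier)]
    if subject_identifier in shard:
        shard[subject_identifier].update(properties)  # the merge step
    else:
        shard[subject_identifier] = dict(properties)

add_topic("http://example.org/id/puccini", {"name": "Puccini"})
add_topic("http://example.org/id/puccini", {"born": "1858"})  # merges in place
```

Completely unconstrained merging would defeat this, but if you have specified in advance which identifications trigger merging, sharding costs you nothing.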

Mapping Domains to Domainers

Filed under: Examples,Marketing,Topic Maps — Patrick Durusau @ 6:28 am

Epik Has Epic Semantic Web Plans For Its Domains and Domainers

An unfortunate article about how people who park domains to extort money from others can use semantic technologies to supply content to their sites.

I was thinking last night of a much different use of semantic technologies with regard to domainers.

Wouldn’t it be interesting to have a topic map that traces all the parked and frivolous domains?

That is, create topics to represent them, so Google and other search services can simply exclude those sites from search results?

There’s one useful result right there.

Another useful result would be to associate the individuals who work for or own such companies with those companies.

They are certainly free to generate domain names and snap them up by the thousands, while junking up search results for all the rest of us.

But, then we are also free to choose who we will associate with.

Topic maps can help us bring honor and shame to the WWW. It has worked for centuries; no reason it should not work now.

*****
PS: Maybe we could have contests: Find that Domainer. How many minutes or seconds will it take you to identify a domainer from a domain name? Or to locate their photo? Or their place of business/residence on Google Maps?

Easy Semantic Solution Is At Hand! – Post

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 6:15 am

The Federated Enterprise (Using Semantic Technology Standards to Federate Information and to Enable Emergent Analytics)

I had to shorten the title a bit. 😉

I wanted you to be aware of the sort of nonsense that data warehouse people are being told:

The procedure described above enabling federation based on semantic technology is not hard to build; it is just a different way of describing things that people in your enterprise are already describing using incompatible technologies like spreadsheets, text processors, diagramming tools, modeling tools, email, etc. The semantic approach simply requires that everything be described in a single technology, RDF/OWL. This simple change in how things are described enables federation and the paradigm shifting capabilities that accompany it.

Gee, why didn’t we think about that? A single technology to describe everything.

Shakespeare would call this …a tale told by an idiot….

Just thought you could start the day with a bit of amusement.

*****
PS: It’s not the fault of RDF or OWL that people say stupid things about them.

When supercomputers meet the Semantic Web – Post

Filed under: Linked Data,Searching,Semantic Web — Patrick Durusau @ 5:59 am

When supercomputers meet the Semantic Web

Jack Park forwarded the link to this post.

It has descriptions like:

Everything about the hardware is optimised to churn through large quantities of data, very quickly, with vital statistics that soon become silly. A single processor “can sustain 128 simultaneous threads and is connected with up to 8 GB of memory.” The Cray XMT comes with at least 16 of those processors, and can scale to over 8,000 of them in order to handle over 1 million simultaneous threads with 64 TB of shared system memory. Should you want to, you could easily hold the entire Linked Data Cloud in main memory for rapid analysis without the usual performance bottleneck introduced by swapping data on and off disks.

Now, that’s computing!

Do note the emphasis on graph processing.

I think Semantic Web and topic map fans would do well to pay attention to the big data movement mentioned in this article.

Imagine a topic map whose topics emerge in interaction with subject matter experts querying the data as opposed to being statically authored.

Same for associations between subjects and even their association types.

Still topic maps, just a different way to think about authoring them.

I don’t have a Cray XMT but it should be possible to practice emergent topic map authoring on a smaller device.

I rather like that, emergent topic map authoring, ETMA.

Let me push that around a bit and I will post further notes about it.
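To make the notion a bit less vague, here is a toy sketch (all names hypothetical) of topics and associations emerging as side effects of expert queries, rather than being statically authored:

```python
# A toy sketch of "emergent topic map authoring": topics are created the
# first time an expert's query touches a subject, rather than up front.
# Hypothetical names throughout.
class EmergentTopicMap:
    def __init__(self):
        self.topics = {}        # subject identifier -> topic dict
        self.associations = []  # (type, id1, id2) triples

    def topic(self, identifier):
        """Return the topic for an identifier, creating it on first query."""
        return self.topics.setdefault(identifier, {"id": identifier})

    def associate(self, assoc_type, id1, id2):
        """Record an association; the association type is itself emergent."""
        self.topic(id1)
        self.topic(id2)
        self.associations.append((assoc_type, id1, id2))

tm = EmergentTopicMap()
# An expert's query links two subjects; both topics emerge as a side effect.
tm.associate("cited-by", "report:2004-07-13", "unit:task-force-x")
print(len(tm.topics))  # 2 topics, neither statically authored
```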

January 26, 2011

Palestine Papers

Filed under: Authoring Topic Maps,Examples,Topic Maps — Patrick Durusau @ 1:24 pm

Palestine Papers

Quite helpfully, Aljazeera has published a glossary for the Palestine Papers.

The Palestine Papers were intended as internal notes, and so they make heavy use of jargon, acronyms and abbreviations. We’ve compiled a list of the most frequently-used terms.

Acronym Definition
AMA Agreement on Movement and Access
API Arab Peace Initiative
BATNA Best alternative to a negotiated agreement
CBM Confidence-building measure
CEC Central Elections Committee
GOI Government of Israel
KSCP Kerem Shalom crossing point
LO Liaison office
MB Muslim Brotherhood
MF Multi-national force
MFA Israeli ministry of foreign affairs
NAD Negotiations affairs department
NSU Negotiation support unit
NUG National unity government
PA Palestinian Authority
PG Presidential Guard
PLC Palestinian Leadership Council
PS Permanent status
PSN Permanent status negotiations
RCP Rafah crossing point
RM Road Map
SPB State with provisional borders
SSR Security sector reform
SWG Security working group
TOR Terms of reference
WG Working group

People

Different documents use different abbreviations for key negotiators: Tzipi Livni, for example, is referred to as both TL and TZ. This list covers the most commonly-used abbreviations.

Acronym Person
AA Abu Ala’ (Ahmed Qureia)
AB Azem Bishara
AG Amos Gilad
AM Abu Mazen (Mahmoud Abbas)
ARY Gen. Abdel Razzaq Yahia
BM Ban Ki-moon
BO Barack Obama
CR Condoleezza Rice
DW David Welch
ES Ephraim Sneh
GS Gilad Shed
JS Javier Solana
KD Gen. Keith Gayton
KE Khaled el-Gindy
MD Mohammad Dahlan
MO Marc Otte
PP Lt. Gen. Pietro Pistolese
PR Col. Paul Rupp
PS Pablo Serrano
RD Rami Dajani
RN Gen. Raji Najami
SA Samih al-Abed
SE Saeb Erekat
SF Salam Fayyad
ST Shalom Tourgeman
TB Tal Becker
TL Tzipi Livni
UD Udi Dekel
YAR Yasser Abed Rabbo
YG Yossi Gal

I say helpfully, but a printed glossary isn’t as helpful as it could be.

For example, what if, instead of a static glossary, additional information could be added for each person or organization?

Information that was mappable to additional public or private data.

Watch this space for first steps on making the glossary more than just a glossary.
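As a teaser, a first step might look something like this sketch, which turns glossary entries into topic stubs with subject identifiers (made up here, for illustration) that other data can be mapped against:

```python
# A first step: turn the static glossary into topic stubs with subject
# identifiers that other public or private data can be mapped against.
# The identifier scheme below is invented for illustration.
glossary = {
    "AMA": "Agreement on Movement and Access",
    "PA": "Palestinian Authority",
    "SE": "Saeb Erekat",
}

def make_topic(acronym, expansion):
    return {
        "subject_identifier": "http://example.org/palestine-papers/" + acronym,
        "names": [expansion, acronym],
        "occurrences": [],  # later: documents, photos, public records
    }

topics = {a: make_topic(a, e) for a, e in glossary.items()}
# Mapping in additional data is then just attaching occurrences:
topics["SE"]["occurrences"].append("meeting-minutes-2008-06-15.pdf")
```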

Onotoa: Simply create your Topic Maps schemas

Filed under: TMCL,Topic Map Software — Patrick Durusau @ 1:11 pm

Onotoa: Simply create your Topic Maps schemas

The 1.2 version is due out fairly soon.

Onotoa assists in the construction of topic map schemas (constraint schemas).

It provides a graphic representation of types and constraints for a topic map.

You probably want to read Creating a topic map ontology with Onotoa before you try the Onotoa Handbook.

I am not sure the Onotoa Handbook, even if it were available in editable form (say ODF?), would be worth the effort to make an editorial pass at it.

I suspect splitting it into a reference manual and a user’s manual would be a step in the right direction. Then do an editorial pass.

Afghan War Diary – 2004 – Maiana – Puzzlers

Filed under: Authoring Topic Maps,Examples,Topic Maps — Patrick Durusau @ 10:35 am

I was looking at the Afghan War Diary – 2004 at Maiana yesterday.

A couple of things puzzled me so I thought I would mention them here.

Take a short look at the ontology for the diary.

I’ll wait.

OK, now follow the link for Index of Individuals.

Wait! Err, there wasn’t any category that I saw in the ontology for individuals.

Did you see one?

BTW, scroll down, way down, the listing of individuals. I am assuming that cities and diary entries are both individuals?

I suppose but it looks like an odd modeling choice.

When I think of individuals I think of, you know, people.

I haven’t looked closely but do the reports include the names of persons? Those are what I would consider individuals.

Ah, you know what? Individuals = Topics. Someone renamed it.

But how useful is that?

Having every subject represented by a topic in a single index?

That is as unhelpful as a Google search result.

Particularly if your topic map is of any size.

Better to have indexes of commonly sought things: geographic locations by name, organizations, etc.

BTW, I don’t think that USMC is of type Host Nation.

If USMC expands to United States Marine Corps then I suspect a type of military organization is probably more accurate.

I stopped looking at this point.

Please forward suggestions/corrections to the project.

Dimensions to use to compare NoSQL data stores – Queries to Produce Topic Maps

Filed under: Merging,NoSQL,TMDM,Topic Map Software,Topic Maps — Patrick Durusau @ 9:08 am

Dimensions to use to compare NoSQL data stores

A post by Huan Liu to read after Billy Newport’s Enterprise NoSQL: Silver Bullet or Poison Pill? – (Unique Questions?)

A very good quick summary of the dimensions to consider. As Liu makes clear, choosing the right data store is a complex issue.

I would use this as an overview article to get everyone on a common ground for a discussion of NoSQL data stores.

At least that way, misunderstandings will be on some other topic of discussion.

BTW, if you think about Newport’s point (however correct/incorrect) that NoSQL databases enable only one query, doesn’t that fit the production of a topic map?

That is, there is a defined set of constructs with defined conditions of equivalence, so the only query in that regard has been fixed.

Questions remain about querying the data that a topic map holds, as distinct from the query that results in merged topics, associations, etc.

In some processing models, that query is performed and a merged artifact is produced.

Following the same data model rules, I would prefer to allow those queries to be made on an ad hoc basis, so that users are always presented with the latest merged results.

Same rules as the TMDM, just a question of when they fire.
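A rough sketch of what I mean, with TMDM-style equality reduced to shared subject identifiers and everything else omitted; merging happens at query time, so users always see the latest results:

```python
# A rough sketch: "topics sharing a subject identifier merge", applied when
# the user queries rather than when the map is built, so results always
# reflect the latest data. Not a full TMDM implementation.
def merged_view(topics):
    merged = {}  # subject identifier -> merged topic (shared dicts)
    for topic in topics:
        # union this topic with every existing group it touches
        touched = {id(merged[i]): merged[i]
                   for i in topic["identifiers"] if i in merged}
        target = {"identifiers": set(topic["identifiers"]),
                  "names": set(topic["names"])}
        for group in touched.values():
            target["identifiers"] |= group["identifiers"]
            target["names"] |= group["names"]
        for ident in target["identifiers"]:
            merged[ident] = target
    return list({id(t): t for t in merged.values()}.values())

latest = merged_view([
    {"identifiers": ["ex:puccini"], "names": ["Puccini"]},
    {"identifiers": ["ex:puccini", "ex:g-puccini"], "names": ["Giacomo Puccini"]},
])
print(len(latest))  # 1: merged on demand, no static artifact produced
```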

Questions:

  1. NoSQL – What other general compare/dimension articles would you recommend as common ground builders? (1-3 citations)
  2. Topic maps as artifacts – What other data processing approaches produce static artifacts for querying? (3-5 pages, citations)
  3. Topic maps as query results – What are the concerns and benefits of topic maps as query results? (3-5 pages, citations)

A Quick WebApp with Scala, MongoDB, Scalatra and Casbah – Practice for TMs

Filed under: MongoDB,Scala,Software,Topic Maps — Patrick Durusau @ 8:41 am

A Quick WebApp with Scala, MongoDB, Scalatra and Casbah

However clever, topic maps aren’t of much interest unless they are delivered to users.

In the general case that means a web based application.

This post is a short introduction to several tools you may find handy for building and/or delivering topic maps.

*****
PS: We will know topic maps have arrived when the technology keeps changing but management of subject identity is inherent in both programming languages and application design. We have a ways to go yet.

SQLShell. A Cross-Database SQL Tool With NoSQL Potential

Filed under: NoSQL,SQL — Patrick Durusau @ 7:20 am

SQLShell. A Cross-Database SQL Tool With NoSQL Potential

From the website:

In this blog post I will introduce SQLShell and demonstrate, step-by-step, how to install it and start using it with MySQL. I will also reflect on the possibilites of using this with NoSQL technologies, such as HBase, MongoDB, Hive, CouchDB, Redis and Google BigQuery.

SQLShell is a cross-platform, cross-database command-line tool for SQL, much like psql for PostgreSQL or the mysql command-line tool for MySQL.

The author discovers that JDBC drivers have not yet developed to the point where a common interface can be demonstrated.

It is only a matter of time until they improve; then tools such as SQLShell will be important for data exploration and harvesting.

Enterprise NoSQL: Silver Bullet or Poison Pill? – (Unique Questions?)

Filed under: NoSQL,SQL — Patrick Durusau @ 7:03 am

Enterprise NoSQL: Silver Bullet or Poison Pill? a presentation by Billy Newport (IBM).

Very informative comparison between SQL and NoSQL mindsets and what considerations lead to one or the other.

The “ah-ha” point in the presentation was Newport saying that for NoSQL, one has to ask: what question do you want answered?

I am not entirely convinced by Newport’s argument that SQL supports arbitrary queries and that NoSQL design of necessity supports only a single query robustly.

Granted, there are design choices that can paint a NoSQL designer into a corner, but I don’t think it is fair to assume all NoSQL designers will make the same mistakes.

Or that all NoSQL solutions suffer such limitations.

I don’t know of anything inherently query limiting about a graph database or even a hypergraph database architecture.

If you quickly point out that sharding drives design toward answering a particular question, my response is: and your question is?

How many arbitrary questions do you think there are for any given data set?

That would be an interesting research question.

How many unique questions (not queries) are asked of the average data set?

That is: unique queries != unique questions.

Application designers can design queries to match their application logic but that isn’t the same thing as a unique question.

Is that Newport’s concern (or at least part of it)? That NoSQL may put limits on the design of application logic? That could be good or bad.

DataMarket – Drill Down/Annotate?

Filed under: Data Source,Marketing,Topic Maps — Patrick Durusau @ 6:38 am

DataMarket

From the website:

Find and understand data.

Visualize the world’s economy, societies, nature, and industries, and gain new insights.

100 million time series from the most important data providers, such as the UN, World Bank and Eurostat.

I have just registered for the free account and have started poking about.

This looks deeply awesome!

In addition to being a source of data for analytical tools, I see an opportunity for topic maps to enable a drill-down capacity for such displays.

After all, any point in a time series should be traceable back to a file, report, or questionnaire.

And from that file, report, or questionnaire, it should be further traceable back to its author and, even further back, to the persons reported upon or questioned.

This site definitely has potential for real growth, particularly if they offer tools that enable drill down into data sets to source materials as well as to annotate points in a data set with other materials. Topic maps would excel at both.
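Something like the following sketch, with purely illustrative field and data values, is all the drill-down idea requires at the data level: every published point carries its provenance:

```python
# Sketch of the drill-down idea: every point in a published time series
# carries provenance links back to the file, report, and author it came
# from. Field names and values are illustrative only.
datapoint = {
    "series": "unemployment-rate-monthly",
    "date": "2010-11-01",
    "value": 9.8,
    "provenance": {
        "file": "cps_nov2010.csv",
        "report": "Employment Situation, November 2010",
        "author": "Bureau of Labor Statistics",
    },
}

def drill_down(point, level):
    """Resolve a chart point to its source at the requested level."""
    return point["provenance"][level]

print(drill_down(datapoint, "report"))
```

A topic map adds the other direction as well: annotating the same point with materials the original publisher never saw.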

Questions:

  1. Register for a free account.
  2. Choose any two data sets and create two visualizations (use screen capture to capture the graphic).
  3. What information would you want to drill down to find or that you would want to use to annotate data points in either visualization? (3-5 pages, no citations)

Using Apache Avro – Repeatable/Shareable?

Filed under: Avro,Hadoop — Patrick Durusau @ 6:33 am

Using Apache Avro by Boris Lublinsky.

From the post:

Avro[1] is a recent addition to Apache’s Hadoop family of projects. Avro defines a data format designed to support data-intensive applications, and provides support for this format in a variety of programming languages.

Avro provides functionality that is similar to the other marshalling systems such as Thrift, Protocol Buffers, etc. The main differentiators of Avro include[2]:

  • “Dynamic typing: Avro does not require that code be generated. Data is always accompanied by a schema that permits full processing of that data without code generation, static datatypes, etc. This facilitates construction of generic data-processing systems and languages.
  • Untagged data: Since the schema is present when data is read, considerably less type information need be encoded with data, resulting in smaller serialization size.
  • No manually-assigned field IDs: When a schema changes, both the old and new schema are always present when processing data, so differences may be resolved symbolically, using field names.”

I wonder about the symbolic resolution of differences using field names.

At the very least, is it repeatable and shareable?

By repeatable I mean that six months or even six weeks from now one understands the resolution. Not much use if the transformation is opaque to its author.

And shareable should mean that I can transfer the resolution to someone else who can then decide to follow, not follow or modify the resolution.
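For example (a sketch of the idea only, not Avro’s actual resolution machinery), recording the field-name resolution as data makes it both repeatable and shareable:

```python
# Sketch: make schema-difference resolution repeatable and shareable by
# recording the field-name mapping as data that can be reviewed, replayed,
# or modified. This illustrates the idea only, not Avro's API.
import json

# The resolution itself, as a shareable artifact:
RESOLUTION = {
    "renamed": {"zip": "postal_code"},   # old field -> new field
    "dropped": ["fax_number"],
    "defaulted": {"country": "US"},      # new field -> default value
}

def apply_resolution(old_record, resolution):
    new_record = {}
    for field, value in old_record.items():
        if field in resolution["dropped"]:
            continue
        new_record[resolution["renamed"].get(field, field)] = value
    for field, default in resolution["defaulted"].items():
        new_record.setdefault(field, default)
    return new_record

record = {"name": "Ada", "zip": "30301", "fax_number": "n/a"}
print(apply_resolution(record, RESOLUTION))
# Six weeks later, the artifact still says exactly what was done:
print(json.dumps(RESOLUTION, indent=2))
```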

In another lifetime I was a sysadmin. I can count on less than one finger the number of times I would have followed a symbolic resolution that was not transparent. Simply not done.

Wait until the data folks, who must be incredibly trusting (anyone have some candy?), encounter someone who cares about critical systems and data.

Topic maps can help with that encounter.

GSoC 2010 mid-term: Graph Streaming API – Post

Filed under: Data Mining,Gephi,Graphs,Visualization — Patrick Durusau @ 6:08 am

GSoC 2010 mid-term: Graph Streaming API by André Panisson.

From the blog:

The purpose of the Graph Streaming API project, run by André Panisson, is to build a unified framework for streaming graph objects. Gephi’s data structure and visualization engine has been built with the idea that a graph is not static and might change continuously. By connecting Gephi with external data-sources, we leverage its power to visualize and monitor complex systems or enterprise data in real-time. Moreover, the idea of streaming graph data goes beyond Gephi, and a unified and standardized API could bring interoperability with other available tools for graph and network analysis, as they could start to interoperate with other tools in a distributed and cooperative fashion.

There are times when no comment seems adequate. This is one of those times.

Read the post, play with the code, follow the work (and support it!).

I’m just a bill…

Filed under: Examples,Marketing,Topic Maps — Patrick Durusau @ 5:20 am

Remember the Schoolhouse Rock song about how a bill becomes a law in the US?

If you don’t, see: Schoolhouse Rock- How a Bill Becomes a Law.

That level of understanding the legislative process is found in: Stream Congress: A real-time data stream for Congress

From the website:

Once Congress gets back to work, Stream Congress will serve as a good example of what the Real Time Congress API provides: floor updates, bill status, floor votes, committee hearing notices, and much more.

Don’t get me wrong, I like Sunlight Labs.

They have the potential to alter the political landscape.

But not with this understanding of how laws are made in the US.

Members of Congress write bills? Really? You really think that?

Have you ever met a member of Congress? Either house?

Let’s start by naming, when a bill is proposed, the staffers, lobbyists, and administration representatives who wrote the bill.

The actual bill authors.

They have goals, friends, etc., that are being furthered by the bills they write (which are passed unread by most members of Congress).

Include who is paying the actual bill authors and their sources of funding.

Run that backwards through other legislative sessions, so we can follow the patterns of money and ideology that shape legislation before it is ever proposed.

Then match up people interested in the bill with financial contributions to members of Congress. And the financial or other interest they have in the bill’s outcome.

We have the capacity to name names and make government truly transparent.

But only if we shine light on the actual process.

Topic maps can help with that.

*****
PS: Transparency would require far more than these off-hand suggestions and would not be cheap. Inquiries welcome.

January 25, 2011

LSH Algorithm and Implementation (E2LSH)

Filed under: Hashing,High Dimensionality,Neighbors — Patrick Durusau @ 10:56 am

LSH Algorithm and Implementation (E2LSH)
Authors: Alexandr Andoni and Piotr Indyk

Andoni and Indyk aren’t any better with titles than they were in Whose Your Nearest Neighbor but here you will find additional information on their high-dimensional nearest neighbor algorithm as well as an implementation of the algorithm.

Wukong, Bringing Ruby to Hadoop – Post

Filed under: Hadoop — Patrick Durusau @ 10:53 am

Wukong, Bringing Ruby to Hadoop

From the post:

Wukong is hands down the simplest (and probably the most fun) tool to use with hadoop. It especially excels at the following use case:

You’ve got a huge amount of data (let that be whatever size you think is huge). You want to perform a simple operation on each record. For example, parsing out fields with a regular expression, adding two fields together, stuffing those records into a data store, etc etc. These are called map only jobs. They do NOT require a reduce. Can you imagine writing a java map reduce program to add two fields together? Wukong gives you all the power of ruby backed by all the power (and parallelism) of hadoop streaming. Before we get into examples, and there will be plenty, let’s make sure you’ve got wukong installed and running locally.

Authoring a topic map is more than the final act of assembling the topic map. Any number of pre-assembly steps may be necessary before the final steps. Wukong is one more tool to assist in that process.
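Wukong itself is Ruby, but the map-only pattern is easy to see in any Hadoop Streaming mapper. A minimal sketch in Python, assuming tab-separated input records; run it under Hadoop Streaming with the number of reduce tasks set to zero:

```python
#!/usr/bin/env python
# A map-only Hadoop Streaming job in miniature: read records from stdin,
# add two fields together, emit the result. No reduce step required.
# Assumes tab-separated input with numeric fields at positions 1 and 2.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    try:
        total = float(fields[1]) + float(fields[2])
    except (IndexError, ValueError):
        continue  # skip malformed records
    print("\t".join([fields[0], str(total)]))
```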

NAQ Tree in Your Forest?

Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space
Authors: Ming Zhang and Reda Alhajj
Keywords: Knn search, High dimensionality, Dimensionality reduction, Indexing, Similarity search

Abstract:

Similarity search (e.g., k-nearest neighbor search) in high-dimensional metric space is the key operation in many applications, such as multimedia databases, image retrieval and object recognition, among others. The high dimensionality and the huge size of the data set require an index structure to facilitate the search. State-of-the-art index structures are built by partitioning the data set based on distances to certain reference point(s). Using the index, search is confined to a small number of partitions. However, these methods either ignore the property of the data distribution (e.g., VP-tree and its variants) or produce non-disjoint partitions (e.g., M-tree and its variants, DBM-tree); these greatly affect the search efficiency. In this paper, we study the effectiveness of a new index structure, called Nested-Approximate-eQuivalence-class tree (NAQ-tree), which overcomes the above disadvantages. NAQ-tree is constructed by recursively dividing the data set into nested approximate equivalence classes. The conducted analysis and the reported comparative test results demonstrate the effectiveness of NAQ-tree in significantly improving the search efficiency.

Although I think the following paragraph from the paper is more interesting:

Consider a set of objects O = {o1, o2, ..., on} and a set of attributes A = {a1, a2, ..., ad}, we first divide the objects into groups based on the first attribute a1, i.e., objects with same value of a1 are put in the same group; each group is an equivalence class [23] with respect to a1. In other words, all objects in a group are indistinguishable by attribute a1. We can refine the equivalence classes further by dividing each existing equivalence class into groups based on the second attribute a2; all objects in a refined equivalence class are indistinguishable by attributes a1 and a2. This process may be repeated by adding one more attribute at a time until all the attributes are considered. Finally, we get a hierarchical set of equivalence classes, i.e., a hierarchical partitioning of the objects. This is roughly the basic idea of NAQ-tree, i.e., to partition the data space in our similarity search method. In other words, given a query object o, we can gradually reduce the search space by gradually considering the most relevant attributes.

With the caveat that this technique is focused on metric spaces.

But I rather like the idea of reducing the search space by the attributes under consideration. Replace search space with similarity/sameness space and you will see what I mean. Still relevant for searching as well.
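The partitioning idea in that paragraph is simple enough to sketch; this is just the recursive grouping, not the NAQ-tree itself:

```python
# Sketch of the quoted idea: recursively partition objects into nested
# equivalence classes, one attribute at a time. Illustrative data.
from collections import defaultdict

def nest(objects, attributes):
    """Return nested dicts: attribute value -> sub-partition (or leaf list)."""
    if not attributes:
        return objects
    first, rest = attributes[0], attributes[1:]
    groups = defaultdict(list)
    for obj in objects:
        groups[obj[first]].append(obj)
    return {value: nest(group, rest) for value, group in groups.items()}

data = [
    {"a1": "x", "a2": 1, "name": "o1"},
    {"a1": "x", "a2": 2, "name": "o2"},
    {"a1": "y", "a2": 1, "name": "o3"},
]
tree = nest(data, ["a1", "a2"])
# Search narrows by the most relevant attributes first:
print([o["name"] for o in tree["x"][1]])  # ['o1']
```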

COIN-OR

Filed under: Graphs,Software — Patrick Durusau @ 10:35 am

COmputational INfrastructure for Operations Research (COIN-OR)

From the website:

The Computational Infrastructure for Operations Research (COIN-OR, or simply COIN) project is an initiative to spur the development of open-source software for the operations research community.

Check the related resource page for a number of graph and other software packages.

Whose Your Nearest Neighbor?

Filed under: Hashing,High Dimensionality,Neighbors — Patrick Durusau @ 10:24 am

Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions
Authors: Alexandr Andoni and Piotr Indyk

OK, I lied about the title.

You would think there would be short courses on title writing. Maybe I should start offering a one-day seminar in effective title writing.

Anyway, whatever the title issues, this is a deeply fascinating work on detection of nearest neighbors.

The short version is that really close neighbors bump into each other when hashing. So collisions become a way to detect neighbors. Rather clever.

I think of collisions as a basis for identifying the same subjects.

This works in metric spaces, but topic maps apply to metric spaces as well. After all, subjects are what define and occupy metric spaces.
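For the flavor of it, here is a toy random-hyperplane hash (illustrative only; the paper’s algorithms are far more refined) where near neighbors tend to collide:

```python
# A toy locality-sensitive hash: random hyperplanes bucket vectors so that
# near neighbors tend to collide. Collisions are then candidate "same
# subject" pairs. Illustrative only; see the paper for the real algorithms.
import random

random.seed(42)
DIM, NUM_PLANES = 4, 8
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PLANES)]

def lsh(vector):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        int(sum(p * v for p, v in zip(plane, vector)) >= 0) for plane in planes
    )

a = [1.0, 0.9, 0.0, 0.1]
b = [0.9, 1.0, 0.1, 0.0]   # nearly the same direction as a
c = [-1.0, 0.0, 0.9, -0.5]

print(lsh(a) == lsh(b))  # likely True: neighbors bump into each other
print(lsh(a) == lsh(c))  # likely False
```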

For the longer explanation, read the paper.

Translate SQL to MongoDB MapReduce

Filed under: MapReduce,MongoDB,SQL — Patrick Durusau @ 10:18 am

Translate SQL to MongoDB MapReduce

There is a growing sense that SQL vs. MapReduce or NoSQL is a question of fitness of the tool for the purpose at hand.

If your problem is best solved by commodity hardware working in parallel, then NoSQL solutions may be the path to take.

I have seen that expressed in a number of ways but not with a lot of detail on what factors drive the choice one way or the other.

With enough detail, that could make both a very good guide and topic map for those faced with this sort of issue.
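The core of any such translation is the correspondence between GROUP BY and the map/reduce phases. A plain Python stand-in (not MongoDB’s JavaScript API) to show the shape:

```python
# The shape of the translation: a SQL GROUP BY / SUM corresponds to a map
# phase emitting (key, value) pairs and a reduce phase folding each key's
# values. Plain Python stand-in, not MongoDB's JavaScript mapReduce.
from collections import defaultdict

orders = [
    {"customer": "ann", "total": 10},
    {"customer": "bob", "total": 7},
    {"customer": "ann", "total": 5},
]

# SELECT customer, SUM(total) FROM orders GROUP BY customer;
def map_phase(doc):
    yield doc["customer"], doc["total"]

def reduce_phase(key, values):
    return key, sum(values)

grouped = defaultdict(list)
for doc in orders:
    for key, value in map_phase(doc):
        grouped[key].append(value)

print(dict(reduce_phase(k, vs) for k, vs in grouped.items()))
# {'ann': 15, 'bob': 7}
```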

First noticed on Alex Popescu’s myNoSQL blog.

Yet another MongoDB Map Reduce tutorial – Post

Filed under: MapReduce,MongoDB,NoSQL — Patrick Durusau @ 6:32 am

Yet another MongoDB Map Reduce tutorial

From the post:

As the title says, this is yet-another-tutorial on Map Reduce using MongoDB. But two things that are different here:

1. A problem solving approach is used, so we’ll take a problem, solve it in SQL first and then discuss Map Reduce.

2. Lots of diagrams, so you’ll hopefully better understand how Map Reduce works.

First noticed on Alex Popescu’s myNoSQL blog.

Topic Maps – Human-oriented Semantics? – Slides

Filed under: Examples,TMDM,Topic Maps — Patrick Durusau @ 5:55 am

Topic Maps – Human-oriented Semantics? – Slides

The slides from Lars Marius Garshol’s topic map presentation tomorrow in Sogndal are now online.

Recommended for use with civilians. (Currently non-topic map advocates.)

See The Tin Man for my takeaway from the presentation.

January 24, 2011

Merge Me Baby One More Time!

Filed under: Merging,R — Patrick Durusau @ 5:41 pm

Merge Me Baby One More Time!

Ok, I admit the title caught my attention. 😉

Covers the use of merge_data.r for quick and dirty merges of a data set that has diverged.

Good to know when your situation doesn’t require the full overhead of a topic map solution.

Do note that the article passes over the question of subject identity or the correctness of the merge without even a pause.

That works, but can also mean that when you have forgotten why the data is arranged as it is, well…, that’s life without subject identity.
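A small sketch of the difference (column names and data made up): a positional merge that silently goes wrong versus a merge keyed on an explicit statement of which subject each row is about:

```python
# Quick positional merge versus identity-keyed merge. Data is made up.
rows_a = [("r1", "Puccini", 1858), ("r2", "Verdi", 1813)]
rows_b = [("r2", "Verdi", "Busseto"), ("r1", "Puccini", "Lucca")]

# Quick and dirty: zip by position; silently wrong if order ever diverges.
dirty = [a + b[2:] for a, b in zip(rows_a, rows_b)]

# With identity: join on the key that says which subject each row is about.
by_id = {row[0]: row for row in rows_b}
keyed = [a + by_id[a[0]][2:] for a in rows_a]

print(dirty)  # r1/Puccini wrongly picks up Verdi's birthplace
print(keyed)  # correct: merged on the identity key, not on position
```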

Gprof2Dot

Filed under: Authoring Topic Maps,Examples,Graphs — Patrick Durusau @ 5:34 pm

Gprof2Dot

Convert profiling output to a dot graph.

This is very cool.

The resulting graph would make an excellent interface into further documentation or analysis powered by a topic map.

Such as other implementations of the same routine? (or improvements thereof?)

Sounds like same subject talk to me.

Ambiguity and Charity

Filed under: Authoring Topic Maps,Subject Identity,Topic Maps — Patrick Durusau @ 9:06 am

John McCarthy Notes on Formalizing Context says in Entering and Leaving Contexts:

Human natural language risks ambiguity by not always specifying such assumptions, relying on the hearer or reader to guess what contexts makes sense. The hearer employs a principle of charity and chooses an interpretation that assumes the speaker is making sense. In AI usage we probably don’t usually want computers to make assertions that depend on principles of charity for their interpretation.

Natural language statements, outside formal contexts, almost never specify their assumptions. And even when they attempt to specify assumptions, such as in formal contexts, it is always a partial specification.

Complete specification of context or assumptions isn’t possible. That would require recursive enumeration of all the information that forms a context and the context of that information and so on.

It really is a question of the degree of charity that is being practiced to resolve any potential ambiguity.

If AI chooses to avoid charity altogether, I think that says a lot about its chances for success.

Topic maps, on the other hand, can specify both the result of the charitable assumption, the subject recognized, as well as the charitable assumption itself. Which could (but not necessarily will be) expressed as scope.

For example, if I see the token “who” and I specify the scope as rock-n-roll-bands, that avoids any potential ambiguity, at least from my perspective. I could be wrong, or it could have some other scope, but at least you know my charitable assumption.
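In data terms, that could be as simple as this sketch (structures illustrative), which records both the resolved subject and the assumption itself, expressed as scope:

```python
# Sketch: record both the resolved subject and the charitable assumption
# that produced it, expressed here as scope. Structures are illustrative.
name = {
    "token": "who",
    "resolved_subject": "http://example.org/band/the-who",
    "scope": ["rock-n-roll-bands"],  # the charitable assumption, explicit
}

def resolve(token, names, active_scope):
    """Pick the interpretations whose scope matches the reader's context."""
    return [n["resolved_subject"] for n in names
            if n["token"] == token and active_scope in n["scope"]]

print(resolve("who", [name], "rock-n-roll-bands"))
```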

What is particularly clever about topic maps is that other users can combine my charitable assumptions with their own as they merge topic maps together.

Think of it as stitching together a fabric of interpretation with a thread of charitable assumptions. A fabric that AI applications will never know.

yEd Graph Editor

Filed under: Graphs,Visualization — Patrick Durusau @ 6:55 am

yEd Graph Editor

From the website:

yEd is a powerful diagram editor that can be used to quickly and effectively generate high-quality drawings of diagrams.

Create your diagrams manually or import your external data for analysis and auto-magically arrange even large data sets by just pressing a button.

Useful both for exploring a data set prior to creating a topic map and, post-creation, for visualizing the topic map.

Cassandra – New Release

Filed under: Cassandra,NoSQL — Patrick Durusau @ 6:28 am

Cassandra – 0.7.0 released 2011-01-09.

The homepage reports the largest production cluster has over 100 terabytes of data on over 150 machines.

Sounds like a candidate for topic maps. Yes? 😉

Visualizing Social Networks

Filed under: Social Networks,Visualization — Patrick Durusau @ 6:21 am

Visualizing Social Networks

A goldmine of resources on visualizing social networks!

Important for topic maps because, if you think about it, all subjects exist in some social context, which is to say a social network.

Visualization can assist in exploring what parts of a social network have or have not been represented in a topic map.

This is a resource that I will be exploring over time.

