Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 3, 2011

LOS on data.networkedplanet.com – Post

Filed under: Dataset — Patrick Durusau @ 9:22 am

LOS on data.networkedplanet.com opines that http://data.norge.no could be better and outlines some principles as guidance for making it, or similar efforts, better.

Sorry, Networked Planet Blog = Graham Moore and/or Kal Ahmed to most of the topic map regulars.

I am not real sure what LOS stands for…, loan origination solution perhaps? A quick search gives 3.5 million “hits” so I am not going to try to sort it out. Maybe Networked Planet will clear up that mystery in an upcoming post.

I would be more concerned with the publication of identifiers, along with guidance on when those identifiers should be applied to particular subjects (read: properties), than with ensuring that all identifiers are URLs. But if one is playing to the Semantic Web niche market, I suppose that is good advice.

It was just the other day that I mentioned the 100+ million non-URL identifiers that are nearly universally used in chemistry and related fields. I am on the lookout for similar, curated sets of identifiers, so please post. Oh, you know, there is that German publisher that curates chemical structures as search criteria as well. I will go run them down for later this week.

More on the issue of identifier advice to follow.

TinkerPop Updates

Filed under: Blueprints,Gremlin,Pipes,Rexster,Software — Patrick Durusau @ 9:02 am

From the update announcement on 1 March 2011.

Today we bring you a new round of releases. TinkerPop is pleased to announce:

Blueprints 0.5 (Scooby) – https://github.com/tinkerpop/blueprints/wiki/Release-Notes
Pipes 0.3.1 (Mario) – https://github.com/tinkerpop/pipes/wiki/Release-Notes
Gremlin 0.8 (Grem Stefani) – https://github.com/tinkerpop/gremlin/wiki/Release-Notes
Rexster 0.2 (Dog House) – https://github.com/tinkerpop/rexster/wiki/Release-Notes

The graph database work and associated materials are looking more and more attractive.

Look for something specific about applying them to topic maps in the near term.

March 2, 2011

Collaborative Web Search (Haystack)

Filed under: Search Engines,Search Interface — Patrick Durusau @ 1:00 pm

Jeff Dalton reports the launch of Haystack, a collaborative web search startup.

I suspect that while useful within small groups, as shared search results propagate outwards, they will encounter the same semantic dissonance as tagging.

WhistlePig: A minimalist real-time search engine

Filed under: Search Engines,Search Interface — Patrick Durusau @ 12:39 pm

WhistlePig: A minimalist real-time search engine.

From Jeff Dalton’s blog:

William Morgan recently announced the release of Whistlepig, a real-time search engine written in C with Ruby bindings. It is now up to release 0.4. Whistlepig is a minimalist in memory search system with ranking by reverse date. You can read William’s blog post for his motivations for writing it.

Of particular interest (at least to me):

  • A full query language and parser with conjunctions, disjunctions, phrases, negations, grouping, and nesting.
  • Labels: arbitrary tokens which can be added to and removed from documents at any point, and incorporated into search queries.
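To make the labels idea concrete, here is a minimal sketch in Python. It is not Whistlepig's API (Whistlepig is C with Ruby bindings); the class `TinyIndex` and all of its method names are hypothetical. It assumes only what the description above says: an in-memory index, results ranked newest first, and labels that can be added to or removed from documents at any time and used in queries.

```python
from collections import defaultdict


class TinyIndex:
    """Toy in-memory index (hypothetical, not Whistlepig): newest first, mutable labels."""

    def __init__(self):
        self.docs = []                      # doc id == insertion position, so ids grow with time
        self.postings = defaultdict(set)    # term -> set of doc ids
        self.labels = defaultdict(set)      # label -> set of doc ids

    def add(self, text):
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        return doc_id

    def add_label(self, doc_id, label):
        self.labels[label].add(doc_id)

    def remove_label(self, doc_id, label):
        self.labels[label].discard(doc_id)

    def search(self, terms=(), with_labels=(), without_labels=()):
        """Conjunction of terms and labels, minus excluded labels, newest first."""
        hits = set(range(len(self.docs)))
        for term in terms:
            hits &= self.postings.get(term.lower(), set())
        for label in with_labels:
            hits &= self.labels.get(label, set())
        for label in without_labels:
            hits -= self.labels.get(label, set())
        return sorted(hits, reverse=True)
```

Because labels live outside the term postings, changing a label never requires re-indexing the document text, which is what makes "at any point" cheap in a sketch like this.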

Health Data Sources

Filed under: Health care — Patrick Durusau @ 11:52 am

The Flowing Data blog mentioned two government sources of health data that appeared recently:

Health.Data.Gov

and,

Health Indicators Warehouse

From the Flowing Data comments, it appears both have some shortcomings, but it is a start.

Intelligence: Practice, Problems and Prospects

Filed under: Examples,Topic Maps — Patrick Durusau @ 10:56 am

Intelligence: Practice, Problems and Prospects Spring 2005 MIT Course on intelligence issues.

I mention this course because intelligence is an area where it is popular to talk about connecting the dots and sharing data.

Note that I said popular to talk about connecting the dots and sharing data.

If news reports are to be credited, always a risky proposition, the US intelligence community is only marginally less Balkanized than it was on 9/11.

Institutional goals and imperatives are more important than any national interest, such as sharing intelligence data, and are likely to remain so.

Promoting topic maps as a means of sharing information in a non-sharing environment, with known imperatives driving the non-sharing, is a losing proposition.

Pitching topic maps to supra-agency leadership is unlikely to succeed, because it requires access to that leadership, a leadership already within reach of the leaders of the Balkanized agencies.

Two suggested changes in selling topic maps to the intelligence community:

1) Sell topic maps to individual agencies on the basis that they can better integrate their own information with information they have gotten from other agencies. Not so much a sharing rhetoric as a “making the best use of generated and stolen intelligence” sort of argument.

2) Remember that US intelligence services aren’t the only intelligence services in the world. It is likely they all suffer from the sort of Balkanization seen in the US, but it is also true that some of them may be flexible enough to overcome it.

Having successful use of topic maps elsewhere could drive their adoption in the US.

Cussing in Commits – Follow Up Topic Map Project?

Filed under: Dataset,Humor — Patrick Durusau @ 10:24 am

Cussing in Commits: Which Programming Language Inspires the Most Swearing? is a deeply amusing chart based on analysis of one million GitHub commit messages. It tracks the use of profanity in commit messages by the programming language of the project.

Oh, the topic map follow up project?

Grab a similar number of commits and create topics and associations. Be imaginative. Create topics for geographic locations of committers. Time of day of commits. Pre or post .0 releases, etc.

Tracking one dimension, such as cussing by language, can be amusing. Having the ability to create intersections between dimensions via associations could be quite useful. Here is a fun data set to explore.
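As a rough illustration of intersecting dimensions, the sketch below treats each commit as a topic and each dimension (language, location, time of day, profanity) as an association type. The commit records, field names, and association names are all made up for the example, and a plain dictionary of sets stands in for a real topic map engine.

```python
from collections import defaultdict

# Hypothetical commit records; field names and values are illustrative only.
commits = [
    {"id": "c1", "language": "Ruby", "location": "Oslo", "hour": 23, "profane": True},
    {"id": "c2", "language": "Ruby", "location": "Atlanta", "hour": 10, "profane": False},
    {"id": "c3", "language": "C++", "location": "Oslo", "hour": 2, "profane": True},
    {"id": "c4", "language": "C++", "location": "Atlanta", "hour": 14, "profane": True},
]


def time_of_day(hour):
    """Bucket an hour (0-23) into a coarse time-of-day topic."""
    return "night" if hour < 6 or hour >= 22 else "day"


def build_associations(records):
    """Map each (association type, topic) pair to the commits that play a role in it."""
    assoc = defaultdict(set)
    for r in records:
        assoc[("written-in", r["language"])].add(r["id"])
        assoc[("committed-from", r["location"])].add(r["id"])
        assoc[("committed-at", time_of_day(r["hour"]))].add(r["id"])
        if r["profane"]:
            assoc[("contains", "profanity")].add(r["id"])
    return assoc


def intersect(assoc, *keys):
    """Commits that take part in every one of the given associations."""
    sets = [assoc.get(k, set()) for k in keys]
    return set.intersection(*sets) if sets else set()


assoc = build_associations(commits)
night_cussing_in_oslo = intersect(
    assoc,
    ("contains", "profanity"),
    ("committed-from", "Oslo"),
    ("committed-at", "night"),
)
```

Adding a new dimension (say, pre or post .0 release) is one more line in `build_associations`; every existing intersection query keeps working unchanged.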

RKWard

Filed under: Data Mining,R — Patrick Durusau @ 10:22 am

RKWard

Another R IDE for data mining. Thought I should mention it since I also posted a note about RStudio.

From the website:

RKWard is meant to become an easy to use, transparent frontend to the R-language, a very powerful, yet hard-to-get-into scripting-language with a strong focus on statistic functions. It will not only provide a convenient user-interface, however, but also take care of seamless integration with an office-suite. Practical statistics is not just about calculating, after all, but also about documenting and ultimately publishing the results.

RKWard then is (will be) something like a free replacement for commercial statistical packages. In addition to ease of use, three aspects are particularily important:

  • It will be a transparent interface to the underlying R-language. That is, it will not hide the powerful syntax, but merely provide a convenient way, in which both newbies and R-experts can accomplish most of their tasks. A GUI can never provide an interface to the whole power of a language like R. In some cases users will want to tweak some functions to their particular needs and esp. to automate some tasks. By making the “inner workings” visible to the user, RKWard will make it easy for the user to see where and how to use R-syntax to accomplish their goals.
  • For the output, RKWard strives to separate content and design to a high degree. It will not try to design its own tables/graphs, etc, which have to be converted to the style used in the rest of a publication by hand. Currently RKWard uses HTML for its output. Using appropriate style definitions reformatting this output to match the rest of the publication will be easily doable. In future releases RKWard will even seek stronger integration with existing office suites.
  • It relies on a language, that is not only very powerful, but also extensible, and for which dozens of extensions already exist.

And of course, it is free (as in free speech).

RStudio

Filed under: Data Mining,R — Patrick Durusau @ 7:08 am

RStudio

From the website:

RStudio™ is a new integrated development environment (IDE) for R. RStudio combines an intuitive user interface with powerful coding tools to help you get the most out of R.

Productive

RStudio brings together everything you need to be productive with R in a single, customizable environment. Its intuitive interface and powerful coding tools help you get work done faster.

Runs Everywhere

RStudio is available for all major platforms including Windows, Mac OS X, and Linux. It can even run alongside R on a server, enabling multiple users to access the RStudio IDE using a web browser.

Free & Open

Like R, RStudio is available under an open source license that guarantees the freedom to share and change the software, and to make sure it remains free software for all its users.

I first saw this mentioned at:

RStudio: An Open Source and Cross-Platform IDE for R

OSCON Data 2011 Call for Participation

Filed under: Conferences,Data Analysis,Data Mining,Data Models,Data Structures — Patrick Durusau @ 7:07 am

OSCON Data 2011 Call for Participation

Deadline: 11:59pm 03/14/2011 PDT

From the website:

The O’Reilly OSCON Data conference is the first of its kind: bringing together open source culture and data hackers to cover data management at a very practical level. From disks and databases through to big data and analytics, OSCON Data will have instruction and inspiration from the people who actually do the work.

OSCON Data will take place July 25-27, 2011, in Portland, Oregon. We’ll be co-located with OSCON itself.

Proposals should include as much detail about the topic and format for the presentation as possible. Vague and overly broad proposals don’t showcase your skills and knowledge, and our volunteer reviewers aren’t mind readers. The more you can tell us, the more likely the proposal will be selected.

Proposals that seem like a “vendor pitch” will not be considered. The purpose of OSCON Data is to enlighten, not to sell.

Submit a proposal.

Yes, it is right before Balisage but I think worth considering if you are on the West Coast and can’t get to Balisage this year or if you are feeling really robust. 😉

Hmmm, I wonder how a proposal that merges the indexes of the different NoSQL volumes from O’Reilly would be received? You are aware that O’Reilly is re-creating the X-Windows problem that was the genesis of both topic maps and DocBook?

I will have to write that up in detail at some point. I wasn’t there but have spoken to some of the principals who were. Plus I have the notes, etc.

ektorp – Java API for CouchDB

Filed under: CouchDB,NoSQL — Patrick Durusau @ 7:05 am

ektorp – Java API for CouchDB

From the website:

Ektorp is a persistence API that uses CouchDB as storage engine. The goal of Ektorp is to combine JPA like functionality with the simplicity and flexibility that CouchDB provides.

Features

Here are some good reasons why you should consider to use Ektorp in your project:

  • Rich domain models. With powerful JSON-object mapping provided by Jackson it is easy to create rich domain models.
  • Schemaless comfort. As CouchDB is schemaless, the database gets out of the way during application development. With a schemaless database, most adjustments to the database become transparent and automatic.
  • Out-of-the-Box CRUD. The generic repository support makes it trivial to create persistence classes.
  • Simple and fluent API.
  • Spring Support. Ektorp features an optional spring support module.
  • Active development. Ektorp is actively developed and has a growing community.
  • Choice of abstraction level. From full object-document mapping to raw streams, Ektorp will never stop you if you need to step down an abstraction level.

I am going to be looking at this project more closely but it would be interesting to see a project that said:

Here are some reasons to not use this project and/or things it doesn’t do well…

I can’t recall ever seeing a project that had such a disclaimer.

Not that it would have to be long or detailed, but showing an awareness that whatever the project, it isn’t the universal solution would be nice.

Processing.js

Filed under: Graphics,Javascript,Processing,Visualization — Patrick Durusau @ 7:03 am

Processing.js 1.1 has been released.

From the website:

Processing.js is the sister project of the popular Processing visual programming language, designed for the web. Processing.js makes your data visualizations, digital art, interactive animations, educational graphs, video games, etc. work using web standards and without any plug-ins. You write code using the Processing language, include it in your web page, and Processing.js does the rest. It’s not magic, but almost.

Originally developed by Ben Fry and Casey Reas, Processing started as an open source programming language based on Java to help the electronic arts and visual design communities learn the basics of computer programming in a visual context. Processing.js takes this to the next level, allowing Processing code to be run by any HTML5 compatible browser, including current versions of Firefox, Safari, Chrome, Opera, and even the upcoming Internet Explorer 9. Processing.js brings the best of visual programming to the web, both for Processing and web developers.

Everything you need to work with Processing.js is here. You can download the most recent version of Processing.js, read Quick Start Guides for Processing Developers or JavaScript Developers, learn about the Processing language, consult the Reference, and of course view many existing demos that use Processing.js. You can also get involved with the Processing and Processing.js communities, both of which are active and looking for new users and developers.

If you are not familiar with the Processing project, you need to be.

Even a non-artistic person such as myself can appreciate and follow, if not imitate, the thoughtfulness that has gone into the Processing project.

Very much a project to follow and definitely of interest for visualization in connection with topic maps.

Balisage Deadline Looms…., News at 11!

Filed under: Conferences — Patrick Durusau @ 5:31 am

😉

Seriously, the April 8, 2011 deadline for full papers for Balisage, August 1 – 5, Montreal, Canada is getting closer!

Some helpful links from Tommie Usdin’s reminder of the deadline:

– Paper Selection Criteria: http://www.balisage.net/paper-selection.html

– The proceedings from recent conferences (to give you an idea of what Balisage papers look like) are available from: http://www.balisage.net/Proceedings/index.html

– a tag Library, describing the Balisage tag set http://www.balisage.net/DocumentModels/BalisageTL/index.html You might find the “Full Paper Sample” helpful. You can find it in the left-hand navigation bar.

If you have any questions about Balisage or your Balisage paper, please send email to info@balisage.net.

Personally I would like to see a paper on the design of an online paper submission system that rejects papers on the basis of improper use of the required markup. With a variety of nasty remarks depending on how far the paper departs from the requirements.

The new or innovative part of the paper being a ranking of departures from the required tag set with appropriate responses.

Ranging I suppose from: “Does your mother know you are using valuable bandwidth to bother us?” to “Close, but no prize (or submission to the conference).”

Sorry, I digress.

Do get your papers in by the April 8, 2011 deadline for the Balisage conference!

It is simply the best markup conference of the year.

March 1, 2011

Social Data and Log Analysis Using MongoDB

Filed under: Data Mining,Log Analysis,MongoDB — Patrick Durusau @ 11:33 am

Social Data and Log Analysis Using MongoDB

Interesting use of MongoDB.

Work through the slide deck and consider the following questions along the way:

  1. How would your analysis of the logs (the process of analysis) be different if you were using topic maps?
  2. How would your results from #1 be different?
  3. Choose a set of logs and test your answers to #1 and #2.

(Credit will be equally awarded whether #3 confirms or contradicts your analysis in #1 and #2. The purpose of the exercise is to develop a “feel” for fruitful areas of investigation.)

NoSQL Databases: Why, what and when

NoSQL Databases: Why, what and when by Lorenzo Alberton.

When I posted RDBMS in the Social Networks Age I did not anticipate returning the very next day with another slide deck from Lorenzo. But, after viewing this slide deck, I just had to post it.

It is a very good overview of NoSQL databases and their underlying principles, with useful graphics as well (as opposed to the other kind).

I am going to have to study his graphic technique in hopes of applying it to the semantic issues that are at the core of topic maps.

cablegate.core 0.2.0-20110224

Filed under: Authoring Topic Maps,Examples,Topic Map Software,Topic Maps — Patrick Durusau @ 10:48 am

Did not mean to miss the updated release of cablegate.core yesterday.

Download a copy and post your comments/suggestions.

Better yet, contribute your analysis via topic maps that can be merged with other topic maps.

Use topic maps to make cablegate more than a titillating annoyance.

The thought occurs to me that with all the unrest in Libya, there could be a fresh crop of diplomatic cables about to become available.

And why not? It would be a nice window into the recent history in the region.

Would that endanger some actors?

Well, you know what they say about playing in the street.

And, they weren’t acting in anyone’s interest but their own, so I would not lose any sleep over it.

InTech – Open Access Publisher

Filed under: Books,Data Mining,Self-Organizing — Patrick Durusau @ 10:18 am

I scan lightly before I clean out the spam filter for the blog, and this time I saw:

Hello. Yesterday I found two new books about Data mining. These series of books entitled by ‘Data Mining’ address the need by presenting in-depth description of novel mining algorithms and many useful applications. In addition to understanding each section deeply, the two books present useful hints and strategies to solving problems in the following chapters.The progress of data mining technology and large public popularity establish a need for a comprehensive text on the subject. Books are: “New Fundamental Technologies in Data Mining” here http://www.intechopen.com/books/show/title/new-fundamental-technologies-in-data-mining & “Knowledge-Oriented Applications in Data Mining” here http://www.intechopen.com/books/show/title/knowledge-oriented-applications-in-data-mining These are open access books so you can download it for free or just read on online reading platform like I do. Cheers!

I was curious enough to follow the links and was glad I did.

InTech – Open Access Publisher has a number of volumes for downloading that may interest topic mappers. For free!

At first I thought these were article collections, made up of conference and other papers. I have only spot checked Self Organizing Maps – Applications and Novel Algorithm Design, edited by Josphat Igadwa Mwasiagi, but none of the paper titles appear in web searches, other than at Intechweb.org.

Apologies for appearing suspicious but there is so much re-cycled content on the WWW these days. That does not appear to be the case here, which is welcome news!

Would appreciate hearing of the experience of others with volumes from this site.

30 High Quality Charts And Graphs For Webdevelopers To Download

Filed under: Graphs,Interface Research/Design — Patrick Durusau @ 10:16 am

30 High Quality Charts And Graphs For Webdevelopers To Download

Unless you have religious convictions about delivery of topic map content in user unfriendly terms, you will probably find one or more useful packages here.

Interested to hear which ones you find the best and for what purposes.

String Syntax – Post

Filed under: String Matching,TMQL — Patrick Durusau @ 10:13 am

String Syntax

Since TMQL discussions are starting up, it seems appropriate to point out at least one resource on string syntax.

BTW, I am assuming that the TMQL draft will not have any fewer capabilities than any of the extant implementations?

Is anyone assuming differently?

TMQL Slides for Prague 2011

Filed under: TMQL,TMRM,Topic Maps — Patrick Durusau @ 10:12 am

TMQL Slides for Prague 2011

TMQL slides with discussion points for Prague have been posted!

Please review even if you don’t plan on attending the Prague meeting to offer your comments and questions.

Comments and questions are, I am sure, always welcome, but they are more useful if received before weeks, if not months, of preparing standards prose.

Since I ask, I have several questions (some of which will probably have to be answered post-Prague):

1st Question:

While I understand the utility of the illustrated syntax reflected on the slides, I am more concerned with the underlying formal model for TMQL. Syntax and its explanation for users is very important, but that can take many forms. Can you say a bit more about the formal model that underlies TMQL?

2nd Question:

See my blog post on Indexing by Properties. To what extent is TMQL going to support the use of multiple properties (occurrences) for purposes of identification?

3rd Question:

What datatypes will be supported by TMQL? How are additional datatypes declared?

4th Question:

What comparison operators are supported by TMQL?

Indexing by Properties

Filed under: Identifiers,Names,Properties — Patrick Durusau @ 10:09 am

When I was researching the …grain of salt post I happened across the entry for sodium chloride at Wikipedia.

I don’t know how many times I have looked at Wikipedia pages but that day I noticed the headings in the sidebar that read:

IUPAC name (International Union of Pure and Applied Chemistry nomenclature)
Other names
Identifiers
Properties
Structure
Hazards
Related Compounds
Supplementary data page

Think about it for a minute.

Substances don’t arrive in labs (say, for example, the fictional labs seen on CSI) with IUPAC names, other names, or even identifiers.

How are they identified? Can you say by their properties?

Now there is an odd dis-connect between indexing and identification.

That is, indexing is by names and identifiers, both of which are known to be weak, rather than by properties.

Now there is an idea, an indexer that marshals properties for any index entry and can report why a particular entry was made.

We would not accept any less from a lab analysis. I wonder why we accept it from our indexers?
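A minimal sketch of such an indexer, in Python, might look like the following. Everything here is an assumption made for illustration: the class names `PropertyIndexer` and `IndexEntry`, the rule that at least two properties must agree, and the tiny property profiles (textbook values, not a curated data set).

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    subject: str
    locations: list = field(default_factory=list)   # where the subject was found
    evidence: list = field(default_factory=list)    # property matches behind each location


class PropertyIndexer:
    """Hypothetical indexer: file an observation under a subject only when enough properties agree."""

    def __init__(self, profiles, min_matches=2):
        self.profiles = profiles          # subject -> {property: expected value}
        self.min_matches = min_matches    # assumed threshold, chosen for the example
        self.entries = {}

    def index(self, location, observed):
        """Compare observed properties to each profile; keep the matches as evidence."""
        for subject, profile in self.profiles.items():
            matched = {k: v for k, v in observed.items() if profile.get(k) == v}
            if len(matched) >= self.min_matches:
                entry = self.entries.setdefault(subject, IndexEntry(subject))
                entry.locations.append(location)
                entry.evidence.append(matched)

    def why(self, subject):
        """Report (location, matched properties) pairs that justify an index entry."""
        entry = self.entries.get(subject)
        return list(zip(entry.locations, entry.evidence)) if entry else []


# Illustrative profiles only.
profiles = {
    "sodium chloride": {"melting_point_c": 801, "crystal": "cubic", "flame_color": "yellow"},
    "potassium chloride": {"melting_point_c": 770, "crystal": "cubic", "flame_color": "violet"},
}
indexer = PropertyIndexer(profiles)
indexer.index("sample-17", {"crystal": "cubic", "flame_color": "yellow"})
report = indexer.why("sodium chloride")
```

The point of `why` is that the name "sodium chloride" is attached after the fact; the entry itself rests on the recorded property matches, which can be inspected or challenged later.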

Subjects, other than substances, also have properties, including relationships to other subjects.

Identifiers and locators in topic maps are quick and convenient ways to navigate topic maps and the subjects represented therein.

We should not allow that convenience to blind us to the deeper complexity of reliable identification of subjects by their properties.

Indexing based upon more than names and identifiers looks like a largely unexplored landscape and one where topic maps could make an original contribution to the art of indexing.

Well, to be honest, topic maps would be making explicit what indexers have been doing for years. Which would make it even more valuable.

Indexing by Properties. Has a nice ring to it, doesn’t it?

Has a number of implications for semantic web technologies, but more on that anon.
