Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 19, 2011

Control-F

Filed under: Interface Research/Design,Search Interface — Patrick Durusau @ 6:24 am

Dan Russell of Google notes in Why is search sometimes easy and sometimes hard? Understanding serendipity and expertise in the mind of the searcher that teaching someone to use Control-F to search for text on a page outperformed sixteen other changes they made to a search interface: an improvement of 12% in the time-to-result measure.

Now for the sad news:

  • 90% of all US internet users do NOT know how to use Control-F
  • 50% of all US teachers do NOT know how to use Control-F

Two questions:

  1. Does your topic map improve time-to-result by 12% or better?
  2. Do your users know how to use Control-F?

*****
PS: This is a great presentation. I have other comments on it but wanted to single this one out for your attention.

January 18, 2011

MapReduce from the basics to the actually useful (in under 30 minutes) – Post

Filed under: MapReduce — Patrick Durusau @ 9:37 pm

MapReduce from the basics to the actually useful (in under 30 minutes): Mike Miller supplements his appearance on the NoSQL Tapes with this posting on MapReduce.

Said to be followed by a posting with killer visualizations. Will keep watch for it.
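
For readers new to the model, the canonical word-count example can be sketched in a few lines of Python. This is my own toy version, not code from Miller’s post: a map step emits (word, 1) pairs, a shuffle groups them by key, and a reduce step sums each group.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group the emitted values by key (the word).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["topic maps merge subjects", "maps of subjects"]
counts = reduce_phase(shuffle(map_phase(docs)))
```

The point of the exercise is that each phase is independently parallelizable, which is what Hadoop and friends exploit at scale.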

Correction: As originally posted, this entry attributed the posting to “Stefano J Attardi,” the producer of the NoSQL Tapes. Corrected 11 February 2011 to attribute the posting to Mike Miller. Apologies.

Hadoop Basics – Post

Filed under: Hadoop,MapReduce,Subject Identity — Patrick Durusau @ 9:27 pm

Hadoop Basics by Carlo Scarioni illustrates the basics of using Hadoop.

When you read the blog post you will know why I selected his post over any number of others.

Questions:

  1. Perform the exercise and examine the results. How accurate are they?
  2. How would you improve the accuracy?
  3. How would you have to modify the Hadoop example to use your improvements in #2?

SciDB 0.75 release

Filed under: SciDB — Patrick Durusau @ 9:05 pm

SciDB 0.75 has just been released.

New features from the release notice:

  • A user query language based on SQL to augment the current functional language
  • Storage management improvements for both dense and sparse data
  • Dynamic query compilation and pipelining for faster execution
  • User-defined types and functions

R1.0 is targeted for May 2011.

BTW, you will have to register to download the release.

Semantics with Applications: A Formal Introduction

Filed under: Semantics — Patrick Durusau @ 1:27 pm

Semantics with Applications: A Formal Introduction by Hanne Riis Nielson and Flemming Nielson.

From the preface:

Many books on formal semantics begin by explaining that there are three major approaches to semantics, that is

  • operational semantics,
  • denotational semantics, and
  • axiomatic semantics;

but then they go on to study just one of these in greater detail. The purpose of this book is to

  • present the fundamental ideas behind all of these approaches,
  • to stress their relationship by formulating and proving the relevant theorems, and
  • to illustrate the applicability of formal semantics as a tool in computer science.

Not as immediately useful as some of the visualization resources but I do think this book and similar materials are important for making progress with topic maps.

Semantics are with us wherever we turn and have no fixed answer.

The better our semantic skills the more likely we are to achieve temporarily useful answers.
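
To make the contrast between the first two approaches concrete, here is a toy arithmetic language given both an operational semantics (step-by-step rewriting to a value) and a denotational semantics (a direct meaning function). The language and code are my own illustration, not material from the book:

```python
# Expressions are integers, or tuples of the form ("add", e1, e2).

def step(expr):
    # Small-step operational semantics: perform one reduction.
    op, left, right = expr
    if isinstance(left, tuple):
        return (op, step(left), right)
    if isinstance(right, tuple):
        return (op, left, step(right))
    return left + right

def run(expr):
    # Keep stepping until only a value (an int) remains.
    while isinstance(expr, tuple):
        expr = step(expr)
    return expr

def denote(expr):
    # Denotational semantics: map an expression directly to its meaning.
    if isinstance(expr, int):
        return expr
    _, left, right = expr
    return denote(left) + denote(right)

e = ("add", ("add", 1, 2), 4)
```

The theorems the authors mention are exactly statements like `run(e) == denote(e)` for all expressions, proved for realistic languages rather than this toy one.
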
*****
PS: The URL given on the title page for accessing the book, http://daimi.au.dk/~hrn is no longer maintained.

Journalism in the Age of Data

Filed under: Visualization — Patrick Durusau @ 11:12 am

Journalism in the Age of Data

A must see series of videos on data visualization. In addition to a number of compelling examples and interviews, it also offers resources for each chapter of the presentation.

More than a few of the visualizations involved mapping data onto traditional maps.

The mappings I saw illustrated all appeared to use clean data.

That isn’t a criticism, it certainly makes mapping easier.

But what if I don’t have clean data?

Or, I want to add data from a source that identifies some of the subjects differently?

Those sound more like questions that topic maps are poised to answer.

January 17, 2011

Topic Maps Test Suite (cxtm-tests)

Filed under: CXTM — Patrick Durusau @ 8:47 pm

Topic Maps Test Suite (cxtm-tests) version 0.4 was released today.

From the website:

This is a suite of tests for Topic Maps implementations, based around the various Topic Maps syntaxes. The intention is to help developers of Topic Maps implementations verify that their implementations are actually correct according to the specifications. It can also help customers verify whether a particular implementation is actually standards-conformant.

For each syntax, there are two kinds of tests: canonicalization tests and invalidity tests. The canonicalization tests consist of one (or more, if merging is used) input file in some Topic Maps syntax, and a corresponding canonical output (in CXTM), showing the correct interpretation of the input. Implementations pass if they output a CXTM file identical to the one in the test suite. Invalidity tests are invalid inputs which must be rejected by the implementation.
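
The canonicalization workflow can be sketched as a small harness. The `canonicalize` function below is a stand-in (it only normalizes whitespace); a real implementation would invoke your Topic Maps engine on the input and serialize the result as CXTM:

```python
def canonicalize(input_text):
    # Stand-in for a real engine: parse the input syntax, build the
    # topic map, and serialize it as canonical CXTM. Here we merely
    # normalize whitespace so the harness is runnable.
    return "\n".join(line.strip()
                     for line in input_text.splitlines() if line.strip())

def passes_canonicalization_test(input_text, expected_cxtm):
    # An implementation passes if its CXTM output is byte-identical
    # to the canonical file shipped with the test suite.
    return canonicalize(input_text) == expected_cxtm

result = passes_canonicalization_test("  <topicMap>  \n  </topicMap>  ",
                                      "<topicMap>\n</topicMap>")
```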

Your participation is invited, see the cxtm-tests SourceForge Project Page.

Rogue

Filed under: MongoDB,Query Language,Scala — Patrick Durusau @ 8:38 pm

Rogue

From the website:

Rogue is a type-safe internal Scala DSL for constructing and executing find and modify commands against MongoDB in the Lift web framework. It is fully expressive with respect to the basic options provided by MongoDB’s native query language, but in a type-safe manner, building on the record types specified in your Lift models.

Seen on MyNoSQL

*****
PS: To learn more about Lift, see: http://liftweb.net/

Scraping for Journalism: A Guide for Collecting Data

Filed under: Authoring Topic Maps,Data Source — Patrick Durusau @ 3:24 pm

Scraping for Journalism: A Guide for Collecting Data by Dan Nguyen
at ProPublica.

I know, it says Journalism in the title. So just substitute topic map wherever you see journalism. 😉

Scraping is a good way to collect data for topic maps or that other activity.

I saw the reference on FlowingData.com and thought I should pass it on.

DBpedia 3.6 – Release

Filed under: Data Source,DBpedia — Patrick Durusau @ 11:28 am

DBpedia 3.6 – Release

From the announcement:

The new DBpedia dataset describes more than 3.5 million things, of which 1.67 million are classified in a consistent ontology, including 364,000 persons, 462,000 places, 99,000 music albums, 54,000 films, 16,500 video games, 148,000 organizations, 148,000 species and 5,200 diseases.

The DBpedia dataset features labels and abstracts for 3.5 million things in up to 97 different languages; 1,850,000 links to images and 5,900,000 links to external web pages; 6,500,000 external links into other RDF datasets, and 632,000 Wikipedia categories.

The dataset consists of 672 million pieces of information (RDF triples) out of which 286 million were extracted from the English edition of Wikipedia and 386 million were extracted from other language editions and links to external datasets.

Quick Links:

DBpedia MappingTool: a graphical user interface to support the community in creating and editing mappings as well as the ontology.

Improved DBpedia Ontology as well as improved Infobox mappings.

Some commonly used property names changed. Please see http://dbpedia.org/ChangeLog and http://dbpedia.org/Datasets/Properties to know which relations changed and update your applications accordingly!

Download the new DBpedia dataset from http://dbpedia.org/Downloads36

Available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql
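
As a taste of the endpoint, here is a sketch that builds a SPARQL query for the label of a DBpedia resource. The query is illustrative only, and given the property changes noted in the announcement you should check the ChangeLog before relying on any specific property names:

```python
# The public SPARQL endpoint mentioned in the release announcement.
ENDPOINT = "http://dbpedia.org/sparql"

def label_query(resource, lang="en"):
    # Build a SPARQL query for the rdfs:label of a DBpedia resource,
    # filtered to a single language.
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {{\n"
        "  <http://dbpedia.org/resource/{0}> rdfs:label ?label .\n"
        '  FILTER (lang(?label) = "{1}")\n'
        "}}"
    ).format(resource, lang)

query = label_query("Berlin")
```

Posting `query` to the endpoint (with an HTTP client of your choice) returns the labels in up to 97 languages if you drop the language filter.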

CS359G: Graph Partitioning and Expanders

Filed under: Graph Partitioning,Graphs — Patrick Durusau @ 8:53 am

CS359G: Graph Partitioning and Expanders

Luca Trevisan’s course at Stanford on Graph Partitioning.

I arrived at this course page after finding the blog for the class: Cs359g.

Graph partitioning is relevant to clustering on the basis of similarity, or as stated in the overview blog post:

Finding balanced separators and sparse cuts arises in clustering problems, in which the presence of an edge denotes a relation of similarity, and one wants to partition vertices into few clusters so that, for the most part, vertices in the same cluster are similar and vertices in different clusters are not. For example, sparse cut approximation algorithms are used for image segmentation, by reducing the image segmentation problem to a graph clustering problem in which the vertices are the pixels of the image and the (weights of the) edges represent similarities between nearby pixels.

Depending upon the properties you are using as the basis for similarity (identity?), graph partitioning is likely to be relevant to your topic map application.
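
To make “sparse cut” concrete: the sparsity of a cut compares the number of edges crossing it to the size of the smaller side, so a good cut severs few similarity edges while keeping the clusters balanced. A toy example of my own, not from the course:

```python
def cut_sparsity(edges, side_a, side_b):
    # Count edges with one endpoint on each side of the cut,
    # then divide by the size of the smaller side.
    crossing = sum(1 for u, v in edges
                   if (u in side_a) != (v in side_a))
    return crossing / min(len(side_a), len(side_b))

# Two triangles joined by a single bridge edge (2-3):
# the natural cut between the triangles is the sparse one.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
sparse = cut_sparsity(edges, {0, 1, 2}, {3, 4, 5})      # one crossing edge
unbalanced = cut_sparsity(edges, {0}, {1, 2, 3, 4, 5})  # two crossing edges
```

The balanced cut across the bridge scores far better than peeling off a single vertex, which is the intuition behind using sparse cuts for clustering and image segmentation.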

Dark Patterns

Filed under: Interface Research/Design — Patrick Durusau @ 8:36 am

Dark Patterns: User Interfaces Designed to Trick People

I found this site from the blog posting Anti-Patterns and Dark Patterns at Endeca.

The term dark patterns is used to refer to patterns in user interfaces that by design try to serve their own purposes and not those of users.

Such as making canceling a subscription very difficult.

There are a host of other examples at this site.

One legitimate use of such examples would be as contrast to your topic map interface, which is designed to assist users.

Or in the preparation of such a demonstration to alert you that your interface isn’t as helpful as you thought.

Endeca User Interface Design Pattern Library

Filed under: Facets,Interface Research/Design,Navigation,Visualization — Patrick Durusau @ 6:36 am

Endeca User Interface Design Pattern Library

From the website:

The Endeca User Interface Design Pattern Library (UIDPL) describes principled ways to solve common user interface design problems related to search, faceted navigation, and discovery. The library includes both specific UI design patterns as well as pattern topics such as:

  • Search
  • Faceted Navigation
  • Promotional Spotlighting
  • Results Manipulation
  • Faceted Analytics
  • Spatial Visualization

The patterns are offered as proposed sets of design guidelines based on our research and design experience as well as lessons learned from the information search and discovery community. They are NOT the only solutions, strict recipes etched in stone, or a substitute for sound human-centered design practices.

When the week starts off with discovery of a resource like this one, I know it is going to be a good week!

January 16, 2011

QuaaxTM 0.6.0

Filed under: QuaaxTM,Topic Map Software — Patrick Durusau @ 8:25 pm

QuaaxTM

From the website:

QuaaxTM is a PHP Topic Maps engine which implements PHPTMAPI. This enables developers to work against a standardized API. QuaaxTM uses MySQL with InnoDB as storage engine and benefits from transaction support and referential integrity.

PHPTMAPI 2.0.1 released

Filed under: TMAPI,Topic Map Software — Patrick Durusau @ 8:21 pm

PHPTMAPI 2.0.1 released

From the website:

TMAPI/PHP is a programming interface for PHP based on the TMAPI (www.tmapi.org) project. This API will enable PHP developers easy and standardized implementation of topic maps in their applications.

Ontopia

Filed under: Authoring Topic Maps,Ontopia,Topic Map Software — Patrick Durusau @ 6:55 pm

I saw a tweet dated 2011-01-15 saying that Ontopia was alive.

Since Ontopia is a name known to anyone interested in topic maps for more than 30 minutes, I decided to take a look.

It is indeed the Ontopia software for topic maps.

It was disappointing that the homepage, alive though it is, needs updating; for example, it still refers to last year’s TMRA conference.

All the additional resources listed are good ones, but the selection is somewhat limited.

One of my goals for 2011 is to develop a bibliography of topic map papers, presentations, etc.

Will have to see how the year goes.

Informer

Filed under: Authoring Topic Maps,Information Retrieval,Searching — Patrick Durusau @ 2:29 pm

The Informer is the newsletter of the BCS Information Retrieval Specialist Group (IRSG).

There is a single issue in 1994, although that is volume 3, which implies there were earlier issues.

A useful source of information on IR.

It would be more useful, if there were an index.

Let’s turn that lack of an index into a topic map exercise:

  1. Select one issue of the Informer.
  2. Create a traditional index for that issue.
  3. Using one or more search engines, create a machine index for that issue.
  4. Create a topic map for that issue.

One purpose of the exercise is to give you a feel for the labor/benefit/delivery characteristics of each method.
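
For step 3, a machine index is essentially an inverted index, mapping each term to the documents that contain it. A minimal sketch with invented article titles, not actual Informer content:

```python
from collections import defaultdict

def build_index(articles):
    # Map each lowercased term to the set of article ids containing it.
    index = defaultdict(set)
    for art_id, text in articles.items():
        for term in text.lower().split():
            index[term].add(art_id)
    return index

articles = {
    "a1": "Probabilistic models in information retrieval",
    "a2": "Evaluation of retrieval effectiveness",
}
index = build_index(articles)
```

Comparing this mechanical term index against your hand-built index and your topic map should make the labor/benefit trade-offs of the exercise tangible.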

The Changing Face of Search – Post

Filed under: Information Retrieval,Search Interface,Searching — Patrick Durusau @ 11:26 am

The Changing Face of Search: Tony Russell-Rose, along with Udo Kruschwitz and Andy MacFarlane, has penned a post about changes they see coming to search.

The entire article is worth your time but one part stood out for me:

… Personalisation does not mean that users will be required to explicitly declare their interests (this is exactly what most users do not want to do!); instead, the search engine tries to infer users’ interests from implicit cues, e.g. time spent viewing a document, the fact that a document has been selected in preference to another ranked higher in the results list, and so on. Personalised search results can be tailored to individual searchers and also to groups of similar users (“social networks”). (emphasis in original)

[Users don’t want] to explicitly declare their interests.

This has implications for topic map authoring.

Similar to users resisting building explicit document models and/or writing in markup. (Are you listening RDF/RDFa fans?)

Complaining that users don’t want to learn markup, explicitly declare their subjects, or use pre-written RDF vocabularies, is not a solution.

Effective topic map (or other semantic) authoring solutions are going to infer subjects and assist users in correcting those inferences.
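
As a crude illustration of inference from implicit cues, here is a sketch that ranks topics by accumulated viewing time; the topics and numbers are invented:

```python
from collections import defaultdict

def infer_interests(view_log):
    # Accumulate seconds spent per topic, then rank topics by
    # total dwell time, most-viewed first.
    totals = defaultdict(float)
    for topic, seconds in view_log:
        totals[topic] += seconds
    return sorted(totals, key=totals.get, reverse=True)

log = [("graphs", 120), ("sql", 15), ("graphs", 90), ("nosql", 40)]
ranked = infer_interests(log)
```

A real system would weigh many cues (clicks skipped over, query reformulations, and so on), but even this much is a declaration the user never had to make explicitly.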

January 15, 2011

How To Model Search Term Data To Classify User Intent & Match Query Expectations – Post

Filed under: Authoring Topic Maps,Data Mining,Interface Research/Design,Search Data — Patrick Durusau @ 5:49 pm

How To Model Search Term Data To Classify User Intent & Match Query Expectations by Mark Sprague, courtesy of Searchengineland.com, is an interesting piece on the analysis of search data to extract user intent.

As interesting as that is, I think it could be used by topic map authors for a slightly different purpose.

What if we were to use search data to classify how users were seeking particular subjects?

That is to mine search data for patterns of subject identification, which really isn’t all that different than deciding what product or what service to market to a user.

As a matter of fact, I suspect that many of the tools used by marketeers could be dual purposed to develop subject identifications for non-marketing information systems.

Such as library catalogs or professional literature searches.

The latter are often pay-per-view, where maintaining high customer satisfaction means repeat business and word of mouth advertising.

I am sure there is already literature on this sort of mining of search data for subject identifications. If you have a pointer or two, please send them my way.

Membase and Erlang with Matt Ingenthron

Filed under: Erlang,Membase,NoSQL — Patrick Durusau @ 5:39 pm

Membase and Erlang with Matt Ingenthron

From Alex Popescu’s MyNoSQL.

The video is fairly poor in terms of seeing the slides. The presentation is worthwhile but be aware that it is more audio than video.

Recommend that you catch Matt Ingenthron’s blog, or other Membase blogs for more information.

Erlang is important for topic maps due to its built in support for concurrency and for live patching of systems in operation.

For further information, see Erlang.

How to Think about Parallel Programming: Not!

Filed under: Language Design,Parallel Programming,Subject Identity — Patrick Durusau @ 5:04 pm

How to Think about Parallel Programming: Not! by Guy Steele is a deeply interesting presentation on how not to approach parallel programming. The central theme is that languages should provide parallelism transparently, without programmers having to think in parallel.

Parallel processing of topic maps is another way to scale topic maps for particular situations.

How to parallel process questions of subject identity is an open and possibly domain specific issue.

Watch the presentation even if you are only seeking an entertaining account of my first program.

How to Choose a Shard Key: The Card Game

Filed under: MongoDB,NoSQL — Patrick Durusau @ 2:38 pm

How to Choose a Shard Key: The Card Game is Kristina Chodorow’s highly entertaining post on how to evaluate sharding strategies.

I mention this because sharding is likely to become an issue as topic map applications grow in size. Evaluating strategies before significant development time and effort are invested is always a good idea.
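
The trade-off behind the card game can be simulated: under range-based sharding, a monotonically increasing key (such as a timestamp) sends every new write to the last shard, while hashing the key spreads writes evenly. A sketch under those assumptions, with an invented range split:

```python
def shard_for(key, num_shards, hashed):
    # Range-based sharding on a monotonic key piles writes onto
    # one shard; hashing the key first distributes them.
    if hashed:
        return hash(key) % num_shards
    return min(key * num_shards // 1000, num_shards - 1)  # split 0..999

def distribution(keys, num_shards, hashed):
    counts = [0] * num_shards
    for k in keys:
        counts[shard_for(k, num_shards, hashed)] += 1
    return counts

recent = range(900, 1000)  # monotonically increasing keys, e.g. timestamps
range_counts = distribution(recent, 4, hashed=False)
hash_counts = distribution(recent, 4, hashed=True)
```

The range-based split leaves three shards idle while one absorbs every insert, which is exactly the hot-shard pathology the card game teaches you to spot.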

I also have a weakness for clever explanations that capture the essence of a complex problem in an accessible form.

If you are at all interested in MongoDB, see the rest of her blog entries.

Or, MongoDB: The Definitive Guide by Kristina and Michael Dirolf.

Scaling with MongoDB Video

Filed under: MongoDB,NoSQL — Patrick Durusau @ 2:27 pm

Scaling with MongoDB Video

Kristina Chodorow covers scaling with MongoDB. Mentioned on Alexander Popescu’s MyNoSQL blog.

Alexander is concerned about the complexity of the autosharding solution.

But high availability requires more than understanding the capabilities of a single database solution.

A firm understanding of the concerns in Philip A Bernstein and Eric Newcomer’s Principles of Transaction Processing and Jim Gray and Andreas Reuter’s Transaction Processing: Concepts and Techniques is a good starting point.

Whether you are planning high availability for a topic map or another application.

Should I Work For Free?

Filed under: Marketing — Patrick Durusau @ 10:38 am

If you have ever wondered about a request to work for free, designer Jessica Hische has a graphic to help you make that decision, Should I Work For Free?

I think topic map folks get asked for free work as often as anyone else and when I saw this at Flowingdata.com I had to mention it.

I started to say it was ironic that Jessica has this elaborate matrix to avoid working for free but then contributes her chart for all of us to enjoy and profit from.

But, that’s not the same is it?

Jessica decided she felt strongly enough to create this chart and then to broadcast it for all to see.

It was done on her terms and while it benefits the rest of us, it surely benefits her as well.

Something to think about.

Regret The Error

Filed under: Authoring Topic Maps,Examples — Patrick Durusau @ 7:18 am

Regret the Error is both a website and a book by Craig Silverman.

From the website:

Regret the Error reports on media corrections, retractions, apologies, clarifications and trends regarding accuracy and honesty in the press. It was launched in October 2004 by Craig Silverman, a freelance journalist and author based in Montreal.

Silverman’s free accuracy checklist is one that reporters (dare I say bloggers?) would do well to follow.

Silverman recommends printing and laminating the checklist so you can use it with a dry erase pen to check items off.

That is better than not having a checklist at all, but it seems suboptimal to me.

For example, in a news operation with multiple reporters:

  • How would an editor discover that multiple reporters were relying on the same sources?
  • Or the same sources across multiple stories?
  • How would reporters avoid having to duplicate the effort of other reporters in verifying basic facts such as names, titles, urls, etc?
  • How would reporters build on the experts, resources, sources already located by other reporters?
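
The first two editor questions come down to inverting a reporter-to-sources mapping, something a topic map would make explicit. A sketch with invented names:

```python
from collections import defaultdict

def sources_to_reporters(stories):
    # Invert (reporter, story, sources) records to find sources
    # relied on by more than one reporter.
    usage = defaultdict(set)
    for reporter, _story, sources in stories:
        for source in sources:
            usage[source].add(reporter)
    return {s: rs for s, rs in usage.items() if len(rs) > 1}

stories = [
    ("alice", "budget-cuts", {"city-clerk", "union-rep"}),
    ("bob", "school-board", {"city-clerk"}),
]
shared = sources_to_reporters(stories)
```

Laminated checklists cannot answer that question; a shared, queryable representation of sources and stories can.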

Questions:

  1. How would you convert Silverman’s checklist into a topic map?
  2. How would you associate a particular set of items with a story and their being checked off by a reporter?
  3. What extensions or specifics would you add to the checklist?
  4. What other mechanisms would you want in place for such a topic map? (Anonymity for sources comes to mind.)

January 14, 2011

MongoSV 2010

Filed under: MongoDB,NoSQL — Patrick Durusau @ 5:43 pm

MongoSV 2010

A one-day event on the MongoDB database and its uses.

Reported by Alexander Popescu.

I report it here so you can start working your own way through the four tracks for items of interest.

I am going to do the same and pull out items that strike me as particularly relevant to topic maps.

Feel free to post your own suggestions for must see items.

MongoDB and Eventbrite’s Social Graph – Post

Filed under: MongoDB,NoSQL — Patrick Durusau @ 5:34 pm

MongoDB and Eventbrite’s Social Graph

Via Alexander Popescu.

Read the post, then grab the slides:

Eventbrite Social Graph slides

This is very cool!

Redis Under the Hood

Filed under: NoSQL,Redis,Topic Maps — Patrick Durusau @ 5:25 pm

Redis Under the Hood

Via Alexander Popescu’s MyNoSQL blog.

Compelling examination of how Redis works. Some of the diagrams look a lot like diagrams of subjects and the relationships between them.

Take a look and judge for yourself.

Suggests 1,002 uses for topic maps, doesn’t it?

Anyway, if you want to get into the internals of Redis, either for the sheer discipline of it or because you think it may figure in your topic map future, here is a good place to start.

The Tin Man

Filed under: Marketing,TMDM,Topic Maps — Patrick Durusau @ 5:08 pm

One of the reasons I suggested having a podcast based topic maps conference is that watching presentations by others always (or nearly so) inspires me with new ideas.

Take Lars Marius Garshol’s presentation this morning, Topic Maps – Human Oriented Semantics?

While musing over the presentation, I was reminded of the line from the Tin Man, …Oz never did give nothing to the Tin Man / That he didn’t, didn’t already have….

If you don’t remember the story, the Wizard of Oz gives the Tin Man a heart, which he obviously had throughout the story.

Anyway, I think one take away from Lars’ presentation is that users don’t need to go looking for experts in order to have semantics.

Users already have semantics and topic maps are a particularly clever way for users to express their semantics using their understanding of those semantics.

May or may not fit into classical, neo-classical, rough or fuzzy logic.

What matters is that a topic map represents subjects and their relationships as understood by the users of the topic map.

Users already have semantics, they just need topic maps in order to express them!

Graphs, Networks and Semantics

Filed under: Networks,Semantics — Patrick Durusau @ 4:46 pm

I have been reading a lot of graph and network theory stuff lately.

It occurred to me that in any graph or network, the nodes that we choose to write down, as well as the edges between them, are arbitrary choices on our part.

That is to say that someone else, drawing the same graph or network, might include more or fewer nodes and edges.

Neither one would be more correct than the other, but they would be different networks or graphs.

I mention that because it implies that for every graph or network that we write down, there are other graphs or networks lurking just beyond our reach.

Perhaps within the reach of others, but just not ourselves.

That seems to me to strike at the heart of the notion of primitives in the various logics and ontologies.

There may well be primitives from a particular point of view but only from a particular point of view.

So when someone assures you that a particular set of primitives is required for their semantic solution, be sure you hear that as the limits of their graph, network, or semantics. Your mileage may vary.

