Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

June 30, 2010

ANN: Finally! DBpedia and Wikipedia switched to Topic Maps! – News

Filed under: Authoring Topic Maps,CTM,Topic Map Software,Topic Maps,XTM — Patrick Durusau @ 7:07 pm

ANN: Finally! DBpedia and Wikipedia switched to Topic Maps!, according to Lars Heuer.

See his post for the details, but if you are capable of installing plugins in a Firefox browser, you can use his DBpedia / Wikipedia -> Topic Maps service within your browser to create topic maps.

The bar for creating topic maps just keeps getting lower!

******
A few minutes later….

Caveat: I am already running Firefox 3.6.6 so your experience may vary, but… this rocks!

Installation of the Greasemonkey and Mappify browser plugins was very slick (only Greasemonkey required a restart). A quick jaunt to Wikipedia and the first article I pulled up, “rough sets” (that is *sets*), has “Mappify” next to the title, which presents a drop-down menu of XTM, CTM and JTM, in that order. Pick one and it offers you the file.

It doesn’t get any slicker than this! Kudos to Lars Heuer!

Scientists Develop World’s Fastest Program to Find Patterns in Social Networks – News

Filed under: RDF,Search Engines,Searching — Patrick Durusau @ 6:56 pm

Scientists Develop World’s Fastest Program to Find Patterns in Social Networks.

Actually, the paper title is: COSI: Cloud Oriented Subgraph Identification in Massive Social Networks.

Either way, this looks important for topic map fans.

How important?

The authors:

show our framework works efficiently, answering many complex queries over a 778M edge real-world SN dataset derived from Flickr, LiveJournal, and Orkut in under one second.

That important!

If you think about topic maps less as hand-curated XML syntax artifacts and more as interactively and probabilistically created mappings into complex subject spaces, then the importance of this research becomes even clearer.

June 29, 2010

Name for a topic without a type?

Filed under: Topic Maps — Patrick Durusau @ 4:11 pm

If you have ever wondered what to call a topic without a type, Inge Henriksen has an answer at Simulacrum topics.

I don’t know that I agree with the impact he sees for the loss of type, but see what you think.

June 28, 2010

TMQL Tutorials – Announcement

Filed under: Examples,TMQL — Patrick Durusau @ 10:24 am

The Topic Maps Lab is releasing a five (5) part [update 2010-07-07: now reported to be eight (8) parts; I suspect that will change too 😉] series of tutorials on TMQL!

I will update this list as other parts appear.

If you are logged into Maiana you can do all the exercises there.

The tutorials are in German, so you can either improve your technical German or translate them for yourself and the community.

*****

On a personal note, we have long discussed how somebody ought to do something to better promote topic maps. Well, several people are doing something. A lot of somethings. The question we have to ask ourselves (not others, ourselves) is how we can contribute to those efforts or make other contributions.

MaJorToM 1.0.0 – Release

Filed under: Topic Map Software — Patrick Durusau @ 10:01 am

MaJorToM 1.0.0 (news release) “is a lightweight, merging and flexible Topic Maps engine satisfying different business use cases.” It is now available for download (software, Google Code project)!

The most important feature, of many, is that MaJorToM does not require that the underlying storage adhere to the TMDM data model. Think about that for a moment. How much of the world’s data is stored following the TMDM versus other data models? That’s what I thought too.

A more detailed review will follow but for now, download MaJorToM. Today!

PS: You can tell people that MaJorToM supports transactions, monitoring changes, chain of evidence on changes and cool stuff like modeling time and space.

Software, Services & Semantic Technologies – Conference

Filed under: Conferences — Patrick Durusau @ 8:20 am

Software, Services & Semantic Technologies:

S3T 2010 will provide a forum for connecting researchers and international research communities for worldwide dissemination and sharing of ideas and results in the areas of Software and Services and Intelligent Content and Semantics.

If the paper lineup is as strong as the invited speakers, this will be a great event!

Conference Dates: September 11-12, 2010, Varna, Bulgaria.

Early Registration opens July 05, 2010.

June 27, 2010

Looks Like A Topic Map! (Graph Traversal Programming Pattern)

Filed under: Graphs — Patrick Durusau @ 5:17 am

The Graph Traversal Programming Pattern at slide 41 (42 of 76) bears an uncanny resemblance to a topic map. You be the judge.

The NoSQL/graph database movement has serious implications for topic maps. Good serious implications for topic maps.

The performance numbers and capabilities are going to get the attention of mainstream consumers and developers.

Now would be a good time to add subject identity as a bit of icing, as it were, on the NoSQL cake.

Closer to home, as it were, graph databases look like a better fit for topic map processing than, say, MySQL.
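
Here is a minimal sketch (plain Python, not any particular engine’s API) of why the fit is good: topics become vertices and n-ary associations become edges annotated with association and role types, so traversal is the native operation.

    from collections import defaultdict

    class TopicGraph:
        def __init__(self):
            self.topics = {}                # topic id -> properties
            self.edges = defaultdict(list)  # topic id -> associations it plays in

        def add_topic(self, tid, **props):
            self.topics[tid] = props

        def associate(self, assoc_type, *players):
            # players are (role type, topic id) pairs
            for role, tid in players:
                self.edges[tid].append((assoc_type, players))

        def traverse(self, tid, assoc_type):
            # yield the other players reachable from tid over assoc_type
            for a_type, players in self.edges[tid]:
                if a_type == assoc_type:
                    for role, other in players:
                        if other != tid:
                            yield role, other

    g = TopicGraph()
    g.add_topic("puccini", name="Giacomo Puccini")
    g.add_topic("tosca", name="Tosca")
    g.associate("composed-by", ("composer", "puccini"), ("work", "tosca"))
    print(list(g.traverse("puccini", "composed-by")))  # [('work', 'tosca')]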

June 26, 2010

Semantic Compression

Filed under: Cataloging,Indexing,Semantic Diversity — Patrick Durusau @ 12:55 pm

It isn’t difficult to find indexing terms to represent documents.

But whatever indexing terms are used, a large portion of relevant documents will go unfound. As much as 80% of them. See Size Really Does Matter… (a study of full-text searching, but the underlying problem is the same: “What term was used?”)

You read a document, are familiar with its author, concepts, literature it cites, the relationships of that literature to the document and the relationships between the ideas in the document. Now you have to choose one or more terms to represent all the semantics and semantic relationships in the document. The exercise you are engaged in is compressing the semantics in a document into one or more terms.

Unlike data compression à la Shannon, the semantic compression algorithm used by any given user is unknown. We know it is not possible to decompress an indexing term to recover all the semantics of the document it purports to represent. Since a term is used to represent several documents, the problem is even worse: we would have to decompress the term to recover the semantics of all the documents it represents.

Even without the algorithm used to assign indexing (or tagging) terms, investigation of semantic compression could be useful. For example, encoding the semantics of a set of documents (to a set depth) and then asking groups of users to assign those documents indexing or tagging terms. By varying the semantics in the documents, it may, emphasis on may, be possible to experimentally derive partial semantic decompression for some terms and classes of users.
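
A toy illustration of the compression analogy (documents and terms invented): the index maps each term to the documents that carry it, and “decompressing” a term recovers only the union of their concepts, never any one document’s semantics.

    docs = {
        "doc1": {"rough sets", "approximation", "Pawlak"},
        "doc2": {"rough sets", "feature selection"},
        "doc3": {"rough sets", "data mining", "reducts"},
    }

    index = {}  # term -> documents: the lossy, "compressed" form
    for doc, concepts in docs.items():
        for concept in concepts:
            index.setdefault(concept, set()).add(doc)

    # Decompressing "rough sets" yields every document that used the term,
    # but the per-document semantics are gone: only the union survives.
    recovered = set().union(*(docs[d] for d in index["rough sets"]))
    print(index["rough sets"])  # {'doc1', 'doc2', 'doc3'}
    print(recovered)            # all six concepts, origins lost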

June 25, 2010

Mappify – DBpedia and Wikipedia to Topic Maps – New Service

Filed under: Authoring Topic Maps,Topic Map Software,Topic Maps — Patrick Durusau @ 4:00 pm

Mappify – DBpedia and Wikipedia to Topic Maps is the latest shot over the Topic Maps Lab bow! 😉

Toss a Wikipedia or DBpedia source at this service and get back a topic map! In one of four flavors: xtm, ctm, json, or jtm.

UPDATE: 26 June 2010 – The bug reported below has been fixed!

Warning: You must use the correct case in URLs.

Incorrect usage: the Wikipedia URL for Marilyn Monroe, http://en.wikipedia.org/wiki/Marilyn_monroe. I did not notice that the page says: “(Redirected from Marilyn monroe).” Not much of a redirect if it leaves me with the incorrect URL.

Correct the case on entries to match the page title above the redirect notice and you will be fine.

Correct usage: http://en.wikipedia.org/wiki/Marilyn_Monroe.
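
If you would rather fix the case programmatically than by eyeballing the redirect notice, here is a minimal sketch, assuming the Python requests library and the standard MediaWiki query API, that resolves a miscased title to its canonical form before the URL goes to Mappify.

    import requests

    def canonical_title(title):
        resp = requests.get(
            "http://en.wikipedia.org/w/api.php",
            params={"action": "query", "titles": title,
                    "redirects": 1, "format": "json"},
            timeout=10,
        )
        query = resp.json()["query"]
        # The API reports case/underscore normalization and redirect
        # resolution as separate steps; apply both.
        for step in query.get("normalized", []) + query.get("redirects", []):
            title = step["to"]
        return title

    print(canonical_title("Marilyn_monroe"))  # -> Marilyn Monroe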

For those unfamiliar with our community, the “competition” between Semagia (Lars Heuer) and the Topic Maps Lab is entirely friendly. It just makes better copy to portray them as fierce competitors leap frogging each other with topic map technologies and resources.

June 24, 2010

27th International Conference on Machine Learning (ICML 2010) – Proceedings

Filed under: Conferences,Data Mining — Patrick Durusau @ 7:09 pm

Proceedings of the 27th International Conference on Machine Learning (ICML 2010) are available.

If you are interested in the next generation of assistive tools for authoring topic maps or using them before your competition does, it would be hard to find a better starting place.

One of my interests is in text archives, so interactive construction of a topic map with an application that searches an archive for subjects or relationships for subjects would be cool. Perhaps one that “learns” your preferences as you accept or reject its suggestions, and that “knows” what others have found building topic maps for the same archive. You could follow or not follow their paths into the archive.

June 23, 2010

Balisage: Final Program!

Filed under: Conferences — Patrick Durusau @ 6:46 pm

Balisage Schedule with Latebreaking Sessions is now available.

Truly an awesome lineup! I will miss the conference for the first time in a decade, but let me suggest a couple of don’t-miss opportunities:

  • Reverse modeling for domain-driven engineering of publishing technology, Anne Brüggemann-Klein, Tamer Demirel, Dennis Pagano, & Andreas Tai, Technische Universität München. Anytime Anne talks about meta-models it is a must see event. You will not be disappointed.
  • Extension of the type/token distinction to document structure, Claus Huitfeldt, University of Bergen; Yves Marcoux, Université de Montréal; & C. M. Sperberg-McQueen, Black Mesa Technologies. Hearing a sane discussion of anything from C. S. Peirce is going to be a treat. Claus and company are the ones to deliver it.
  • A streaming XSLT processor, Michael Kay, Saxonica. Get your technical boots out, but while listening, think of using streaming XSLT for subject recognition. Another tool we won’t have to build.
  • IPSA RE: A New Model of Data/Document Management, Defined by Identity, Provenance, Structure, Aptitude, Revision and Events, Walter E. Perry, Fiduciary Automation. I would attend to remind Walter he owes me an email. You should attend because Walter is one of those folks who is going to reshape fiduciary disclosure as we know it. (For the better.)
  • Stone soup, C. M. Sperberg-McQueen, Black Mesa Technologies. Just go. Truly remarkable. You will understand.

Ok, so that’s five (5). I said I was a topic map person, not that I could count.

Which means I left out 28 presentations that you will be dying to see and that I would try to see if I were there. If I were to list all the ones I want to see, it would just be a copy of the schedule, albeit with some of my funny comments along with it.

August, Montreal, good food, top markup experts, excellent presentations, hallway discussions, what more could you ask for?

Seriously, simply the best markup conference of the year.

Authorities and Vocabularies!

Filed under: Data Source,LCSH,RDF — Patrick Durusau @ 6:12 pm

Authorities and Vocabularies at the Library of Congress offers bulk downloads of some of its authorities and vocabularies. Like the Library of Congress Subject Headings!

Granted, it is in RDF, but your topic map application is going to encounter RDF eventually. You may as well develop some experience at incorporating it into your topic map as you would any other subject identification system.
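
Here is a minimal first experiment, assuming the Python rdflib library and that the download is SKOS data in N-Triples (the file name is a placeholder): every concept URI can be re-used as a subject identifier, with its prefLabel as the topic’s name.

    from rdflib import Graph, Namespace

    SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")

    g = Graph()
    g.parse("lcsh.nt", format="nt")  # placeholder file name

    # Each SKOS concept URI is a ready-made subject identifier; the
    # prefLabel supplies the topic's base name.
    for concept, _, label in g.triples((None, SKOS.prefLabel, None)):
        print(concept, "->", label)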

June 22, 2010

Web3 Platform

Filed under: RDF,Topic Map Software,Topic Maps — Patrick Durusau @ 2:59 pm

Networked Planet has released a beta of Web3 Platform with free downloads during the beta period.

In addition to supporting RDF and Topic Maps, the platform also supports SDShare (think syndication and synchronization of multiple semantic stores).

I will have to rely on the reports of others on its installation and operation. I don’t have a Windows server, although this might tempt me into getting one.

Unstructured Data or Unmapped Data?

Filed under: Data Mining,Marketing — Patrick Durusau @ 10:55 am

The Wikipedia article on unstructured data makes it clear that such data may have a structure, but that “unstructured data” means structure not readily recognizable to a computer.

The term unstructured data bothers me because any text has a structure. If it didn’t, we would not be able to read it. It would just be a jumble of symbols. Oh, sorry. Apologies to any AI agents “reading” this post. But that is how traditional computers see a text, just a jumble of symbols.

When people view a text, they see structure, recognize subjects, etc. Moreover, different people can look at the same text and see different structures and/or subjects.

There are topic maps that are written to enforce a “correct” view of a body of data, and those are certainly useful in many cases. Topic maps also support users identifying the structures and subjects they see in a text, alongside identifications made by others.

To the extent that users view texts and leave trails, as it were, of the structures and subjects they identified in a text (or body of texts), those trails form maps that can be useful to others.

Think of it as tagging but with explicit subject identity. The relationships to a particular text, its author, and a variety of other details could be extracted automatically and with a minimum of effort on the part of the user. A topic map application could even suggest subjects or associations for a user to confirm based on their reading.
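
A toy sketch of that idea (users, labels and structure invented for illustration): because each tag carries a published subject identifier, two users’ different labels collocate without prior agreement.

    tags_by_user = {
        "alice": [("Marilyn", "http://en.wikipedia.org/wiki/Marilyn_Monroe")],
        "bob":   [("M. Monroe", "http://en.wikipedia.org/wiki/Marilyn_Monroe")],
    }

    by_subject = {}  # group by identifier, not by tag string
    for user, tags in tags_by_user.items():
        for label, psi in tags:
            by_subject.setdefault(psi, []).append((user, label))

    print(by_subject)  # different labels, one subject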

Suggest: unmapped data.

Captures both the sense of exploration as well as allowing for multiple mappings.

Thoughts?

June 21, 2010

Looking for the stranger next door – Report

Filed under: Semantic Diversity,Usability — Patrick Durusau @ 6:02 pm

In Looking for the stranger next door Bernard Vatant states what is probably a universal user requirement: Show me what I don’t know about subject X.

Bernard has some interesting ideas on how a system might try to meet that challenge. But for the details, see his post.

June 20, 2010

“What Is I.B.M.’s Watson?” – Review

Filed under: Data Mining,Semantic Diversity,Subject Identity — Patrick Durusau @ 7:34 pm

What Is I.B.M.’s Watson? appears in the New York Times Magazine of 20 June 2010. IBM, or more precisely David Ferrucci and his team at IBM, have made serious progress towards a useful question-answering machine. (On Ferrucci see Ferrucci – DBLP, Ferrucci – Scientific Commons.)

It won’t spoil the article to say that raw computing horsepower (BlueGene servers) plays a role in the success of the Watson project. But, there is another aspect of the project that makes it relevant to topic maps.

Rather than relying on a few algorithms to analyze questions, Watson uses more than a hundred. As the article summarizes:

Another set of algorithms ranks these answers according to plausibility; for example, if dozens of algorithms working in different directions all arrive at the same answer, it’s more likely to be the right one. In essence, Watson thinks in probabilities. It produces not one single “right” answer, but an enormous number of possibilities, then ranks them by assessing how likely each one is to answer the question.

Transpose that into a topic maps setting and imagine that you are using probabilistic merging algorithms that are applied interactively by a user in real time.

Suddenly we are not talking about a technology for hand-curated information resources but an assistive technology that would enable human users to go deep knowledge diving into the sea of information resources, while generating buoys and markers for others to follow.

Our ability to do that will depend on processing power, creative use and development of “probabilistic merging” algorithms, and a Topic Maps Query Language that supports querying of non-topic-map data and creation of content based on the results of those queries.
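
A minimal sketch of what interactive, probabilistic merging might look like (the scoring heuristics are invented for illustration, not anyone’s published algorithm): rank candidate merges by a score and let the user confirm or reject, rather than computing one “right” merge.

    from difflib import SequenceMatcher

    def merge_score(t1, t2):
        # A shared subject identifier is near-certain evidence.
        if set(t1["ids"]) & set(t2["ids"]):
            return 1.0
        # Otherwise fall back on name similarity; Watson-style, many weak
        # signals could be combined here, but one suffices for a sketch.
        return SequenceMatcher(None, t1["name"], t2["name"]).ratio()

    topics = [
        {"name": "Marilyn Monroe",
         "ids": ["http://en.wikipedia.org/wiki/Marilyn_Monroe"]},
        {"name": "Monroe, Marilyn", "ids": []},
        {"name": "James Monroe", "ids": []},
    ]

    candidates = sorted(
        ((merge_score(a, b), a["name"], b["name"])
         for i, a in enumerate(topics) for b in topics[i + 1:]),
        reverse=True,
    )
    for score, n1, n2 in candidates:
        print(f"{score:.2f}  {n1} <-> {n2}")  # user confirms or rejects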

****

PS: For more information on the Watson project, see: What Is Watson?, part of IBM’s DeepQA project.

June 19, 2010

Demonstrating The Need For Topic Maps

Individual Differences in the Interpretation of Text: Implications for Information Science by Jane Morris demonstrates that different readers have different perceptions of lexical cohesion in a text. About 40% worth of difference. That is a difference in the meaning of the text.

Many tasks in library and information science (e.g., indexing, abstracting, classification, and text analysis techniques such as discourse and content analysis) require text meaning interpretation, and, therefore, any individual differences in interpretation are relevant and should be considered, especially for applications in which these tasks are done automatically. This article investigates individual differences in the interpretation of one aspect of text meaning that is commonly used in such automatic applications: lexical cohesion and lexical semantic relations. Experiments with 26 participants indicate an approximately 40% difference in interpretation. In total, 79, 83, and 89 lexical chains (groups of semantically related words) were analyzed in 3 texts, respectively. A major implication of this result is the possibility of modeling individual differences for individual users. Further research is suggested for different types of texts and readers than those used here, as well as similar research for different aspects of text meaning.

I won’t belabor what a 40% difference in interpretation implies for the one-interpretation-of-data crowd. At least for those who prefer an evidence versus ideology approach to IR.

What is worth belaboring is how to use Morris’ technique to demonstrate such differences in interpretation to potential topic map customers. As a community we could develop texts for use with particular market segments: business, government, legal, finance, etc. We could build an interface to replace the colored pencils used to mark all words belonging to a particular group, automating some of the calculations and other operations on the resulting data.
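
A toy sketch of the automated comparison (the chains are invented): pair each of one reader’s lexical chains with its best match among another reader’s and report the residual disagreement.

    def chain_difference(chains_a, chains_b):
        def jaccard(x, y):
            return len(x & y) / len(x | y)
        # Best Jaccard match in B for each chain in A, averaged.
        scores = [max(jaccard(a, b) for b in chains_b) for a in chains_a]
        return 1 - sum(scores) / len(scores)

    reader1 = [{"bank", "money", "loan"}, {"river", "water"}]
    reader2 = [{"bank", "river", "water"}, {"money", "loan"}]
    print(f"{chain_difference(reader1, reader2):.0%} difference")  # 33%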

Sensing that interpretations of texts vary is one thing. Having an actual demonstration, possibly using texts from a potential client, is quite another.

This is a tool we should build. I am willing to help. Who else is interested?

Compact RDF to Topic Maps (CRTM)

Filed under: Mapping,RDF,Topic Maps — Patrick Durusau @ 9:52 am

Compact RDF to Topic Maps (CRTM) is a draft mapping from RDF to Topic Maps from Lars Heuer.

CRTM mappings are re-usable, which according to Lars is not possible with other RDF-to-Topic Maps mappings. (Why would anyone re-use a mapping when they can re-invent it? Re-invention is much safer than actual progress.)

I don’t know if Lars Heuer and the Topic Maps Lab are competing to see who can release the most interesting topic map software/tools but they are making the rest of us look like laggards. 😉

Or have I missed something really cool/important that others have released recently?

June 18, 2010

TMQL4J suite 2.6.3 Released

Filed under: Search Engines,TMQL,Topic Map Software — Patrick Durusau @ 8:31 am

The Topic Maps Lab is becoming a hotbed of topic map software development.

TMQL4J 2.6.3 was released this week with the following features:

  • New query factory – now it is possible to implement your own query types. If the query provides a transformation algorithm, it may be converted to a TMQL query and processed by the tmql4j engine.
  • New language processing – the two core modules (the lexical scanner and the parser) were rewritten to become more flexible and stable. The lexical scanner provides new methods to register your own language tokens (as language extension) or your own non-canonical tokens.
  • Default prefix – the engine provides the functionality of defining a default prefix in the context of the runtime. The prefix can be used without a specific pattern in the context of a query.
  • New interfaces – the interfaces were reorganized to enable an intuitive usage and understanding of the engine itself.

Plus a plugin architecture with plugins for Tmql4Ontopia, TmqlDraft2010, and TopicMapModificationLanguage. See the announcement for the details.

See also TMQL4J Documentation and Tutorials.

I am interested in your experiences with the interfaces, which “…enable an intuitive usage and understanding of the engine itself.”

June 17, 2010

Online RDF to Topic Maps Converter

Filed under: Mapping,RDF,Topic Map Software,Topic Maps,XTM — Patrick Durusau @ 11:08 am

Lars Heuer has released an online RDF to Topic Maps conversion web service!

Mappify – RDF to Topic Maps is the place!

No need for a topic maps engine or an RDF store: it streams RDF in and a Topic Maps format out.
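
In script form the round trip might look like this minimal sketch, assuming the Python requests library; the endpoint URL and the format parameter are placeholders I made up, so check Lars’s announcement for the actual interface.

    import requests

    rdf = open("input.rdf", "rb").read()
    resp = requests.post(
        "http://www.semagia.com/mappify",  # hypothetical endpoint
        data=rdf,
        headers={"Content-Type": "application/rdf+xml"},
        params={"format": "xtm"},          # hypothetical parameter
        timeout=30,
    )
    resp.raise_for_status()
    open("output.xtm", "wb").write(resp.content)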

See his post for more details: ANN: Online RDF to Topic Maps Converter.

If you find this interesting, useful, etc., you can find contact information for Lars Heuer at: www.semagia.com

June 16, 2010

Introducing George and Mary

Filed under: Education,Examples,Humor — Patrick Durusau @ 8:40 pm

George and Mary (Background).

Finally! The first installment in my introduction to topic maps for non-technical types arrives!

Suggestions for improving the dialogue, illustrations, etc., are most welcome!

It would be interesting if this could develop into a framework for explaining topic maps and their applicability in particular domains or to particular issues, by changing the problems confronted by George and Mary and adapting the dialogue.

This will not appeal to the “it can’t be funded unless 1) we don’t understand it, and 2) we suspect the applicant doesn’t either” crowd. Ask me if you are in that situation and we can translate a George and Mary story into complicated looking notation. With a light dusting of references to Peirce for good measure.

June 15, 2010

Comparing Models – Exercise

Filed under: Exercises,Subject Identity — Patrick Durusau @ 10:45 am

The Library of Congress record for Meaning and mental representations illustrates why topic maps can be different from other information resources.

The record offers a default display, but also MARCXML, MODS, and Dublin Core formats.

Each display is unique to that format.

Exercise: Requires pencil/pen, paper, scissors, tape.

Draw 4 unfolded cubes ;-), that is, just draw double lines across the paper and divide them into 4 equal spaces.

Write down one of the values you see on the default page, say the title, Meaning and mental representation.

In the first box to your left (my right), write “Main Title.” Then go to each of the alternative formats and write down what subjects “contain” the title.

First difference, a topic map can treat the containers of subjects as subjects in their own right. (Important for mapping between systems and disclosing that mapping to others.)

Second difference, with the topic “unfolded” as it were, you can either view the other subjects that contain the subject of interest, or you can cut the cube out, fold it up, and display only one set of subjects at a time. You should fill out another set of boxes and make such cubes in preparation for the next difference.

Third difference, assuming that you have cut out two or more cubes and taped them together.

Rotate one of the cubes for a particular piece of information to a different face than the others.

Now we can see “Main Title” in the default system while seeing the author listing in Dublin Core. Our information system has become as heterogeneous as the data that it consumes.

Assignment: Do this exercise for 5 items in the LOC catalog, with at least 3 fields each (your choice of items and fields), and prepare to discuss what insights it gives you about the items, their cataloging, the systems for classification, or similar themes. Or a theme of your own. This entire area is very much in discovery mode.
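
For those who prefer scripts to scissors, here is a toy sketch of the first difference; the element names are the actual title carriers in each format, while the mapping structure is invented for illustration.

    # Each format's container for the title, treated as a subject itself.
    title_containers = {
        "default display": "Main Title",
        "MARCXML": "datafield tag=245",  # MARC 21 title statement
        "MODS": "titleInfo/title",
        "Dublin Core": "dc:title",
    }

    # One subject ("the title of the work") with each container recorded
    # as a subject in its own right, so the mapping between systems is
    # disclosed rather than buried in an import script.
    mapping = {"subject": "title of the work",
               "containers": [{"format": f, "container": c}
                              for f, c in title_containers.items()]}

    for entry in mapping["containers"]:
        print(entry["format"], "holds the title in", entry["container"])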

Library of Congress LCCN Permalink

Filed under: Subject Identifiers — Patrick Durusau @ 10:09 am

Library of Congress LCCN Permalinks provide a persistent link to bibliographic records in the Library of Congress catalog.

From the FAQ:

You can use an LCCN Permalink anywhere you need to reference an LC bibliographic record — in emails, blogs, databases, web pages, digital files, etc.

Let’s see how that works: http://lccn.loc.gov/88009251

The internal system maintains its use of the Library of Congress Control Number (the LCCN in the title), which is a unique identifier for that record, and the permalink allows the outside world access to the same information using a URI.

Question: When I have a work that is identified by a LCCN Permalink and also has an identifier in CiteseerX, DBLP, WorldCat or in a European library, which one should I use?

Question: The FAQ says this link identifies the bibliographic record. That is not the same thing as the book the record describes. How should I tell others that I am using the URI to identify a particular book? (Which is not the same thing as the record for that book.)
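
The Topic Maps answer to that last question, in a minimal sketch (the dictionaries are invented; the distinction is the TMDM’s): use the URI as a subject identifier when it stands for the book, and as a subject locator when the subject is the retrievable record itself.

    book = {
        "name": "the book on the shelf",
        # Subject identifier: the URI stands for a non-retrievable subject.
        "subject_identifiers": ["http://lccn.loc.gov/88009251"],
    }
    record = {
        "name": "LC bibliographic record 88009251",
        # Subject locator: the URI retrieves the subject itself.
        "subject_locators": ["http://lccn.loc.gov/88009251"],
    }
    # Same URI, two different relationships to it. Under the TMDM the two
    # topics do not merge, so the book and its record remain distinct.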

June 14, 2010

Constructions from Dots and Lines

Filed under: Uncategorized — Patrick Durusau @ 8:06 pm

Constructions from Dots and Lines by Marko A. Rodriguez and Peter Neubauer is an engaging introduction to graphs and why they are important.

Abstract:

A graph is a data structure composed of dots (i.e. vertices) and lines (i.e. edges). The dots and lines of a graph can be organized into intricate arrangements. The ability for a graph to denote objects and their relationships to one another allow for a surprisingly large number of things to be modeled as a graph. From the dependencies that link software packages to the wood beams that provide the framing to a house, most anything has a corresponding graph representation. However, just because it is possible to represent something as a graph does not necessarily mean that its graph representation will be useful. If a modeler can leverage the plethora of tools and algorithms that store and process graphs, then such a mapping is worthwhile. This article explores the world of graphs in computing and exposes situations in which graphical models are beneficial.

June 13, 2010

What Information Goes With Your Subject? Exercise

Filed under: Exercises,Marketing,Topic Maps — Patrick Durusau @ 7:19 pm

A print index does not organize all the information about a subject in one location. It doesn’t even organize all the information in your personal book collection about a subject in one location. It organizes all the information in one book about a subject in one location.

We are no longer subject to that constraint.

But the question is: Without any artificial barriers, what information should go with a subject?

Example: Online maps co-locate information about hotels, convenience stores, bars, etc. with physical locations.

That is a tiny number of the subjects that we see or read about in a week. What would you like to see with those subjects?

Exercise: Every day for the next two weeks, carry a pencil/pen and paper with you. At least once per day, twice if you can manage it, write down a subject you want to know more about. Without stopping to think about difficulty, expense, etc., jot down 5 pieces of information you would like to see with that subject.

Extra credit: For extra credit, rank in what order you would like to see the additional information.

June 12, 2010

MURAKAMI Harumi

Filed under: Interface Research/Design,Researchers,Search Interface — Patrick Durusau @ 3:48 pm

MURAKAMI Harumi focuses on knowledge sharing and integration of library catalogs.

ReaD, an alternative listing to DBLP. DBLP lists four (4) publications; ReaD lists six (6), plus fifty (50) papers and notes.

dblp

Homepage

Harumi’s (the given name; MURAKAMI is the family name) work on Subject World (Japanese only) (my post on Subject World includes English language references) caught my attention because of its visualization of heterogeneous terminology in a library OPAC setting.

Since I am innocent of any Japanese, I am interested in hearing reactions from those fluent in Japanese to the visualization interface. This could also be an opportunity to explore how visualization preferences do or don’t differ across cultural lines.

The LibraryThing

Filed under: Collocation,Examples,Marketing,Subject Identity — Patrick Durusau @ 3:42 pm

The LibraryThing is the home of OverCat, a collection of 32 million library records.

It is a nifty illustration of re-using identifiers, not re-inventing them.

I put in an ISBN, for example, and the system searches for that work. It does not ask me to create a “cool” URI for it.

It also demonstrates some of the characteristics of a topic map in that it does not return multiple matches for all the libraries that hold a work, but only one. (You can still view the other records as well.)

I am not sure I have the time to enter, even by ISBN, all the books that line the walls of my office but maybe I will start with the new ones as they come in and the older ones as I use them. The result is a catalog of my books, but more importantly, additional information about those works entered by others.

Maybe that could be a marketing pitch for topic maps? Topic maps enable users to coordinate their information with others, without prior agreement. Sort of like asking for a ride to town while, at the same time, someone in a particular area says they are going to town but needs to share gas expenses. (Treating a circumference around a set of geographic coordinates as a subject: users neither know nor care about the details, they just express their needs.)

June 11, 2010

JErlang: Erlang with Joins

Filed under: Uncategorized — Patrick Durusau @ 6:18 am

JErlang: Erlang with Joins by Hubert Plociniczak should interest anyone implementing distributed topic map systems.

The value of having a distributed architecture (did I hear “Internet?”) has been lost on the Semantic Web advocates. With topic maps you can have multiple locations that “resolve” identifiers to other identifiers and pass on information about something that has been identified.

Most existing topic maps look like data silos but that is more a matter of habit than architectural limitation.

I should put in a plug for the Springer Alert Service, which brought the article of the same title, JErlang: Erlang with Joins, to my attention. Highly recommended as a way to stay current on the latest CS research. Remember, articles don’t have to say “topic map” in the title or abstract to be relevant.

PS: Topic map observations: the final report and the article have the same name. In a topic map, the different locations for the items would be treated as subject locators, thus allowing them to retain the same name while being distinguished one from the other. Note that the roles differ with the two subjects as well: Susan Eisenbach is the supervisor of the final report and a co-author of the article reported by Springer.

June 10, 2010

Linked Data and Citation Indexes

Filed under: Linked Data,Subject Identifiers — Patrick Durusau @ 5:46 am

Citation indexes offer a concrete example of why blindly following the linked data mantra of creating “URIs as names for things” (Linked Data) is a bad idea.

Science Citation Index Expanded™ by Thomson Reuters offers coverage using citations to identify articles back to 1900. That works because the articles use citations as identifiers to reference previous articles.

Many articles are available in digital form, from arXiv.org, CiteSeerX, or some other digital repository. That means they have an identifier in addition to the more traditional citation reference/identifier.

Where multiple identifiers identify the same subject, we need equivalence operators.

Where identifiers already identify subjects, we need operators that re-use those identifiers.
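
A minimal sketch of such an equivalence operator (all identifiers invented): union-find collapses however many identifiers an article already has into one subject, re-using each of them instead of minting a replacement.

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def same_subject(a, b):
        # Declare that identifiers a and b name the same subject.
        parent[find(a)] = find(b)

    same_subject("citation:Doe-2009-JInfSci-35(4):401", "arxiv:0901.0001")
    same_subject("arxiv:0901.0001", "citeseerx:10.1.1.1.1001")

    # Any of the three identifiers now resolves to one subject.
    print(find("citation:Doe-2009-JInfSci-35(4):401")
          == find("citeseerx:10.1.1.1.1001"))  # True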

Ask yourself, “What good is a new set of identifiers that partially duplicates existing identifiers?”

If you think you have a good answer, please email me or reply to this post. Thanks!

June 9, 2010

The Fourth Paradigm: Data-intensive Scientific Discovery

Filed under: Uncategorized — Patrick Durusau @ 7:44 pm

Jack Park points to The Fourth Paradigm: Data-Intensive Scientific Discovery as a book that merits our attention.

Indeed it does! Lurking just beneath the surface of data-intensive research are questions of semantics. Diverse semantics. How does data-intensive research occur in a multi-semantic world?

Paul Ginsparg (Cornell University), in Text in a Data-centric World, has the usual genuflection towards “linked data” without stopping to consider the cost of evaluating every URI to decide if it is an identifier or a resource, nor why adding one more name to the welter of names we have now (that is the semantic diversity problem) is going to make our lives any better.

Ginsparg writes:

Such an articulated semantic structure [linked data] facilitates simpler algorithms acting on World Wide Web text and data and is more feasible in the near term than building a layer of complex artificial intelligence to interpret free-form human ideas using some probabilistic approach.

Solving the “perfect language” problem, which has never been solved, is more feasible than “…building a layer of complex artificial intelligence to interpret free-form human ideas using some probabilistic approach” to solve it for us?

Perhaps so, but one wonders why that is a useful observation.

On the “perfect language” problem, see The Search for the Perfect Language by Umberto Eco.
