Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

January 27, 2011

When supercomputers meet the Semantic Web – Post

Filed under: Linked Data,Searching,Semantic Web — Patrick Durusau @ 5:59 am

When supercomputers meet the Semantic Web

Jack Park forwarded the link to this post.

It has descriptions like:

Everything about the hardware is optimised to churn through large quantities of data, very quickly, with vital statistics that soon become silly. A single processor “can sustain 128 simultaneous threads and is connected with up to 8 GB of memory.” The Cray XMT comes with at least 16 of those processors, and can scale to over 8,000 of them in order to handle over 1 million simultaneous threads with 64 TB of shared system memory. Should you want to, you could easily hold the entire Linked Data Cloud in main memory for rapid analysis without the usual performance bottleneck introduced by swapping data on and off disks.

Now, that’s computing!

Do note the emphasis on graph processing.

I think Semantic Web and topic map fans would do well to pay attention to the big data movement mentioned in this article.

Imagine a topic map whose topics emerge in interaction with subject matter experts querying the data as opposed to being statically authored.

Same for associations between subjects and even their association types.

Still topic maps, just a different way to think about authoring them.

I don’t have a Cray XMT but it should be possible to practice emergent topic map authoring on a smaller device.

I rather like that, emergent topic map authoring, ETMA.

Let me push that around a bit and I will post further notes about it.
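As a first push, here is a minimal sketch in Python of what I mean. The query log, the topic shape, and the create-on-first-mention rule are my own assumptions, not anything the article proposes:

    # Emergent topic map authoring (ETMA), minimal sketch.
    # Assumption: each expert query names a subject and supplies zero or
    # more properties observed about it. Topics are created on first
    # mention and enriched by later queries, not statically authored.
    from collections import defaultdict

    class EmergentTopicMap:
        def __init__(self):
            self.topics = {}                 # subject name -> properties
            self.mentions = defaultdict(int)

        def query(self, subject, **properties):
            """Record an expert query; the topic emerges as a side effect."""
            topic = self.topics.setdefault(subject, {})
            topic.update(properties)         # later queries refine the topic
            self.mentions[subject] += 1
            return topic

    tm = EmergentTopicMap()
    tm.query("Cray XMT", processors=16, threads_per_processor=128)
    tm.query("Cray XMT", shared_memory_tb=64)    # same topic, now enriched
    print(tm.topics["Cray XMT"], tm.mentions["Cray XMT"])

On a Cray XMT the same idea could run over millions of queries in shared memory; on a laptop it is still enough to watch topics emerge.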

January 22, 2011

Making Linked Data work isn’t the problem – Post

Filed under: Linked Data,Semantic Web,Topic Maps — Patrick Durusau @ 7:13 am

Making Linked Data work isn’t the problem

Georgi Kobilarov captures an essential question when he says:

But neither Linked Open Data nor the Semantic Web have really took off from there yet. I know many people will disagree with me and point to the famous Linked Open Data cloud diagram, which shows a large (and growing) number of data sets as part of the Linked Data Web. But where are the showcases of problems being solved?

If you can’t show me problems being solved then something is wrong with the solution. “we need more time” is rarely the real issue, esp. when there is some inherent network effect in the system. Then there should be some magic tipping point, and you’re just not hitting it and need to adjust your product and try again with a modified approach.

My point here is not that I want to propose any particular direction or change, but instead I want to stress what I believe is an issue in the community: too few people are actually trying to understand the problem that Linked Data is supposed to be the solution to. If you don’t understand the problem you can not develop a solution or improve a half-working one. Why? Well, what do you do next? Which part to work on? What to change? There is no ground for those decisions if you don’t have at least a well informed guess (or better some evidence) about the problem to solve. And you can’t evaluate your results.

You could easily substitute topic maps in place of linked data in that quote.

Questions:

Putting global claims to one side, write a 5 – 8 page paper, with citations, answering the following questions:

  1. What specific issue in your library would topic maps help solve? As opposed to what other solutions?
  2. Would topic maps require more or fewer resources than other solutions?
  3. Would topic maps offer any advantages over other solutions?
  4. How would you measure/estimate the answers in #2 and #3 for a proposal to your library board/director?

(Feel free to suggest and answer other questions I have overlooked.)

January 11, 2011

Linked Data Extraction with Zemanta and OpenCalais

Filed under: Entity Extraction,Linked Data — Patrick Durusau @ 1:53 pm

Linked Data Extraction with Zemanta and OpenCalais

Benjamin Nowack’s review at BNODE of Named Entity Extraction APIs by Zemanta and OpenCalais.

You can brew your own entity extraction routines and likely will for specialized domains. For more general work, or just to become familiar with entity extraction and its limitations, the APIs Benjamin reviews are a good starting place.
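If you do brew your own, a gazetteer lookup is a common starting point. A minimal sketch, with an invented vocabulary; real systems add tokenization, longest-match rules, and disambiguation:

    # Naive gazetteer-based entity extraction, minimal sketch.
    # The entries and types below are illustrative only.
    import re

    GAZETTEER = {
        "OpenCalais": "Service",
        "Zemanta": "Service",
        "Thomson Reuters": "Organization",
    }

    def extract_entities(text):
        """Return (surface string, type, offset) for each gazetteer hit."""
        hits = []
        for name, etype in GAZETTEER.items():
            for m in re.finditer(re.escape(name), text):
                hits.append((name, etype, m.start()))
        return sorted(hits, key=lambda hit: hit[2])

    print(extract_entities("Zemanta and OpenCalais both offer extraction APIs."))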

December 24, 2010

The OpenLink Data Explorer Extension

Filed under: Linked Data — Patrick Durusau @ 5:03 pm

The OpenLink Data Explorer Extension

Extensions for Firefox, Safari, Google Chrome to explore data underlying web pages.

Must be my lack of business experience but I would think a browser extension would start with the most widely used browser.

The claim that Linked Data supports combining heterogeneous data without programming is, I suppose, true.

That is, heterogeneous data can be combined using Linked Data; whether the combination will be meaningful is an entirely different question.

There are some SW based efforts to make such combinations more likely to be useful. More on those anon.

idMesh: Graph-Based Disambiguation of Linked Data

Filed under: Entity Resolution,Linked Data,Topic Maps — Patrick Durusau @ 9:21 am

idMesh: Graph-Based Disambiguation of Linked Data

Authors: Philippe Cudré-Mauroux, Parisa Haghani, Michael Jost, Karl Aberer, Hermann de Meer

Abstract:

We tackle the problem of disambiguating entities on the Web. We propose a user-driven scheme where graphs of entities – represented by globally identifiable declarative artifacts – self-organize in a dynamic and probabilistic manner. Our solution has the following two desirable properties: i) it lets end-users freely define associations between arbitrary entities and ii) it probabilistically infers entity relationships based on uncertain links using constraint-satisfaction mechanisms. We outline the interface between our scheme and the current data Web, and show how higher-layer applications can take advantage of our approach to enhance search and update of information relating to online entities. We describe a decentralized infrastructure supporting efficient and scalable entity disambiguation and demonstrate the practicability of our approach in a deployment over several hundreds of machines.

An interesting paper, but disappointing in that links are the only means offered for indicating equivalence of entities.

While that is true for Linked Data and the Semantic Web in general (see owl:sameAs), topic maps have long supported a more robust, declarative approach.

The Topic Maps Data Model (TMDM) defines default merging for topics, but leaves open the specification of additional bases for merging.

The Topic Maps Reference Model (TMRM) does not define any merging rules at all but enables legends to make their own choices with regard to bases for merging.

The problem engendered by indicating equivalence by use of IRIs is just that, all you have is an indication of equivalence.

There is no indication of why, or on what basis, two (or more) IRIs are thought to indicate the same subject.

Which means there is no basis on which to compare them with other representatives for the same subject.

As well as no basis for perhaps re-using that declaration of equivalence.
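To make the contrast concrete, here is a minimal sketch of TMDM-style default merging in Python, with invented topics and identifiers. The point is the recorded basis for the merge, which a bare owl:sameAs link omits:

    # Merging with a declared, inspectable basis, minimal sketch.
    # TMDM default merging: topics merge when their subject identifier
    # sets intersect, and here the merge records *why* it happened,
    # so the equivalence can be audited, compared, and re-used.

    def merge(topic_a, topic_b):
        shared = topic_a["identifiers"] & topic_b["identifiers"]
        if not shared:
            return None                      # no declared basis, no merge
        return {
            "identifiers": topic_a["identifiers"] | topic_b["identifiers"],
            "basis": {"rule": "TMDM shared subject identifier",
                      "evidence": sorted(shared)},
        }

    a = {"identifiers": {"http://example.org/id/paris",
                         "http://dbpedia.org/resource/Paris"}}
    b = {"identifiers": {"http://dbpedia.org/resource/Paris"}}
    print(merge(a, b)["basis"])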

December 19, 2010

OWL, Ontologies, Formats, Punch Cards, Oh My!

Filed under: Linked Data,OWL,Semantic Web,Topic Maps — Patrick Durusau @ 2:03 pm

Edwin Black’s IBM and the Holocaust reports that one aspect of the use of IBM punch card technology by the Nazis (and others) was the monopoly that IBM maintained on the manufacture of the punch cards.

The IBM machines could only use IBM punch cards.

The IBM machines could only use IBM punch cards.

The repetition was intentional. Think about that statement in a more modern context.

When we talk about Linked Data, or OWL, or Cyc, or SUMO, etc. (yes, I am aware that I am mixing formats and ontologies), isn’t that the same thing?

They are not physical monopolies like IBM punch cards but rather are intellectual monopolies.

Say it this way (insert your favorite format/ontology) or you don’t get to play.

I am sure that meets the needs of software designed to work with particular formats or ontologies.

But that isn’t the same thing as representing user semantics.

Note: Representing user semantics.

Not semantics as seen by the W3C or SUMO or Cyc or (insert your favorite group) or even XTM Topic Maps.

All of those quite usefully represent some user semantics.

None of them represent all user semantics.

No, I am not going to argue there is a non-monopoly solution.

To successfully integrate (or even represent) data, choices have to be made and those will result in a monopoly.

My caution is to not mistake the lip of the teacup that is your monopoly for the horizon of the world.

Very different things.

*****
PS: Economic analysis of monopolies could be useful when discussing intellectual monopolies. The “products” are freely available but the practices have other characteristics of monopolies. (I have added a couple of antitrust books to my Amazon.com wish list should anyone feel moved to contribute.)

December 17, 2010

CFP – Dealing with the Messiness of the Web of Data – Journal of Web Semantics

CFP – Dealing with the Messiness of the Web of Data – Journal of Web Semantics

From the call:

Research on the Semantic Web, which is now in its second decade, has had a tremendous success in encouraging people to publish data on the Web in structured, linked, and standardized ways. The success of what has now become the Web of Data can be read from the sheer number of triples available within the Linked-Open Data, Linked Life Data and Open-Government initiatives. However, this growth in data makes many of the established assumptions inappropriate and offers a number of new research challenges.

In stark contrast to early Semantic Web applications that dealt with small, hand-crafted ontologies and data-sets, the new Web of Data comes with a plethora of contradicting world-views and contains incomplete, inconsistent, incorrect, fast-changing and opinionated information. This information not only comes from academic sources and trustworthy institutions, but is often community built, scraped or translated.

In short: the Web of Data is messy, and methods to deal with this messiness are paramount for its future.

Now, we have two choices as the topic map community:

  • congratulate ourselves for seeing this problem long ago, high five each other, etc., or
  • step up and offer topic map solutions that incorporate as much of the existing SW work as possible.

I strongly suggest the second one.

Important dates:

We will aim at an efficient publication cycle in order to guarantee prompt availability of the published results. We will review papers on a rolling basis as they are submitted and explicitly encourage submissions well before the submission deadline. Submit papers online at the journal’s Elsevier Web site.

Submission deadline: 1 February 2011
Author notification: 15 June 2011

Revisions submitted: 1 August 2011
Final decisions: 15 September 2011
Publication: 1 January 2012

November 25, 2010

Virtuoso Open-Source Edition

Filed under: Linked Data,RDF,Semantic Web,Software — Patrick Durusau @ 7:06 am

Virtuoso Open-Source Edition

I ran across Virtuoso while running down the references in the article on SIREn. (Yes, I check references, not all of them, just the most interesting ones, as time permits.)

Has partial support for a variety of “Semantic Web” technologies.

Is the basis for OpenLink Data Spaces.

A named structured data cluster within a distributed data network where each item of data (each “datum”) has a unique identifier. Fundamental characteristics of data spaces include:

  • Each Data Item (or Entity) is endowed with a unique HTTP-based Identifier
  • Entity Identity, Access, and Representation are each distinct from the others
  • Entities are interlinked via attributes and relationship properties
  • Creation, Update, and Deletion privileges are controlled by the space owner

I can think of lots of “data spaces” that don’t fit this description: Large Hadron Collider data, radio and optical astronomy data dumps, TCP/IP data streams, bioinformatics data, commercial transaction databases. Please submit your own.

Still, if you want to learn the ins and outs as well as the limitations of this approach, it costs nothing more than the time to download the software.

November 19, 2010

“…an absolute game changer”

Filed under: Linked Data,LOD,Marketing,Semantic Web — Patrick Durusau @ 1:27 pm

Aldo Bucchi writes that http://uriburner.com/c/DI463N is:

Single most powerful demo available. Really looking fwd to what’s coming next.

Let’s see how this shifts gears in terms of Linked Data comprehension.
Even in its current state, this is an absolute game changer.

I know this was not easy. My hat goes off to the team for their focus.

Now, just let me send this link out to some non-believers that have
been holding back my evangelization pipeline 😉

I may count as one of the “non-believers.” 😉

Before Aldo throws open the flood gates on his “evangelization pipeline,” let me observe:

The elderly gentleman appears in: Tropical grassland, Desert, Temperate grassland, Coniferous forest, Flooded grassland, Mountain grassland, Broadleaf forest, Tropical dry forest, Rainforest, Taiga, Tundra, Urban, Tropical coniferous forests, Mountains, Coastal, and Wetlands.

So he must get around a lot.

Only the BBC appears in Estuaries.

Granted, it is a clever presentation of subjects that share a common locale, and it works fairly responsively, but that hardly qualifies as a “…game changer…”

This project is a good experiment on making information more accessible.

Why aren’t the facts enough?

All Identifiers, All The Time – LOD As An Answer?

Filed under: Linked Data,LOD,RDA,Semantic Web,Subject Identity — Patrick Durusau @ 6:25 am

I am still musing over Thomas Neidhart’s comment:

To understand this identifier you would need implicit knowledge about the structure and nature of every possible identifier system in existence, and then you still do not know who has more information about it.

Aside from questions of universal identifier systems failing without exception in the past, which makes one wonder why this system should succeed, there are other questions.

Such as why would any system need to encounter every possible identifier system in existence?

That is, the LOD effort has set up a strawman (apologies for the sexism) that it then proceeds to blow down.

If a subject has multiple identifiers in a set and my system recognizes only one out of three, what harm has come of the subject having the other two identifiers?

There is no processing overhead since, by admission, the system does not recognize the other identifiers and so doesn’t process them.

The advantage being that some other system may recognize the subject on the basis of the other identifiers.

This post is a good example of that practice.

I had a category “Linked Data,” but I added a category this morning, “LOD,” just in case people search for it that way.

Why shouldn’t our computers adapt to how we use identifiers (multiple ones for the same subjects) rather than our attempting (and failing) to adapt to universal identifiers to make it easy for our computers?

November 14, 2010

Linked Data Tutorial

Filed under: Linked Data,Semantic Web,Semantics — Patrick Durusau @ 9:36 am

Linked Data Tutorial: A Practical Introduction, by Dr. Michael Hausenblas.

A quick overview of “Linked Data,” but not so quick that it fails to point out some of its issues.

As slide 6 says of the principles: “Many things (deliberately?) kept blurry”

That blurriness carries over into the implementation and semantics of Linked Data.

Linking everything together in a higgledy-piggledy manner will lead to…, I assume, everything being linked together in a higgledy-piggledy manner.

Once linked together perhaps that will drive refinement of the linking into something useful.

Questions:

  1. List examples of the use of Linked Data in libraries. (3-5 pages, citations/links)
  2. How would you use Linked Data in a library? (3-5 pages, no citations)
  3. What would you change about Linked Data practice or standards? (3-5 pages, citations)
  4. Finding aid on Linked Data for librarians. (3-5 pages, citations)

November 13, 2010

LIMES – LInk discovery framework for MEtric Spaces

Filed under: Linked Data,Semantic Web,Software — Patrick Durusau @ 7:46 am

LIMES – LInk discovery framework for MEtric Spaces

From the website:

LIMES is a link discovery framework for the Web of Data. It implements time-efficient approaches for large-scale link discovery based on the characteristics of metric spaces. It is easily configurable via a web interface. It can also be downloaded as standalone tool for carrying out link discovery locally.

LIMES detects “duplicates” in a single source or between sources by use of string metrics.

The current version of LIMES supports exclusively the string metrics Levenshtein, QGrams, BlockDistance and Euclidian as implemented by the SimMetrics library. Further metrics will be included in following versions.

An interesting approach to use as a topic map authoring aid.
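For anyone new to string metrics, here is a minimal sketch of Levenshtein-based duplicate detection in Python. This is not LIMES’s implementation, which adds the metric-space pruning that makes it time-efficient; the naive version below compares every pair:

    # Duplicate detection by edit distance, minimal sketch.
    # LIMES gains its speed from metric-space properties (triangle
    # inequality pruning); this version is quadratic in the input.

    def levenshtein(s, t):
        prev = list(range(len(t) + 1))
        for i, cs in enumerate(s, 1):
            cur = [i]
            for j, ct in enumerate(t, 1):
                cur.append(min(prev[j] + 1,               # deletion
                               cur[j - 1] + 1,            # insertion
                               prev[j - 1] + (cs != ct))) # substitution
            prev = cur
        return prev[-1]

    def find_duplicates(labels, threshold=2):
        return [(a, b)
                for i, a in enumerate(labels)
                for b in labels[i + 1:]
                if levenshtein(a, b) <= threshold]

    print(find_duplicates(["Goethe", "Göthe", "Gothe", "Schiller"]))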

Questions:

  1. Using the online LIMES interface, develop and run five (5) link discovery requests. Name and save the result files. Upload them to your class project directory. Be prepared to discuss your requests and results in class.
  2. Sign up to be discussion leader for one of the algorithms supported by LIMES. Prepare a two (2) page summary for the class on your algorithm.
  3. What suggestions would you have for the project on its current UI?
  4. Use LIMES to augment your topic map authoring. Comments? (3-5 pages, no citations)
  5. In an actual run, I got the following as owl:sameAs – http://bio2rdf.org/mesh:D016889 and http://data.linkedct.org/page/condition/4398. Your evaluation? You may follow any links you find to make your evaluation. (2-3 pages, include URLs for other locations that you visit)

November 12, 2010

LOD, Semantic Ambiguity and Topic Maps

Filed under: Authoring Topic Maps,Linked Data,Semantic Web,Topic Maps — Patrick Durusau @ 6:23 pm

The semantic ambiguity of linked data has been a hot topic of discussion of late.

Not only of what linked data links to but of linked data itself!

If you have invested a lot in linked data efforts, don’t panic!

Topic maps, even using XTM/CTM syntaxes, to say nothing of more exotic models, can reduce any semantic ambiguity using occurrences.

If and when it is necessary.

Quite serious, “if and when necessary.”

Err, “if and when necessary” meaning when it is important enough for someone to pay for the disambiguation.

Ambiguity between buyers and sellers of women’s shoes or lingerie probably abounds, but unless someone is willing to pay the freight for disambiguation, it isn’t my concern.

Linked data is exposing the ambiguity of the Semantic Web.

Being unable to solve the semantic ambiguity it exposes, linked data is creating opportunities for topic maps!

Maybe we should send the W3C a fruit basket or something?

November 8, 2010

ISWC 2010 Data and Demos

Filed under: Linked Data,RDF,Semantic Web,SPARQL — Patrick Durusau @ 6:27 am

ISWC 2010 Data and Demos.

Data and demos from the International Semantic Web Conference 2010. Includes links to prior data sets and browsers that work with the data sets.

Data sets are always important as well as being able to gauge the current state of semantic software.

Ambiguity and Linked Data URIs

Filed under: Ambiguity,Linked Data,Marketing,RDF,Semantic Web,Topic Maps — Patrick Durusau @ 6:14 am

I like the proposal by Ian Davis to avoid the 303 cloud while trying to fix the mistake of confusing identifiers with addresses in an address space.

Linked data URIs are already known to be subject to the same issues of ambiguity as any other naming convention.

All naming conventions are subject to ambiguity, and “expanded” naming conventions, such as a list of properties in a topic map, may make the ambiguity a bit more manageable.

That depends on a presumption that if more information is added and a user advised of it, the risk of ambiguity will be reduced.

But the user needs to be able to use the additional information. What if the additional information is to distinguish two concepts in calculus and the reader is innocent of even basic algebra?

That is to say, ambiguity can be overcome only in particular contexts.

But overcoming ambiguity in a particular context may be enough. Such as:

  • Interchange between intelligence agencies
  • Interchange between audited entities and their auditors (GAO, SEC, Federal Reserve (or their foreign equivalents))
  • Interchange between manufacturers and distributors

None of those are golden-age applications of seamless knowledge sharing, universal democratization of decision making, or even scheduling tennis matches.

They are applications that can reduce incremental costs, improve overall efficiency and perhaps contribute to achievement of organizational goals.

Perhaps that is enough.

A Guide to Publishing Linked Data Without Redirects – Post

Filed under: Linked Data,RDF,Semantic Web — Patrick Durusau @ 5:34 am

A Guide to Publishing Linked Data Without Redirects is a proposal by Ian Davis to avoid the 303 while distinguishing between “things” and their descriptions.

A step in the right direction.
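For readers who have not met the problem: the 303 dance Davis wants to avoid looks roughly like this. A minimal sketch using Python’s standard http.server; the URI layout is illustrative:

    # The 303 redirect pattern, minimal sketch. A URI for a *thing*
    # (/id/alice) cannot serve the thing itself, so the server answers
    # 303 See Other, pointing at a *description* (/doc/alice). That
    # extra round trip is what redirect-free proposals try to avoid.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path.startswith("/id/"):
                self.send_response(303)      # See Other: not the thing
                self.send_header("Location",
                                 self.path.replace("/id/", "/doc/", 1))
                self.end_headers()
            elif self.path.startswith("/doc/"):
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.end_headers()
                self.wfile.write(b"A description, not the thing itself.\n")
            else:
                self.send_error(404)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), Handler).serve_forever()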

November 4, 2010

Is 303 Really Necessary? – Blog Post

Filed under: Linked Data,RDF,Semantic Web,Uncategorized — Patrick Durusau @ 9:46 am

Is 303 Really Necessary?

Ian Davis details at length why 303s are unnecessary and offers an interesting alternative.

Read the comments as well.

November 3, 2010

Weaknesses In Linked Data

Filed under: Linked Data,RDF,Semantic Web — Patrick Durusau @ 6:47 pm

A Partnership between Structured Dynamics and Ontotext to address weaknesses in linked data framed it this way:

Volumes of linked data on the Web are growing. This growth is exposing three key weaknesses:

  1. inadequate semantics for how to link disparate information together that recognizes inherently different contexts and viewpoints and (often) approximate mappings
  2. misapplication of many linking predicates, such as owl:sameAs, and
  3. a lack of coherent reference concepts by which to aggregate and organize this linkable content.

The amount of linked data is trivial compared to the total volume of digital data.

Makes me wonder about the “only the web will scale” argument.

Questions:

  1. How do these three “key weaknesses” compare to current barriers to semantic integration? (3-5 pages, no citations)
  2. “inadequate semantics?” What’s wrong with the semantics we have now? Or is the point that formal semantics are inadequate? (discussion)
  3. “coherent reference concepts?” How would you recognize one if you saw it? (3-5 pages, no citations)

October 28, 2010

LDSpider

Filed under: Linked Data,Search Engines,Searching,Semantic Web — Patrick Durusau @ 5:11 am

LDSpider.

From the website:

The LDSpider project aims to build a web crawling framework for the linked data web. Requirements and challenges for crawling the linked data web are different from regular web crawling, thus this project offers a web crawler adapted to traverse and harvest sources and instances from the linked data web. We offer a single jar which can be easily integrated into your own applications.

Features:

  • Content Handlers for different formats
  • Different crawling strategies
  • Crawling scope
  • Output formats

Content handlers, crawling strategies, crawling scope, output formats, all standard crawling features. Adapted to linked data formats but those formats should be accessible to any crawler.

A welcome addition, since we are all going to encounter linked data, but I am missing what is different.

If you see it, please post a comment.
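In the meantime, my reading is that the core loop is ordinary crawling with content negotiation and an RDF parser swapped in for HTML link extraction. A minimal sketch, assuming the rdflib library; this is not LDSpider’s code:

    # Breadth-first linked data traversal, minimal sketch (not LDSpider).
    # The adaptations over a plain web crawler: request RDF via content
    # negotiation (rdflib handles this when parsing a URI) and harvest
    # onward links from triple objects instead of <a href> tags.
    from collections import deque
    from rdflib import Graph, URIRef

    def crawl(seed, max_uris=10):
        seen, queue = set(), deque([seed])
        while queue and len(seen) < max_uris:
            uri = queue.popleft()
            if uri in seen:
                continue
            seen.add(uri)
            g = Graph()
            try:
                g.parse(uri)                 # fetches and parses RDF
            except Exception:
                continue                     # crawling scope: skip failures
            print(uri, len(g), "triples")
            for _, _, o in g:
                if isinstance(o, URIRef):    # onward links live in objects
                    queue.append(str(o))

    crawl("http://dbpedia.org/resource/Berlin")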

Questions:

  1. What semantic requirements should a web crawler have?
  2. How does this web crawler compare to your requirements?
  3. What one capacity would you add to this crawler?
  4. What other web crawlers should be used for comparison?

October 22, 2010

Linking Enterprise Data

Filed under: Knowledge Management,Linked Data,Semantic Web — Patrick Durusau @ 5:53 am

Linking Enterprise Data, ed. by David Wood. The full text is available in HTML.

Table of Contents:

  • Part I Why Link Enterprise Data?
    • Semantic Web and the Linked Data Enterprise, Dean Allemang
    • The Role of Community-Driven Data Curation for Enterprises, Edward Curry, Andre Freitas, and Sean O’Riain
  • Part II Approval and Support of Linked Data Projects
    • Preparing for a Linked Data Enterprise, Bernadette Hyland
    • Selling and Building Linked Data: Drive Value and Gain Momentum, Kristen Harris
  • Part III Techniques for Linking Enterprise Data
    • Enhancing Enterprise 2.0 Ecosystems Using Semantic Web and Linked Data Technologies: The SemSLATES Approach, Alexandre Passant, Philippe Laublet, John G. Breslin and Stefan Decker
    • Linking XBRL Financial Data, Roberto García and Rosa Gil
    • Scalable Reasoning Techniques for Semantic Enterprise Data, Reza B’Far
    • Reliable and Persistent Identification of Linked Data Elements, David Wood

Comments to follow.

October 13, 2010

Semantic Drift: What Are Linked Data/RDF and TMDM Topic Maps Missing?

Filed under: Linked Data,RDF,Subject Identifiers,Subject Identity,Topic Maps — Patrick Durusau @ 9:38 am

One RDF approach to semantic drift is to situate a vocabulary among other terms.

TMDM topic maps enable a user to gather up information that they considered as identifying the subject in question.

In both the RDF and TMDM approaches, additional information helps to identify a particular subject.

Isn’t that the opposite of semantic drift?

What’s happening in both cases?

The RDF approach is guessing that it has the sense of the word as used by the author (if the right word at all).

Kelb reports approximately 48% precision.

So in 1 out of 2 emergency room situations we get the right term? (Not to knock Kelb’s work. It is an important approach that needs further development.)

Topic maps are guessing as well.

We don’t know what information in a subject identifier identifies a subject. Some of it? All of it? Under what circumstances?

Question: What information identifies a subject, at least to its author?

Answer: Ask the Author.

Asking authors what information identifies their subject(s) seems like an overlooked approach.

Domain-specific vocabularies that distinguish the information that identifies a subject from merely supplemental information would be a good start.

That avoids inline syntax difficulties and enables authors to easily and quickly associate subject identification information with their documents.

Both RDF and TMDM Topic Maps could use the same vocabularies to improve their handling of associated document content.
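A minimal sketch of the kind of vocabulary entry I have in mind, in Python. The property names and the identifying/supplemental split are my own illustration, not an existing vocabulary; either an RDF or a TMDM processor could key merging off the flagged properties:

    # A vocabulary that separates identifying from supplemental
    # properties, minimal sketch. The entries are illustrative only.

    VOCABULARY = {
        "author": {
            "identifying": {"orcid", "viaf_id"},
            "supplemental": {"affiliation", "homepage"},
        },
    }

    def same_subject(kind, rec_a, rec_b):
        """Merge only on author-declared identifying properties."""
        keys = VOCABULARY[kind]["identifying"]
        return any(k in rec_a and k in rec_b and rec_a[k] == rec_b[k]
                   for k in keys)

    a = {"orcid": "0000-0002-1825-0097", "affiliation": "Old address"}
    b = {"orcid": "0000-0002-1825-0097", "affiliation": "New address"}
    print(same_subject("author", a, b))      # True: affiliations may differ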

October 8, 2010

Semantic Drift and Linked Data/Semantic Web

Filed under: Linked Data,OWL,Semantic Web,Subject Identity — Patrick Durusau @ 10:28 am

Overloading OWL sameAs starts with:

Description: General Issue: owl:sameAs is being used in the linked data community in a way that is inconsistent with its semantics.

Read the document but in summary: People use OWL sameAs to mean different things.

I don’t see how their usage can be “inconsistent with its semantics.”

Words don’t possess self-executing semantics that bind us. Rather the other way round I think.

If OWL sameAs had some “original” semantic, it changed by the process of semantic drift.

Semantic drift is where the semantics of a token changes over time or across communities due to its use by people.

URIs or tokens may be “stable,” but the evidence is that the semantics of URIs or tokens are not.

The question is how to manage changing, emerging, drifting semantics? (Not a question answered by a static semantic model of URI based identity.)

PS: RDF researchers have recognized semantic drift and have proposed solutions for addressing it. More on that anon.

Questions:

  1. Select a classification more than 30 years old and randomly select one book for each 5-year period for the last 30 years. What (if any) semantic drift do you see in the use of this classification?
  2. Exchange your list with a classmate. Do you agree/disagree with their evaluation? Why?
  3. Repeat the exercise in #1 and #2 but use a classification where you can find books published between 30 and 60 years ago. Select one book per 5-year period.

Library Linked Data: Call for Use Cases

Filed under: Linked Data,Semantic Web — Patrick Durusau @ 6:11 am

Library Linked Data: Call for Use Cases

Just a quick reminder that the call for use cases from the W3C Library Linked Data Incubator Group ends on 15 October 2010.

The mailing archives may be of interest: public-lld@w3.org.

I’m not a fan of “Linked Data” but it will be encountered by topic map authors and so we need to follow its development.

October 5, 2010

Re-Using Linked Data

Filed under: Authoring Topic Maps,Dataset,Linked Data,Topic Maps — Patrick Durusau @ 9:24 am

The German national library released its authority records as linked data.

News and reference services have content management systems that don’t use URIs, so how do they link up public linked data with their private data?

In a way that they can share the resulting linked data within their organization?

Exploration question: What mapping facilities exist in popular CMS systems for mapping linked data to local data?

I don’t know the answer to that but will be finding out.

In the meantime, if you know your CMS cannot do such a mapping, consider using topic maps. (topicmaps.org)

Topic maps can create linked data that is not subject to the limitation of using URIs.

Grist For Topic Map Mills: German National Library – Authority Files

Filed under: Dataset,Linked Data,RDF,Semantic Web — Patrick Durusau @ 6:09 am

German National Library – Authority Files (linked data)

A post from Lars Svensson announced the release of authority files from the German National Library:

The German National Library (DNB) has published the German library authority files as linked data. The dataset consists of 1.8 Mill differentiated persons from the PND (Personennamendatei, Name authority file), 187.000 subject headings from the SWD (Schlagwortnormdatei, Subject headings authority file), 1.3 Mill corporate bodies from the GKD (Gemeinsame Körperschaftsdatei, Corporate Body Authority file), and 51,000 classes from the German translation of the Dewey Decimal Classification (DDC).

Library students should take particular note of the subject heading and Dewey Decimal Classification materials.

For topic mappers, another set of identifiers that can be mapped between the data sets shown in the linked data cloud, as well as those that don’t use URIs as identifiers (the vast majority of data).

This will also be of interest to the linked data community.

July 22, 2010

Queries and Linked Data

Filed under: Linked Data,RDF — Patrick Durusau @ 6:30 pm

Federated Data Management and Query Optimization for Linked Open Data by Olaf Görlitz and Steffen Staab, and

A Database Perspective on Consuming Linked Data on the Web by Olaf Hartig and Andreas Langegger,

are two recent publications on querying linked data that will repay close study as we prepare to discuss TMQL in Leipzig.

Linked data is a way of organizing subjects. A way that topic maps will encounter in the (still) heterogeneous world.

June 10, 2010

Linked Data and Citation Indexes

Filed under: Linked Data,Subject Identifiers — Patrick Durusau @ 5:46 am

Citation indexes offer a concrete example of why blindly following the linked data mantra of creating “URIs as names for things” (Linked Data) is a bad idea.

Science Citation Index Expanded™ by Thomson Reuters offers coverage using citations to identify articles back to 1900. That works because the articles use citations as identifiers to reference previous articles.

There are articles available in digital form, from arXiv.org, CiteSeerX or some other digital repository. That means they have an identifier in addition to the more traditional citation reference/identifier.

Where multiple identifiers identify the same subject, we need equivalence operators.

Where identifiers already identify subjects, we need operators that re-use those identifiers.
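An equivalence operator need not be exotic. A minimal sketch using union-find over identifier strings; the citation, arXiv, and DOI identifiers are invented for illustration, and every identifier keeps working as-is:

    # Equivalence classes over existing identifiers, minimal sketch.
    # Declaring two identifiers equivalent merges their classes;
    # neither identifier is replaced or deprecated.

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    def declare_same(a, b):
        parent[find(a)] = find(b)

    declare_same("J. Doe, Phys. Rev. 12 (1900) 34", "arXiv:1234.5678")
    declare_same("arXiv:1234.5678", "doi:10.1000/example")

    # Three identifiers, one subject, none of them new:
    print(find("J. Doe, Phys. Rev. 12 (1900) 34") == find("doi:10.1000/example"))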

Ask yourself, “What good is a new set of identifiers that partially duplicates existing identifiers?”

If you think you have a good answer, please email me or reply to this post. Thanks!

June 7, 2010

Datasets Galore! (Data.gov)

Filed under: Data Integration,Data Source,Linked Data,Subject Identity,Topic Maps — Patrick Durusau @ 9:56 am

Data.gov hosts 272,677 datasets.

LinkingOpenData will point you to a subset of about 400 datasets that is available as “Linked Data.”

I guess that means that the other 272,277 datasets are not “Linked Data.”

Fertile ground for topic maps.

Topic Maps don’t limit users to “[u]se URIs as names for things.” (Linked Data)

A topic map can use the identifiers that are in place in one or more of the 272,277 datasets and create mappings to one or more of the 400 datasets in “Linked Data.”

Without creating “Linked Data” or the overhead of the “303 Cloud.”

Which datasets look the most promising to you?

May 31, 2010

Semantic Web Challenge

The Semantic Web Challenge 2010 details landed in my inbox this morning. My first reaction was to refine my spam filter. 😉 Just teasing. My second and more considered reaction was to think about the “challenge” in terms of topic maps.

Particularly because a posting from the Ontology Alignment Evaluation Initiative arrived the same day, in response to a posting from sameas.org.

I freely grant that URIs that cannot distinguish between identifiers and resources without 303 overhead are poor design. But the fact remains that there are many data sets, representing large numbers of subjects that have even poorer subject identification practices. And there are no known approaches that are going to result in the conversion of those data sets.

Personally I am unwilling to wait until some new “perfect” language for data sweeps the planet and results in all data being converted into the “perfect” format. Anyone who thinks that is going to happen needs to stand with the end-of-the-world-in-2012 crowd. They have a lot in common. Magical thinking being one common trait.

The question for topic mappers to answer is how do we attribute, to whatever data language we are confronting, characteristics that will enable us to reliably merge information about subjects in that format with other information in the same or another data language? Understanding that the necessary characteristics may vary from data language to data language.

Take the lack of a distinction between identifier and resource in the Semantic Web for instance. One easy step towards making use of such data would be to attribute to each URI the status of either being an identifier or a resource. I suspect, but cannot say, that the authors/users of those URIs know the answer to that question. It seems even possible that some sets of such URIs are all identifiers and if so marked/indicated in some fashion, they automatically become useful as just that, identifiers (without 303 overhead).
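A minimal sketch of that easy step, with invented URIs. Attributing a status to each URI is a statement about the URI, so nothing in the original data has to change:

    # Attributing identifier-vs-resource status to URIs, minimal sketch.
    # In practice the authors/users of the URIs would supply the roles;
    # the URIs below are illustrative.

    URI_ROLE = {
        "http://example.org/id/goethe": "identifier",   # names the poet
        "http://example.org/doc/goethe": "resource",    # a page about him
    }

    def usable_as_identifier(uri):
        # Unknown URIs stay unusable rather than being guessed at.
        return URI_ROLE.get(uri) == "identifier"

    print(usable_as_identifier("http://example.org/id/goethe"))   # True
    print(usable_as_identifier("http://example.org/doc/goethe"))  # False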

As identifiers they may lack the resolution that topic maps provide to the human user, which enables them to better understand what subject is being identified. But, since topic maps can map additional identifiers together, when you encounter a deficient identifier, simply create another one for the same subject and map them together.

I think we need to view the Semantic Web data sets as opportunities to demonstrate how understanding subject identity, however that is indicated, is the linchpin to meaningful integration of data about subjects.

Bearing in mind that all our identifications, Semantic Web, topic map or otherwise, are always local, provisional and subject to improvement, in the eye of another.

April 17, 2010

Data 3.0 Manifesto (Reinventing Topic Maps, Almost)

Filed under: Linked Data,Semantic Web,Topic Maps — Patrick Durusau @ 3:54 pm

I happened across Data 3.0 (a Manifesto for Platform Agnostic Structured Data) Update 1.

Kingsley Idehen says:

  • An “Entity” is the “Referent” of an “Identifier.”
  • An Identifier SHOULD provide an unambiguous and unchanging (though it MAY be opaque!) “Name” for its Referent.
  • A Referent MAY have many Identifiers (Names), but each Identifier MUST have only one Referent. (A Referent MAY be a collective Entity, i.e., a Group or Class.)

Sounds like:

  • A proxy represents a subject
  • A proxy can have one or more identifiers for a subject
  • The identifiers in a proxy have only one referent, the subject the proxy represents

Not quite a re-invention of topic maps, as Kingsley’s proposal misses treating entity representatives, and their components, as potentially entities themselves, which can have identifiers, rules for mapping, etc.
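The difference is easy to state in code. A minimal sketch, loosely after the TMRM and with names of my own choosing, in which a proxy’s identifiers are themselves proxies, so an identifier can carry its own identifiers, properties, and mapping rules:

    # Proxies all the way down, minimal sketch (loosely after the TMRM).
    # Each identifier of a proxy is itself a proxy, so identifiers can
    # themselves be identified, annotated, and given mapping rules.

    class Proxy:
        def __init__(self, label, **properties):
            self.label = label
            self.properties = properties
            self.identifiers = []            # a list of Proxy objects

        def add_identifier(self, id_proxy):
            self.identifiers.append(id_proxy)

    # The identifier is a first-class subject with its own properties:
    orcid = Proxy("ORCID 0000-0002-1825-0097",
                  scheme="orcid", resolvable=True)
    author = Proxy("the author", kind="person")
    author.add_identifier(orcid)

    # Because identifiers are proxies, we can say things *about* them:
    orcid.add_identifier(Proxy("https://orcid.org/0000-0002-1825-0097"))
    print(author.identifiers[0].properties["scheme"])    # 'orcid'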

“When you can do that, grasshopper, then you will be a topic map.”
