Linked Data « Another Word For It

November 26, 2012

Bibliographic Framework as a Web of Data:…

Filed under: BIBFRAME,Library,Linked Data — Patrick Durusau @ 9:53 am

Bibliographic Framework as a Web of Data: Linked Data Model and Supporting Services (PDF)

From the introduction:

The new, proposed model is simply called BIBFRAME, short for Bibliographic Framework. The new model is more than a mere replacement for the library community’s current model/format, MARC. It is the foundation for the future of bibliographic description that happens on, in, and as part of the web and the networked world we live in. It is designed to integrate with and engage in the wider information community while also serving the very specific needs of its maintenance community – libraries and similar memory organizations. It will realize these objectives in several ways:

Differentiate clearly between conceptual content and its physical manifestation(s) (e.g., works and instances)

Focus on unambiguously identifying information entities (e.g., authorities)

Leverage and expose relationships between and among entities

In a web-scale world, it is imperative to be able to cite library data in a way that not only differentiates the conceptual work (a title and author) from the physical details about that work’s manifestation (page numbers, whether it has illustrations) but also clearly identifies entities involved in the creation of a resource (authors, publishers) and the concepts (subjects) associated with a resource. Standard library description practices, at least until now, have focused on creating catalog records that are independently understandable, by aggregating information about the conceptual work and its physical carrier and by relying heavily on the use of lexical strings for identifiers, such as the name of an author. The proposed BIBFRAME model encourages the creation of clearly identified entities and the use of machine-friendly identifiers which lend themselves to machine interpretation for those entities.

An important draft from the Library of Congress on the BIBFRAME proposal.

Please review and comment. (Plus forward to your library friends.)

I first saw this in a tweet by Ivan Herman.

Comments Off

November 12, 2012

LinkedScience.org

Filed under: Linked Data,Science — Patrick Durusau @ 8:46 pm

LinkedScience.org

From the about page:

Linked Science is an approach to interconnect scientific assets to enable transparent, reproducible and transdisciplinary research. LinkedScience.org is a community driven-effort to show what this means in practice.

LinkedScience.org was founded early 2011 and is led by Tomi Kauppinen affiliated with the Institute for Geoinformatics at the University of Muenster (Germany). The term Linked Science was coined in the early paper about Linked Open Science co-authored with Giovana Mira de Espindola from The Brazil’s National Institute for Space Research (INPE) with a reference to LinkedScience.org. At Oxford in March 2011 in discussions between Tomi Kauppinen and Jun Zhao it became evident that a workshop on Linked Science—which was then realized as a collocated event with ISWC 2011 and organized with a big team—would be a perfect start for creating a community for opening and linking science.

Since then LinkedScience.org has grown step by step, or person by person, to include international activities (check the events organized so far), publications about—and related to—Linked Science, the developed vocabularies, tools such as the SPARQL Package for R (please check also the tutorial), and already one sub community, that of spatial@linkedscience to illustrate the benefits and results of linking science.

A large number of resources and projects related to Linked Data and the Sciences.

Comments Off

Machine-Understandable Way

Filed under: Linked Data — Patrick Durusau @ 8:34 pm

I shuddered when I read “…machine-understandable way” in Linked Science Core Vocabulary Specification (revision 0.91).

You could say that cash registers “understand” economics because they add and subtract. Machines carry out rote instructions, with no more “understanding” than a bag of hammers.*

As marketing speak the claim of “machine-understandable” is understandable but dangerous. What if a human eader disagrees with the machine’s conclusion/result? “Machine-understandable” imbues the machine with an understanding it lacks. And it has far more access to information than any human.

I’m sorry, if you disagree with the machine you must be wrong.

If the machine gives a result, who will have the courage to contradict it? Consider something as simple as moronic search results. Airline security personnel who detain four and five year olds with names similar to known terrorists won’t contradict obvious bad search results. How much more will they credit results bases on a “machine-understandable” interchange of data?

Let’s be clear about who has understanding, that would be human users (I leave the potential for sentient aliens to one side), and the responsibility to evaluate results. Whether from a vending machine, cash register or OWL reasoner.

* Yes, I am aware of adaptive programs that reach results not foreseeable by their programmers. But, the adaptive process wasn’t designed by the machine. Rather a human author, someone with “understanding” wrote the relevant instructions.

Comments Off

Linked Science Core Vocabulary Specification

Filed under: Linked Data,Science,Vocabularies — Patrick Durusau @ 8:22 pm

Linked Science Core Vocabulary Specification (revision 0.91)

Abstract:

LSC, the Linked Science Core Vocabulary, is a lightweight vocabulary providing terms to enable publishers and researchers to relate things in science to time, space, and themes. More precisely, LSC is designed for describing scientific resources including elements of research, their context, and for interconnecting them. We introduce LSC as an example of building blocks for Linked Science to communicate the linkage between scientific resources in a machine-understandable way. The “core” in the name refers to the fact that LSC only defines the basic terms for science. We argue that the success of Linked Science—or Linked Data in general—lies in interconnected, yet distributed vocabularies that minimize ontological commitments. More specific terms needed by different scientific communities can therefore be introduced as extensions of LSC. LSC is hosted at LinkedScience.org; please check also other available vocabularies at LinkedScience.org/vocabularies.

A Linked Data vocabulary that you may encounter.

I first saw this in a tweet by Ivan Herman.

Comments Off

November 10, 2012

LDBC: Linked Data Benchmark Council [I count 15 existing Graph/RDF benchmarks. You?]

Filed under: Benchmarks,Graphs,Linked Data — Patrick Durusau @ 11:01 am

LDBC: Linked Data Benchmark Council

From the webpage:

In the last years we have seen an explosion of massive amounts of graph shaped data coming from a variery of applications that are related to social networks like facebook, twitter, blogs and other on-line media and telecommunication networks. Furthermore, the W3C linking open data initiative has boosted the publication and interlinkage of a large number of datasets on the semantic web resulting to the Linked Data Cloud. These datasets with billions of RDF triples such as Wikipedia, U.S. Census bureau, CIA World Factbook, DBPedia, and government sites have been created and published online. Moreover, numerous datasets and vocabularies from e-science are published nowadays as RDF graphs most notably in life and earth sciences, astronomy in order to facilitate community annota- tion and interlinkage of both scientific and scholarly data of interest.

Technology and bandwidth now provide the opportunities for compiling, publishing and sharing massive Linked Data datasets. A significant number of commercial semantic repositories (RDF databases with reasoner and query-engine) which are the cornerstone of the Semantic Web exist.

Neverthless at the present time,

there is no comprehensive suite of benchmarks that encourage the advancement of technology by providing both academia and industry with clear targets for performance and functionality and

no independent authority for developing benchmarks and verifying the results of those engines. The same holds for the emerging field of noSQL graph databases, which share with RDF a graph data model and pattern- and pathoriented query languages.

The Linked Data Benchmark Council (LDBC) project aims to provide a solution to this problem by making insightful the critical properties of graph and RDF data management technology, and stimulating progress through compettion. This is timely and urgent since non-relational data management is emerging as a critical need for the new data economy based on large, distributed, heterogeneous, and complexly structured data sets. This new data management paradigm also provides an opportunity for research results to impact young innovative companies working on RDF and graph data management to start playing a significant role in this new data economy.

This announcement puzzled me because I know I have seen (and written about) graph benchmarks.

A quick search with a popular search engine turned up three of the better known graph benchmarks (in the first ten “hits”):

BHOSLIB: Benchmarks with Hidden Optimum Solutions for Graph Problems (Maximum Clique, Maximum Independent Set, Minimum Vertex Cover and Vertex Coloring) —— Hiding Exact Solutions in Random Graphs by Ke XU, Beijing University of Aeronautics and Astronautics.
HPC Graph Analysis From the homepage:

We maintain a parallel graph theory benchmark that solves multiple graph analysis kernels on small-world networks. An early version of the benchmark was part of the DARPA High Productivity Computing Systems (HPCS) Compact Application (SSCA) suite. The benchmark performance across current HPC systems can be compared using a single score called TrEPS (Traversed Edges Per Second).
Graph 500. From their specifications page:

There are five problem classes defined by their input size:

toy 17GB or around 1010 bytes, which we also call level 10,

mini 140GB (1011 bytes, level 11),

small 1TB (1012 bytes, level 12),

medium 17TB (1013 bytes, level 13),

large 140TB (1014 bytes, level 14), and

huge 1.1PB (1015 bytes, level 15).

On RDF graphs in particular, the W3C wiki page: RDF Store Benchmarking, has a host of resources, including twelve (12) benchmarks for RDF Stores:

Berlin SPARQL Benchmark (BSBM), provides for comparing the performance of RDF and Named Graph stores as well as RDF-mapped relational databases and other systems that expose SPARQL endpoints. Designed along an e-commerce use case. SPARQL and SQL version available.

Lehigh University Benchmark (LUBM) is developed to facilitate the evaluation of Semantic Web repositories in a standard and systematic way. The benchmark is intended to evaluate the performance of those repositories with respect to extensional queries over a large data set that commits to a single realistic ontology.

Ontology Benchmark (UOBM) extends the LUBM benchmark in terms of inference and scalability testing. UOBM ontology and data set.

The SP²Bench SPARQL Performance Benchmark, provides a scalable RDF data generator and a set of benchmark queries, designed to test typical SPARQL operator constellations and RDF data access patterns.

Social Network Intelligence Benchmark (SIB) A benchmark suite developed by people at CWI and Openlink taking the schema from Social Networks for generating test areas where RDF/SPARQL can truly excel, and challenging query processing over highly connected graph.

DBpedia SPARQL Benchmark – Performance Assessment with Real Queries on Real Data.

FedBench – Benchmark for measuring the performance of federated SPARQL query processing (ISWC2011 paper about FedBench).

Linked Data Integration Benchmark (LODIB) is a benchmark for comparing the expressivity as well as the runtime performance of Linked Data translation/integration systems.

JustBench analyses the performance of OWL reasoners based on justifications for entailments (project website).

The ISLab Instance Matching Benchmark, provides for benchmarking instance matching and itentity resolution tools.

THALIA Testbed for testing the expressiveness of relational-to-RDF mapping languages.

A Benchmark for Spatial Semantic Web Systems, by Dave Kolas, extends the LUBM with sample spatial data.

For a total of fifteen (15) graph/RDF benchmarks, discoverable in just a few minutes.

Given the current financial difficulties at the EU, duplicating research already performed/underway by others is a poor investment.

PS: Pass my name along to anyone you know in the EU research approval committees. I would be happy to do new/not-new evaluations of proposals on a contract basis.

PPS: As always, if you know of other graph/RDF benchmarks, I would be happy to add them. I deliberately did not attempt an exhaustive survey of graph or RDF benchmarks. If you or your company are interested in such a survey, ping me.

Comments (2)

IOGDS: International Open Government Dataset Search

Filed under: Dataset,Linked Data,RDF,SPARQL — Patrick Durusau @ 9:21 am

IOGDS: International Open Government Dataset Search

Description:

The TWC International Open Government Dataset Search (IOGDS) is a linked data application based on metadata “scraped” from hundreds of international dataset catalog websites publishing a rich variety of government data. Metadata extracted from these catalog websites is automatically converted to RDF linked data and re-published via the TWC LOGD SPARQL endpoint and made available for download. The TWC IOGDS demo site features an efficient, reconfigurable faceted browser with search capabilities offering a compelling demonstration of the value of a common metadata model for open government dataset catalogs. We believe that the vocabulary choices demonstrated by IOGDS highlights the potential for useful linked data applications to be created from open government catalogs and will encourage the adoption of such a standard worldwide.

In addition to the datasets you will find tutorials, videos, demos, tools and technologies and other resources.

Whether you are looking for Linked Data or Linked Data to re-use in other ways.

Seen in a tweet by Tim O’Reilly.

Comments Off

November 7, 2012

VocBench

Filed under: Linked Data,RDF,SKOS,VocBench — Patrick Durusau @ 5:52 pm

VocBench

From the webpage:

VocBench is a web-based, multilingual, vocabulary editing and workflow tool developed by FAO. It transforms thesauri, authority lists and glossaries into SKOS/RDF concept schemes for use in a linked data environment. VocBench provides tools and functionalities that facilitate the collaborative editing of multilingual terminology and semantic concept information. It further includes administration and group management features as well as built in workflows for maintenance, validation and quality assurance of the data pool.

Current release is (1.3) but 2.0 is due out “Autumn 2012” as open source GPL license.

Another tool that will be of interest to topic map authors.

Comments Off

October 26, 2012

Linked Data Platform 1.0

Filed under: Linked Data,LOD — Patrick Durusau @ 7:05 pm

Linked Data Platform 1.0

From the working draft:

A set of best practices and simple approach for a read-write Linked Data architecture, based on HTTP access to web resources that describe their state using RDF.

Just in case you are keeping up with the Linked Data effort.

I first saw this at Semanticweb.com.

Comments Off

September 14, 2012

ESWC 2013 : 10th Extended Semantic Web Conference

Filed under: BigData,Linked Data,Semantic Web,Semantics — Patrick Durusau @ 1:24 pm

ESWC 2013 : 10th Extended Semantic Web Conference

Important Dates:

Abstract submission: December 5th, 2012

Full paper submission: December 12th, 2012

Authors’ rebuttals: February 11th-12th, 2013

Acceptance Notification: February 22nd, 2013

Camera ready: March 9th, 2013

Conference: May 26th-30th, 2013

From the call for papers:

ESWC is the premier European-based annual conference for researchers and practitioners in the field of semantic technologies. ESWC is the ideal venue for the discussion of the latest scientific insights and novel applications of semantic technologies.

The leading motto of the 10th edition of ESWC will be “Semantics and Big Data”. A crucial challenge that will guide the efforts of many scientific communities in the years to come is the one of making sense of large volumes of heterogeneous and complex data. Application-relevant data often has to be processed in real time and originates from diverse sources such as Linked Data, text and speech, images, videos and sensors, communities and social networks, etc. ESWC, with its focus on semantics, can offer an important contribution to global challenge.

ESWC 2013 will feature nine thematic research tracks (see below) as well as an in-use and industrial track. In line with the motto “Semantics and Big Data”, the conference will feature a special track on “Semantic Technologies for Big Data Analytics in Real Time”. In order to foster the interaction with other disciplines, this year’s edition will also feature a special track on “Cognition and Semantic Web”.

For the research and special tracks, we welcome the submission of papers describing theoretical, analytical, methodological, empirical, and application research on semantic technologies. For the In-Use and Industrial track we solicit the submission of papers describing the practical exploitation of semantic technologies in different domains and sectors. Submitted papers should describe original work, present significant results, and provide rigorous, principled, and repeatable evaluation. We strongly encourage and appreciate the submission of papers including links to data sets and other material used for the evaluation as well as to live demos or source code for tool implementations.

Submitted papers will be judged based on originality, awareness of related work, potential impact on the Semantic Web field, technical soundness of the proposed methods, and readability. Each paper will be reviewed by at least three program committee members in addition to one track chair. This year a rebuttal phase has been introduced in order to give authors the opportunity to provide feedback to reviewers’ questions. The authors’ answers will support reviewers and track chairs in their discussion and in taking final decisions regarding acceptance.

I would call your attention to:

A crucial challenge that will guide the efforts of many scientific communities in the years to come is the one of making sense of large volumes of heterogeneous and complex data.

Sounds like they are playing the topic map song!

Ping me if you are able to attend and would like to collaborate on a paper.

Comments Off

September 11, 2012

Linked Data in Libraries, Archives, and Museums

Filed under: Archives,Library,Linked Data,Museums — Patrick Durusau @ 2:23 pm

Linked Data in Libraries, Archives, and Museums Information Standards Quarterly (ISQ) Spring/Summer 2012, Volume 24, no. 2/3 http://dx.doi.org/10.3789/isqv24n2-3.2012.

Interesting reading on linked data.

I have some comments on the “discovery” of the need to manage “diverse, heterogeneous metadata” but will save them for another post.

From the “flyer” that landed in my inbox:

The National Information Standards Organization (NISO) announces the publication of a special themed issue of the Information Standards Quarterly (ISQ) magazine on Linked Data for Libraries, Archives, and Museums. ISQ Guest Content Editor, Corey Harper, Metadata Services Librarian, New York University has pulled together a broad range of perspectives on what is happening today with linked data in cultural institutions. He states in his introductory letter, “As the Linked Data Web continues to expand, significant challenges remain around integrating such diverse data sources. As the variance of the data becomes increasingly clear, there is an emerging need for an infrastructure to manage the diverse vocabularies used throughout the Web-wide network of distributed metadata. Development and change in this area has been rapidly increasing; this is particularly exciting, as it gives a broad overview on the scope and breadth of developments happening in the world of Linked Open Data for Libraries, Archives, and Museums.”

The feature article by Gordon Dunsire, Corey Harper, Diane Hillmann, and Jon Phipps on Linked Data Vocabulary Management describes the shift in popular approaches to large-scale metadata management and interoperability to the increasing use of the Resource Description Framework to link bibliographic data into the larger web community. The authors also identify areas where best practices and standards are needed to ensure a common and effective linked data vocabulary infrastructure.

Four “in practice” articles illustrate the growth in the implementation of linked data in the cultural sector. Jane Stevenson in Linking Lives describes the work to enable structured and linked data from the Archives Hub in the UK. In Joining the Linked Data Cloud in a Cost-Effective Manner, Seth van Hooland, Ruben Verborgh, and Rik Van de Walle show how general purpose Interactive Data Transformation tools, such as Google Refine, can be used to efficiently perform the necessary task of data cleaning and reconciliation that precedes the opening up of linked data. Ted Fons, Jeff Penka, and Richard Wallis discuss OCLC’s Linked Data Initiative and the use of Schema.org in WorldCat to make library data relevant on the web. In Europeana: Moving to Linked Open Data , Antoine Isaac, Robina Clayphan, and Bernhard Haslhofer explain how the metadata for over 23 million objects are being converted to an RDF-based linked data model in the European Union’s flagship digital cultural heritage initiative.

Jon Voss provides a status on Linked Open Data for Libraries, Archives, and Museums (LODLAM) State of Affairs and the annual summit to advance this work. Thomas Elliott, Sebastian Heath, John Muccigrosso Report on the Linked Ancient World Data Institute, a workshop to further the availability of linked open data to create reusable digital resources with the classical studies disciplines.

Kevin Ford wraps up the contributed articles with a standard spotlight article on LC’s Bibliographic Framework Initiative and the Attractiveness of Linked Data. This Library of Congress-led community effort aims to transition from MARC 21 to a linked data model. “The move to a linked data model in libraries and other cultural institutions represents one of the most profound changes that our community is confronting,” stated Todd Carpenter, NISO Executive Director. “While it completely alters the way we have always described and cataloged bibliographic information, it offers tremendous opportunities for making this data accessible and usable in the larger, global web community. This special issue of ISQ demonstrates the great strides that libraries, archives, and museums have already made in this arena and illustrates the future world that awaits us.”

Comments Off

August 26, 2012

Linked Legal Data: A SKOS Vocabulary for the Code of Federal Regulations

Filed under: Law,Law - Sources,Linked Data,SKOS — Patrick Durusau @ 1:17 pm

Linked Legal Data: A SKOS Vocabulary for the Code of Federal Regulations by Núria Casellas.

Abstract:

This paper describes the application of Semantic Web and Linked Data techniques and principles to regulatory information for the development of a SKOS vocabulary for the Code of Federal Regulations (in particular of Title 21, Food and Drugs). The Code of Federal Regulations is the codification of the general and permanent enacted rules generated by executive departments and agencies of the Federal Government of the United States, a regulatory corpus of large size, varied subject-matter and structural complexity. The CFR SKOS vocabulary is developed using a bottom-up approach for the extraction of terminology from text based on a combination of syntactic analysis and lexico-syntactic pattern matching. Although the preliminary results are promising, several issues (a method for hierarchy cycle control, expert evaluation and control support, named entity reduction, and adjective and prepositional modifier trimming) require improvement and revision before it can be implemented for search and retrieval enhacement of regulatory materials published by the Legal Information Institute. The vocabulary is part of a larger Linked Legal Data project, that aims at using Semantic Web technologies for the representation and management of legal data.

Considers use of nonregulatory vocabularies, conversion of existing indexing materials and finally settles on NLP processing of the text.

Granting that Title 21, Food and Drugs is no walk in the part, take a peek at the regulations for Title 26, Internal Revenue Code. 😉

A difficulty that I didn’t see mentioned is the changing semantics in statutory law and regulations.

The definition of “person,” for example, varies widely depending upon where it appears. Both chronologically and synchronically.

Moreover, if I have a nonregulatory vocabulary and/or CFR indexes, why shouldn’t that map to the CFR SKOS vocabulary?

I may not have the “correct” index but the one I prefer to use. Shouldn’t that be enabled?

I first saw this at Legal Informatics.

Comments Off

August 21, 2012

Putting WorldCat Data Into A Triple Store

Filed under: Library,Linked Data,RDF,WorldCat — Patrick Durusau @ 10:32 am

Putting WorldCat Data Into A Triple Store by Richard Wallis.

From the post:

I can not really get away with making a statement like “Better still, download and install a triplestore [such as 4Store], load up the approximately 80 million triples and practice some SPARQL on them” and then not following it up.

I made it in my previous post Get Yourself a Linked Data Piece of WorldCat to Play With in which I was highlighting the release of a download file containing RDF descriptions of the 1.2 million most highly held resources in WorldCat.org – to make the cut, a resource had to be held by more than 250 libraries.

So here for those that are interested is a step by step description of what I did to follow my own encouragement to load up the triples and start playing.

Have you loaded the WorldCat linked data into a triple store?

Some other storage mechanism?

Comments Off

August 9, 2012

An Introduction to Linked Open Data in Libraries, Archives & Museums

Filed under: Linked Data,LOD — Patrick Durusau @ 3:48 pm

An Introduction to Linked Open Data in Libraries, Archives & Museums by Jon Voss.

From the description:

According to a definition onLinkedData.org, “The term Linked Data refers to a set of best practices for publishing and connecting structured data on the web.” This has enormous implications for discoverability and interoperability for libraries, archives, and museums, not to mention a dramatic shift in the World Wide Web as we know it. In this introductory presentation, we’ll explore the fundamental elements of Linked Open Data and discover how rapidly growing access to metadata within the world’s libraries, archives and museums is opening exciting new possibilities for understanding our past, and may help in predicting our future.

Be forewarned that Jon thinks “mashing up” music tracks has a good result.

And you will encounter advocates for Linked Data in libraries.

You should be prepared to encounter both while topic mapping.

Comments Off

July 29, 2012

Open Services for Lifecycle Collaboration (OSLC)

Filed under: Linked Data,Semantic Web,Standards — Patrick Durusau @ 9:55 am

Open Services for Lifecycle Collaboration (OSLC)

This is one of the efforts mentioned in: Linked Data: Esperanto for APIs?.

From the about page:

Open Services for Lifecycle Collaboration (OSLC) is a community of software developers and organizations that is working to standardize the way that software lifecycle tools can share data (for example, requirements, defects, test cases, plans, or code) with one another.

We want to make integrating lifecycle tools a practical reality. (emphasis in original)

That’s a far cry from:

At the very least, however, a generally accepted approach to linking data within applications that make the whole programmable Web concept more accessible to developers of almost every skill level should not be all that far off from here.

It has an ambitious but well-defined scope, which will lend itself to the development and testing of standards for the interchange of information.

Despite semantic diversity, those are tasks that can be identified and that would benefit from standardization.

There is measurable ROI for participants who use the standard in a software lifecycle. They are giving up semantic diversity in exchange for other tangible benefits.

An effort to watch as a possible basis for integrating older software lifecycle tools.

Comments Off

Linked Data: Esperanto for APIs?

Filed under: Linked Data,RDF,Semantic Web — Patrick Durusau @ 9:40 am

Michael Vizard writes in: Linked Data to Take Programmable Web to a Higher Level:

The whole concept of a programmable Web may just be too important to rely solely on APIs. That’s the thinking behind a Linked Data Working Group initiative led by the W3C that expects to create a standard for embedding URLs directly within application code to more naturally integrate applications. Backed by vendors such as IBM and EMC, the core idea is to create more reliable method for integrating applications that more easily scales by not creating unnecessary dependencies of APIs and middleware.

…

At the moment most of the hopes for a truly programmable Web are tied to an API model that is inherently flawed. That doesn’t necessarily mean that Linked Data approaches will eliminate the need for APIs. But in terms of making the Web a programmable resource, Linked Data represents a significant advance in terms of both simplifying the process of actually integrating data while simultaneously reducing dependencies on cumbersome middleware technologies that are expensive to deploy and manage.

Conceptually, linked data is obvious idea. But getting everybody to agree on an actual standard is another matter. At the very least, however, a generally accepted approach to linking data within applications that make the whole programmable Web concept more accessible to developers of almost every skill level should not be all that far off from here. (emphasis added)

I am often critical of Linked Data efforts so let’s be clear:

Linked Data, as a semantic identification method, has strengths and weaknesses, just like any other semantic identification method. If it works for your particular application, great!

One of my objections to Linked Data is its near religious promotion as a remedy for semantic diversity. I don’t think a remedy for semantic diversity is possible, nor is is desirable.

The semantic diversity in IT is like the genetic diversity in the plant and animal kingdoms. It is responsible for robustness and innovation.

Not the fault of Linked Data but it is often paired with explanations for the failure of the Semantic Web to thrive.

The first Scientific American “puff piece” on the semantic was more than a decade ago now. We suddenly learn that it hasn’t been a failure of user interest, adoption, etc., that have defeated the Semantic Web, but a flawed web API model. Cure that and semantic nirvana is just around the corner.

The Semantic Web has failed to thrive because the forces of semantic diversity are more powerful than any effort at semantic sameness.

The history of natural languages and near daily appearance of new programming languages, to say nothing of the changing semantics of both, are evidence for “forces of semantic diversity.”

To paraphrase Johnny Cash, “do we kick against the pricks (semantic diversity)” or build systems that take it into account?

Comments (1)

July 21, 2012

Stardog 1.0.2 Released

Filed under: Graphs,Linked Data,Stardog — Patrick Durusau @ 7:35 pm

Stardog 1.0.2 Released: NoSQL Graph Database Leading Innovation in Semantic Technologies

From the post:

C&P LLC, the company behind Stardog, today announced the release of Stardog 1.0.2. Stardog is a NoSQL graph database based on W3C semantic web standards: SPARQL, RDF, and OWL. Stardog is a key component in Linked Data-based information integration at Fortune 500 enterprises and governments around the world.

The new release follows closely on last month’s launch of Stardog 1.0. The 1.0.2 release includes Stardog Community, a free version of Stardog for community use in academia, non-profit, and related sectors. Stardog is being used by customers in the areas of government, aerospace, financial, intelligence, defense, and at consumer-oriented startups.

“We are pleased with the technical progress made on Stardog since the 1.0 launch,” said Dr Evren Sirin, CTO, C&P LLC. “Today’s release begins our support for SPARQL 1.1, a crucial standard in enterprise information integration and semantic technologies. It also introduces Stardog Community to a new user base in linked open data, open government data, and related fields.”

I’m not real sure I would want to tie Linked Data or SPARQL to the tail of my product.

Still, there is a fair amount of it and the sheer inertia of government systems will result in more of it and it will always be around as a legacy format. So there isn’t any harm in supporting it, so long as you don’t get tunnel vision from it.

Comments (1)

July 11, 2012

Webtracks

Filed under: Archives,Linked Data,Metadata,RDF — Patrick Durusau @ 2:27 pm

Webtracks

From the webpage:

This project will develop an approach and mechanism to address the construction and propagation of linked data in the context of research and academic endeavour. The proposed work will build experiments in previous projects (Claddier, StoreLink) to develop a peer-to-peer protocol to underpin the construction of a web of linked data. This set of semantically annotated links between data resources forms a graph of citation and provenance and the project will build value added services to exploit these features.

I ran across this in the following description of a presentation at a library conference:

Inter-repository Linking of Research Objects with Webtracks

This session being presented by Shirley Ying Crompton. Shirley describing how the research process leads to research data and outputs being stored in different places with no links between them. So decided to use RDF/linked data to added structured citation links between research objects (and people – e.g. creators).

However, different objects created in different systems – so how to make sure objects are linked as they are created? Looked at existing protocols for enabling links to be created:

Trackbacks – use for blogs/comments

Semantic pingback – an RPC protocol to form semantic links between objects

Salmons – RSS protocol

…

Decided to take ‘webtracks’ approch – this is an inter-repository communication protocol. The Webtracks InteRCom protocol – allows formation of links between objects in two different repositories. InteRCom is two stage protocol – first stage is ‘harvest’ to get links, then second stage ‘request’ a link between two objects.

InteRCom implementation has been done in Java, available as open source – available for download from http://sourceforge.net/projects/webtracks/.

Shirley says: Webtracks facilitates propagation of citation links to provide a linked web of data – uses emerging linked data environment and support linking between diverse types o digital research objects. There are no constraints on link semantics or metadata. Importantly (for the project) is that it does not rely on centralised service – it is peer-to-peer. [Repository Services]

The key insight is “…different objects created in different systems….” (emphasis added)

That condition is, has been and will be true, no matter what solution you decide upon.

Comments Off

July 10, 2012

Linked Media Framework [Semantic Web vs. ROI]

Filed under: Linked Data,RDF,Semantic Web,SKOS,SPARQL — Patrick Durusau @ 11:08 am

Linked Media Framework

From the webpage:

The Linked Media Framework is an easy-to-setup server application that bundles central Semantic Web technologies to offer advanced services. The Linked Media Framework consists of LMF Core and LMF Modules.

LMF Usage Scenarios

The LMF has been designed with a number of typical use cases in mind. We currently support the following tasks out of the box:

Publishing Legacy Data as Linked Data: whenever you have legacy datasets in CSV, Excel, XML or similar and want to publish it as Linked Data, the LMF framework is the right tool for you. Follow this guide to see how.

Building Semantic Search over your Data: you have data in some format and want to enrich it with content from the Linked Data Cloud to provide advanced Semantic Search? Follow this guide!

Using a SKOS Thesaurus for Information Extraction: you have a custom thesaurus and want to analyse and automatically interlink content based on its concepts? See how to do it in this guide.

Target groups are a in particular casual users who are not experts in Semantic Web technologies but still want to publish or work with Linked Data, e.g. in the Open Government Data and Linked Enterprise Data area.

It is a bad assumption that workers in business or government have free time to add semantics to their data sets.

If adding semantics to your data, by linked data or other means is a core value, resource the task just like any other with your internal staff or hire outside help.

A Semantic Web short coming is the attitude that users are interested in or have the time to build it. Assuming the project to be worthwhile and/or doable.

Users are fully occupied with tasks of their own and don’t need a technical elite tossing more tasks onto them. You want the Semantic Web? Suggest you get on that right away.

Integrated data that meets a business need and has proven ROI isn’t the same thing as the Semantic Web. Give me a call if you are interested in the former, not the latter. (I would do the latter as well, but only on your dime.)

I first saw this at semanticweb.com, announcing version 2.2.0 of lmf – Linked Media Framework.

Comments Off

What is Linked Data

Filed under: Government Data,Linked Data,LOD — Patrick Durusau @ 10:37 am

What is Linked Data by John Goodwin.

From the post:

In the early 1990s there began to emerge a new way of using the internet to link documents together. It was called the World Wide Web. What the Web did that was fundamentally new was that it enabled people to publish documents on the internet and link them such that you could navigate from one document to another.

Part of Sir Tim Berners-Lee’s original vision of the Web was that it should also be used to publish, share and link data. This aspect of Sir Tim’s original vision has gained a lot of momentum over the last few years and has seen the emergence of the Linked Data Web.

The Linked Data Web is not just about connecting datasets, but about linking information at the level of a single statement or fact. The idea behind the Linked Data Web is to use URIs (these are like the URLs you type into your browser when going to a particular website) to identify resources such as people, places and organisations, and to then use web technology to provide some meaningful and useful information when these URIs are looked up. This ‘useful information’ can potentially be returned in a number of different encodings or formats, but the standard way for the linked data web is to use something called RDF (Resource Description Framework).

An introductory overview of the rise and use of linked data.

John is involved in efforts at data.gov.uk to provide open access to governmental data and one form of that delivery will be linked data.

You will be encountering linked data, both as a current and legacy format so it is worth your time to learn it now.

I first saw this at semanticweb.com.

Comments Off

July 1, 2012

Cascading map-side joins over HBase for scalable join processing

Filed under: HBase,Joins,Linked Data,LOD,MapReduce,RDF,SPARQL — Patrick Durusau @ 4:45 pm

Cascading map-side joins over HBase for scalable join processing by Martin Przyjaciel-Zablocki, Alexander Schätzle, Thomas Hornung, Christopher Dorner, and Georg Lausen.

Abstract:

One of the major challenges in large-scale data processing with MapReduce is the smart computation of joins. Since Semantic Web datasets published in RDF have increased rapidly over the last few years, scalable join techniques become an important issue for SPARQL query processing as well. In this paper, we introduce the Map-Side Index Nested Loop Join (MAPSIN join) which combines scalable indexing capabilities of NoSQL storage systems like HBase, that suffer from an insufficient distributed processing layer, with MapReduce, which in turn does not provide appropriate storage structures for efficient large-scale join processing. While retaining the flexibility of commonly used reduce-side joins, we leverage the effectiveness of map-side joins without any changes to the underlying framework. We demonstrate the significant benefits of MAPSIN joins for the processing of SPARQL basic graph patterns on large RDF datasets by an evaluation with the LUBM and SP2Bench benchmarks. For most queries, MAPSIN join based query execution outperforms reduce-side join based execution by an order of magnitude.

Some topic map applications include Linked Data/RDF processing capabilities.

The salient comment here being: “For most queries, MAPSIN join based query execution outperforms reduce-side join based execution by an order of magnitude.“

Comments Off

June 29, 2012

DDC 23 released as linked data at dewey.info

Filed under: Classification,Dewey - DDC,Linked Data — Patrick Durusau @ 3:14 pm

DDC 23 released as linked data at dewey.info

From the post:

As announced on Monday at the seminar “Global Interoperability and Linked Data in Libraries” in beautiful Florence, an exciting new set of linked data has been added to dewey.info. All assignable classes from DDC 23, the current full edition of the Dewey Decimal Classification, have been released as Dewey linked data. As was the case for the Abridged Edition 14 data, we define “assignable” as including every schedule number that is not a span or a centered entry, bracketed or optional, with the hierarchical relationships adjusted accordingly. In short, these are numbers that you find attached to many WorldCat records as standard Dewey numbers (in 082 fields), as additional Dewey numbers (in 083 fields), or as number components (in 085 fields).

The classes are exposed with full number and caption information and semantic relationships expressed in SKOS, which makes the information easily accessible and parsable by a wide variety of semantic web applications.

This recent addition massively expands the data set by over 38.000 Dewey classes (or, for the linked data geeks out there, by over 1 million triples), increasing the number of classes available almost tenfold. If you like, take some time to explore the hierarchies; you might be surprised to find numbers for Maya calendar or transits of Venus (loyal blog readers will recognize these numbers).

All the old goodies are still there, of course. Depending on which type of user agent is accessing the data (e.g., a browser) a different representation is negotiated (HTML or various flavors of RDF). The HTML pages still include RDFa markup, which can be distilled into RDF by browser plug-ins and other applications without the user ever having to deal with the RDF data directly.

More details follow but that should be enough to capture your interest.

Good thing there is a pointer for the Maya calendar. Would hate for interstellar archaeologists to think we were too slow to invent a classification number for the disaster that is supposed to befall us this coming December.

I have renewed my ACM and various SIG memberships to run beyond December 2012. In the event of an actual disaster refunds will not be an issue. 😉

Comments Off

June 16, 2012

Third International Workshop on Consuming Linked Data (COLD2012)

Filed under: Conferences,Linked Data — Patrick Durusau @ 3:30 pm

Third International Workshop on Consuming Linked Data (COLD2012)

Important dates:

Paper submission deadline: July 31, 2012, 23.59 Hawaii time
Acceptance notification: August 21, 2012
Camera-ready versions of accepted papers: September 10, 2012
Workshop date: November, 2012

Abstract:

The quantity of published Linked Data is increasing dramatically. However, applications that consume Linked Data are not yet widespread. Current approaches lack methods for seamless integration of Linked Data from multiple sources, dynamic discovery of available data and data sources, provenance and information quality assessment, application development environments, and appropriate end user interfaces. Addressing these issues requires well-founded research, including the development and investigation of concepts that can be applied in systems which consume Linked Data from the Web. Following the success of the 1st International Workshop on Consuming Linked Data, we organize the second edition of this workshop in order to provide a platform for discussion and work on these open research problems. The main objective is to provide a venue for scientific discourse — including systematic analysis and rigorous evaluation — of concepts, algorithms and approaches for consuming Linked Data.

….

Objectives

The term Linked Data refers to a practice for publishing and interlinking structured data on the Web. Since the practice has been proposed in 2006, a grass-roots movement has started to publish and to interlink multiple open databases on the Web following the Linked Data principles. Due to conference workshops, tutorials, and general evangelism an increasing number of data publishers such as the BBC, Thomson Reuters, The New York Times, the Library of Congress, and the UK and US governments have adopted Linked Data principles. The ongoing effort resulted in bootstrapping the Web of Data which, today, comprises billions of RDF triples including millions of links between data sources. The published datasets include data about books, movies, music, radio and television programs, reviews, scientific publications, genes, proteins, medicine, and clinical trials, geographic locations, people, companies, statistical and census data, etc.

…

Several open issues that make the development of Linked Data based applications a challenging or still impossible task. These issues include the lack of approaches for seamless integration of Linked Data from multiple sources, for dynamic, on-the-fly discovery of available data, for information quality assessment, and for elaborate end user interfaces. These open issues can only be addressed appropriately when they are conceived as research problems that require the development and systematic investigation of novel approaches. The International Workshop on Consuming Linked Data (COLD) aims to provide a platform for the presentation and discussion of such approaches. Our main objective is to receive submissions that present scientific discussion (including systematic evaluation) of concepts and approaches, instead of exposition of features implemented in Linked Data based applications. For practical systems without formalization or evaluation we refer interested participants to other offerings at ISWC, such as the Semantic Web Challenge or the Demo Track. As such, we see our workshop as orthogonal to these events.

Probably prejudice on my part but I think topic maps would make a very viable approach for “…seamless integration of Linked Data from multiple sources…” Integration of dynamic resources is going to require a potentially semantically dynamic solution. One like topic maps.

Comments Off

June 1, 2012

Linked Data Patterns

Filed under: Linked Data — Patrick Durusau @ 8:57 am

Linked Data Patterns by Leigh Dodds and Ian Davis.

Leigh Dodds posted an email note this morning to announce a new revision:

There have been a number of revisions across the pattern catalogue, including addition of new introductory sections to each chapter. There are a total of 12 new patterns, many of which cover data management patterns relating to use of named graphs.

Without a local copy, I can’t specify what patterns have been added.

Obvious version information (other than date) on the cover “page” and access to prior versions would be a real plus.

Comments Off

May 15, 2012

Using “Punning” to Answer httpRange-14

Filed under: Linked Data,RDF,Semantic Web — Patrick Durusau @ 6:50 pm

Using “Punning” to Answer httpRange-14

Jeni Tennison writes in her introduction:

As part of the TAG’s work on httpRange-14, Jonathan Rees has assessed how a variety of use cases could be met by various proposals put before the TAG. The results of the assessment are a matrix which shows that “punning” is the most promising method, unique in not failing on either ease of use (use case J) or HTTP consistency (use case M).

In normal use, “punning” is about making jokes based around a word that has two meanings. In this context, “punning” is about using the same URI to mean two (or more) different things. It’s most commonly used as a term of art in OWL but normal people don’t need to worry particularly about that use. Here I’ll explore what that might actually mean as an approach to the httpRange-14 issue.

Jeni writes quite well and if you are really interested in the details of this self-inflicted wound, read her post in its entirety.

The post is summarized when she says:

Thus an implication of this approach is that the people who define languages and vocabularies must specify what aspect of a resource a URI used in a particular way identifies.

Her proposal makes disambiguation explicit. A strategy that is more likely to be successful than others.

Following that statement she treats how to usefully proceed from that position. (No guarantee her position will carry the day but it would be a good thing if it does.)

Comments (1)

Improving Schema Matching with Linked Data (Flushing the Knowledge Toilet)

Filed under: Linked Data,Schema — Patrick Durusau @ 3:40 pm

Improving Schema Matching with Linked Data by Ahmad Assaf, Eldad Louw, Aline Senart, Corentin Follenfant, Raphaël Troncy, and David Trastour.

Abstract:

With today’s public data sets containing billions of data items, more and more companies are looking to integrate external data with their traditional enterprise data to improve business intelligence analysis. These distributed data sources however exhibit heterogeneous data formats and terminologies and may contain noisy data. In this paper, we present a novel framework that enables business users to semi-automatically perform data integration on potentially noisy tabular data. This framework offers an extension to Google Refine with novel schema matching algorithms leveraging Freebase rich types. First experiments show that using Linked Data to map cell values with instances and column headers with types improves significantly the quality of the matching results and therefore should lead to more informed decisions.

Personally I don’t find mapping Airport -> Airport Code all that convincing a demonstration.

The other problem I have is what happens after a user “accepts” a mapping?

Now what?

I can contribute my expertise to mappings between diverse schemas all day, even public ones.

What happens to all that human effort?

It is what I call the “knowledge toilet” approach to information retrieval/integration.

Software runs (I can’t count the number of times integration software has been run on Citeseer. Can you?) and a user corrects the results as best they are able.

Now what?

Oh, yeah, the next user or group of users does it all over again.

Why?

Because the user before them flushed the knowledge toilet.

The information had been mapped. Possibly even hand corrected by one or more users. Then it is just tossed away.

That has to seem wrong at some very fundamental level. Whatever semantic technology you choose to use.

I’m open to suggestions.

How do we stop flushing the knowledge toilet?

Comments (4)

May 8, 2012

New Version of Code of Federal Regulations Launched by Cornell LII

Filed under: Law,Law - Sources,Linked Data — Patrick Durusau @ 2:40 pm

New Version of Code of Federal Regulations Launched by Cornell LII

From Legal Informatics, news of improved access to the Code of Federal Regulations.

US Government site: Code of Federal Regulations.

Cornell LII site: Code of Federal Regulations

You tell me, which one do you like better?

Note that the Government Printing Office (GPO, originator of the “official” version), Cornell LII and the Cornell Law Library have been collaborating for the last two years to make this possible.

The Legal Informatics post has a summary of the new features. You won’t gain anything from my repeating them.

Cornell LII plans on using Linked Data so you can link into the site.

Being able to link into this rich resource will definitely be a boon to other legal resource sites and topic maps. (Despite the limitations of linked data.)

The complete announcement can be found here.

PS: Donate to support the Cornell LII project.

Comments Off

April 13, 2012

Seminar: Five Years On

Filed under: Library,Linked Data,Semantic Web — Patrick Durusau @ 4:45 pm

Seminar: Five Years On

British Library
April 26, 2012 – April 27, 2012

From the webpage:

April 2012 marks the fifth anniversary of the Data Model Meeting at the British Library, London attended by participants interested in the fit between RDA: Resource Description and Access and the models used in other metadata communities, especially those working in the Semantic Web environment. This meeting, informally known as the “London Meeting”, has proved to be a critical point in the trajectory of libraries from the traditional data view to linked data and the Semantic Web.

DCMI-UK in cooperation with DCMI International as well as others will co-sponsor a one-day seminar on Friday 27 April 2012 to describe progress since 2007, mark the anniversary, and look to further collaboration in the future.

Speakers will include participants at the 2007 meeting and other significant players in library data and the Semantic Web. Papers from the seminar will be published by DCMI and available freely online.

The London Meeting stimulated significant development of Semantic Web representations of the major international bibliographic metadata models, including IFLA’s Functional Requirements family and the International Standard Bibliographic Description (ISBD), and MARC as well as RDA itself. Attention is now beginning to focus on the management and sustainability of this activity, and the development of high-level semantic and data structures to support library applications.

Would appreciate a note if you are in London for this meeting. Thanks!

Comments Off

April 8, 2012

Casellas et al. on Linked Legal Data: Improving Access to Regulatory Information

Filed under: Law - Sources,Legal Informatics,Linked Data — Patrick Durusau @ 4:21 pm

Casellas et al. on Linked Legal Data: Improving Access to Regulatory Information

From the post:

Dr. Núria Casellas of the Legal Information Institute at Cornell University Law School, and colleagues, have posted Linked Legal Data: Improving Access to Regulatory Information, a poster presented at Bits on Our Mind (BOOM) 2012, held 4 April 2012 at the Cornell University Department of Computing and Information Science, in Ithaca, New York, USA.

Here are excerpts from the poster:

The application of Linked Open Data (LOD) principles to legal information (URI naming of resources, assertions about named relationships between resources or between resources and data values, and the possibility to easily extend, update and modify these relationships and resources) could offer better access and understanding of legal knowledge to individual citizens, businesses and government agencies and administrations, and allow sharing and reuse of legal information across applications, organizations and jurisdictions. […]

With this project, we will enhance access to the Code of Federal Regulations (a text with 96.5 million words in total; ~823MB XML file size) with an RDF dataset created with a number of semantic-search and retrieval applications and information extraction techniques based on the development and the reuse of RDF product taxonomies, the application of semantic matching algorithms between these materials and the CFR content (Syntactic and Semantic Mapping), the detection of product-related terms and relations (Vocabulary Extraction), obligations and product definitions (Definition and Obligations Extraction). […]

You know, lawyers always speculated if the “Avoid Probate” (for non-U.S. readers, a publication to help citizens avoid the use of lawyers for inheritance issues) were in fact shadow publications of the bar association to promote the use of lawyers.

You haven’t seen a legal mess until someone tries “self-help” in a legal context. Probably doubles if not triples the legal fees involved.

Still, this may be an interesting source of data for services for lawyers and foolhardy citizens.

I shudder though at the “sharing of legal information across jurisdictions.” In most of the U.S., a creditor can claim say a car where a mortgage is past due. Without going to court. In Louisiana, at least a number of years ago, there was another name for self-help repossession. It was called felony theft. Like I said, self-help when it comes to the law isn’t a good idea.

Comments Off

Nature Publishing Group releases linked data platform

Filed under: Linked Data,LOD,Semantic Web — Patrick Durusau @ 4:21 pm

Nature Publishing Group releases linked data platform

From the post:

Nature Publishing Group (NPG) today is pleased to join the linked data community by opening up access to its publication data via a linked data platform. NPG’s Linked Data Platform is available at http://data.nature.com.

The platform includes more than 20 million Resource Description Framework (RDF) statements, including primary metadata for more than 450,000 articles published by NPG since 1869. In this first release, the datasets include basic citation information (title, author, publication date, etc) as well as NPG specific ontologies. These datasets are being released under an open metadata license, Creative Commons Zero (CC0), which permits maximal use/re-use of this data.

NPG’s platform allows for easy querying, exploration and extraction of data and relationships about articles, contributors, publications, and subjects. Users can run web-standard SPARQL Protocol and RDF Query Language (SPARQL) queries to obtain and manipulate data stored as RDF. The platform uses standard vocabularies such as Dublin Core, FOAF, PRISM, BIBO and OWL, and the data is integrated with existing public datasets including CrossRef and PubMed.

…

More information about NPG’s Linked Data Platform is available at http://developers.nature.com/docs. Sample queries can be found at http://data.nature.com/query.

You may find it odd that I would cite such a resource on the same day as penning Technology speedup graph where I speak so harshly about the Semantic Web.

On the contrary, disagreement about the success/failure of the Semantic Web and its retreat to Linked Data is an example of conflicting semantics. Conflicting semantics not being a “feature” of the Semantic Web.

Besides, Nature is a major science publisher and their experience with Linked Data is instructive.

Such as the NPG specific ontologies. 😉 Not what you were expecting?

This is a very useful resource and the Nature Publishing Group is to be commended for it.

The creation of metadata about the terms used within articles and the relationships between those terms as well as other publications, will make it more useful still.

Comments Off

April 4, 2012

Linked Data Basic Profile 1.0

Filed under: Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 3:33 pm

Linked Data Basic Profile 1.0

A group of W3C members, IBM, DERI, EMC, Oracle, Red Hat, Tasktop and SemanticWeb.com have made a submission to the W3C with the title: Linked Data Basic Profile 1.0.

The submission consists of:

Linked Data Basic Profile 1.0

Linked Data Basic Profile Use Cases and Requirements

Linked Data Basic Profile RDF Schema

Interesting proposal.

Doesn’t try to do everything. The old 303/TBL is relegated to pagination. Probably a good use for it.

Comments?

Comments Off

« Newer Posts — Older Posts »

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 26, 2012

November 12, 2012

November 10, 2012

November 7, 2012

October 26, 2012

September 14, 2012

September 11, 2012

August 26, 2012

August 21, 2012

August 9, 2012

July 29, 2012

July 21, 2012

July 11, 2012

July 10, 2012

July 1, 2012

June 29, 2012

June 16, 2012

June 1, 2012

May 15, 2012

May 8, 2012

April 13, 2012

April 8, 2012

April 4, 2012