Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

November 9, 2012

Semantic Technologies — Biomedical Informatics — Individualized Medicine

Filed under: Bioinformatics,Biomedical,Medical Informatics,Ontology,Semantic Web — Patrick Durusau @ 11:14 am

Joint Workshop on Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine (SATBI+SWIM 2012) (in conjunction with the International Semantic Web Conference (ISWC 2012), Boston, Massachusetts, U.S.A., November 11-15, 2012)

If you are at ISWC, consider attending.

To help with that choice, the accepted papers:

Jim McCusker, Jeongmin Lee, Chavon Thomas and Deborah L. McGuinness. Public Health Surveillance Using Global Health Explorer. [PDF]

Anita de Waard and Jodi Schneider. Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA). [PDF]

Alexander Baranya, Luis Landaeta, Alexandra La Cruz and Maria-Esther Vidal. A Workflow for Improving Medical Visualization of Semantically Annotated CT-Images. [PDF]

Derek Corrigan, Jean Karl Soler and Brendan Delaney. Development of an Ontological Model of Evidence for TRANSFoRm Utilizing Transition Project Data. [PDF]

Amina Chniti, Abdelali Boussadi, Patrice Degoulet, Patrick Albert and Jean Charlet. Pharmaceutical Validation of Medication Orders Using an OWL Ontology and Business Rules. [PDF]

Eleven SPARQL 1.1 Specifications Published

Filed under: RDF,Semantic Web,SPARQL — Patrick Durusau @ 7:00 am

Eleven SPARQL 1.1 Specifications Published

From the post:

The SPARQL Working Group has today published a set of eleven documents, advancing most of SPARQL 1.1 to Proposed Recommendation. Building on the success of SPARQL 1.0, SPARQL 1.1 is a full-featured standard system for working with RDF data, including a query/update language, two HTTP protocols (one full-featured, one using basic HTTP verbs), three result formats, and other features which allow SPARQL endpoints to be combined and work together. Most features of SPARQL 1.1 have already been implemented by a range of SPARQL suppliers, as shown in our table of implementations and test results.

The Proposed Recommendations are:

  1. SPARQL 1.1 Overview – Overview of SPARQL 1.1 and the SPARQL 1.1 documents
  2. SPARQL 1.1 Query Language – A query language for RDF data.
  3. SPARQL 1.1 Update – Specifies additions to the query language to allow clients to update stored data
  4. SPARQL 1.1 Query Results JSON Format – How to use JSON for SPARQL query results
  5. SPARQL 1.1 Query Results CSV and TSV Formats – How to use comma-separated values (CSV) and tab-separated values (TSV) for SPARQL query results
  6. SPARQL Query Results XML Format – How to use XML for SPARQL query results. (This contains only minor, editorial updates from SPARQL 1.0, and is actually a Proposed Edited Recommendation.)
  7. SPARQL 1.1 Federated Query – an extension of the SPARQL 1.1 Query Language for executing queries distributed over different SPARQL endpoints.
  8. SPARQL 1.1 Service Description – a method for discovering and a vocabulary for describing SPARQL services.

While you are waiting for news on SPARQL performance increases, some reading material to pass the time.
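If you would rather experiment than read, here is a minimal sketch (mine, not from the announcement) of two SPARQL 1.1 features named above, Update and property paths in the query language, run against an in-memory store with Python's rdflib. The ex: vocabulary and data are invented for illustration.

```python
# Minimal SPARQL 1.1 sketch using rdflib; the ex: data is made up.
from rdflib import Graph

g = Graph()

# SPARQL 1.1 Update: add a few triples with INSERT DATA.
g.update("""
    PREFIX ex: <http://example.org/>
    INSERT DATA {
        ex:alice ex:knows ex:bob .
        ex:bob   ex:knows ex:carol .
    }
""")

# SPARQL 1.1 query: the property path ex:knows+ reaches direct and
# indirect acquaintances, something SPARQL 1.0 could not express.
for row in g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?who WHERE { ex:alice ex:knows+ ?who }
"""):
    print(row.who)
```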

September 14, 2012

ESWC 2013 : 10th Extended Semantic Web Conference

Filed under: BigData,Linked Data,Semantic Web,Semantics — Patrick Durusau @ 1:24 pm

ESWC 2013 : 10th Extended Semantic Web Conference

Important Dates:

Abstract submission: December 5th, 2012

Full paper submission: December 12th, 2012

Authors’ rebuttals: February 11th-12th, 2013

Acceptance Notification: February 22nd, 2013

Camera ready: March 9th, 2013

Conference: May 26th-30th, 2013

From the call for papers:

ESWC is the premier European-based annual conference for researchers and practitioners in the field of semantic technologies. ESWC is the ideal venue for the discussion of the latest scientific insights and novel applications of semantic technologies.

The leading motto of the 10th edition of ESWC will be “Semantics and Big Data”. A crucial challenge that will guide the efforts of many scientific communities in the years to come is the one of making sense of large volumes of heterogeneous and complex data. Application-relevant data often has to be processed in real time and originates from diverse sources such as Linked Data, text and speech, images, videos and sensors, communities and social networks, etc. ESWC, with its focus on semantics, can offer an important contribution to this global challenge.

ESWC 2013 will feature nine thematic research tracks (see below) as well as an in-use and industrial track. In line with the motto “Semantics and Big Data”, the conference will feature a special track on “Semantic Technologies for Big Data Analytics in Real Time”. In order to foster the interaction with other disciplines, this year’s edition will also feature a special track on “Cognition and Semantic Web”.

For the research and special tracks, we welcome the submission of papers describing theoretical, analytical, methodological, empirical, and application research on semantic technologies. For the In-Use and Industrial track we solicit the submission of papers describing the practical exploitation of semantic technologies in different domains and sectors. Submitted papers should describe original work, present significant results, and provide rigorous, principled, and repeatable evaluation. We strongly encourage and appreciate the submission of papers including links to data sets and other material used for the evaluation as well as to live demos or source code for tool implementations.

Submitted papers will be judged based on originality, awareness of related work, potential impact on the Semantic Web field, technical soundness of the proposed methods, and readability. Each paper will be reviewed by at least three program committee members in addition to one track chair. This year a rebuttal phase has been introduced in order to give authors the opportunity to provide feedback to reviewers’ questions. The authors’ answers will support reviewers and track chairs in their discussion and in taking final decisions regarding acceptance.

I would call your attention to:

A crucial challenge that will guide the efforts of many scientific communities in the years to come is the one of making sense of large volumes of heterogeneous and complex data.

Sounds like they are playing the topic map song!

Ping me if you are able to attend and would like to collaborate on a paper.

August 26, 2012

Semantic University

Filed under: Semantic Web,Semantics — Patrick Durusau @ 2:18 pm

Semantic University

From the homepage:

Semantic University will be the single largest and most accessible source of educational material relating to semantic technologies. Moreover, it will fill several important gaps in current material by providing:

  • Lessons suitable to those brand new to the space.
  • Comparisons, both high-level and in-depth, with related technologies, such as NoSQL and Big Data.
  • Interactive, hands on tutorials.

Have you used these materials? Comparison to others?

July 29, 2012

Open Services for Lifecycle Collaboration (OSLC)

Filed under: Linked Data,Semantic Web,Standards — Patrick Durusau @ 9:55 am

Open Services for Lifecycle Collaboration (OSLC)

This is one of the efforts mentioned in: Linked Data: Esperanto for APIs?.

From the about page:

Open Services for Lifecycle Collaboration (OSLC) is a community of software developers and organizations that is working to standardize the way that software lifecycle tools can share data (for example, requirements, defects, test cases, plans, or code) with one another.

We want to make integrating lifecycle tools a practical reality. (emphasis in original)

That’s a far cry from:

At the very least, however, a generally accepted approach to linking data within applications that make the whole programmable Web concept more accessible to developers of almost every skill level should not be all that far off from here.

It has an ambitious but well-defined scope, which will lend itself to the development and testing of standards for the interchange of information.

Despite semantic diversity, those are tasks that can be identified and that would benefit from standardization.

There is measurable ROI for participants who use the standard in a software lifecycle. They are giving up semantic diversity in exchange for other tangible benefits.

An effort to watch as a possible basis for integrating older software lifecycle tools.

Linked Data: Esperanto for APIs?

Filed under: Linked Data,RDF,Semantic Web — Patrick Durusau @ 9:40 am

Michael Vizard writes in: Linked Data to Take Programmable Web to a Higher Level:

The whole concept of a programmable Web may just be too important to rely solely on APIs. That’s the thinking behind a Linked Data Working Group initiative led by the W3C that expects to create a standard for embedding URLs directly within application code to more naturally integrate applications. Backed by vendors such as IBM and EMC, the core idea is to create more reliable method for integrating applications that more easily scales by not creating unnecessary dependencies of APIs and middleware.

At the moment most of the hopes for a truly programmable Web are tied to an API model that is inherently flawed. That doesn’t necessarily mean that Linked Data approaches will eliminate the need for APIs. But in terms of making the Web a programmable resource, Linked Data represents a significant advance in terms of both simplifying the process of actually integrating data while simultaneously reducing dependencies on cumbersome middleware technologies that are expensive to deploy and manage.

Conceptually, linked data is obvious idea. But getting everybody to agree on an actual standard is another matter. At the very least, however, a generally accepted approach to linking data within applications that make the whole programmable Web concept more accessible to developers of almost every skill level should not be all that far off from here. (emphasis added)

I am often critical of Linked Data efforts so let’s be clear:

Linked Data, as a semantic identification method, has strengths and weaknesses, just like any other semantic identification method. If it works for your particular application, great!

One of my objections to Linked Data is its near religious promotion as a remedy for semantic diversity. I don’t think a remedy for semantic diversity is possible, nor is it desirable.

The semantic diversity in IT is like the genetic diversity in the plant and animal kingdoms. It is responsible for robustness and innovation.

That is not the fault of Linked Data, but Linked Data is often paired with explanations for the Semantic Web’s failure to thrive.

The first Scientific American “puff piece” on the Semantic Web was more than a decade ago now. We suddenly learn that it hasn’t been a failure of user interest, adoption, etc., that has defeated the Semantic Web, but a flawed web API model. Cure that and semantic nirvana is just around the corner.

The Semantic Web has failed to thrive because the forces of semantic diversity are more powerful than any effort at semantic sameness.

The history of natural languages and near daily appearance of new programming languages, to say nothing of the changing semantics of both, are evidence for “forces of semantic diversity.”

To paraphrase Johnny Cash, “do we kick against the pricks (semantic diversity)” or build systems that take it into account?

July 12, 2012

Semantator: annotating clinical narratives with semantic web ontologies

Filed under: Annotation,Ontology,Protégé,RDF,Semantator,Semantic Web — Patrick Durusau @ 2:40 pm

Semantator: annotating clinical narratives with semantic web ontologies by Dezhao Song, Christopher G. Chute, and Cui Tao. (AMIA Summits Transl Sci Proc. 2012;2012:20-9. Epub 2012 Mar 19.)

Abstract:

To facilitate clinical research, clinical data needs to be stored in a machine processable and understandable way. Manual annotating clinical data is time consuming. Automatic approaches (e.g., Natural Language Processing systems) have been adopted to convert such data into structured formats; however, the quality of such automatically extracted data may not always be satisfying. In this paper, we propose Semantator, a semi-automatic tool for document annotation with Semantic Web ontologies. With a loaded free text document and an ontology, Semantator supports the creation/deletion of ontology instances for any document fragment, linking/disconnecting instances with the properties in the ontology, and also enables automatic annotation by connecting to the NCBO annotator and cTAKES. By representing annotations in Semantic Web standards, Semantator supports reasoning based upon the underlying semantics of the owl:disjointWith and owl:equivalentClass predicates. We present discussions based on user experiences of using Semantator.

If you are an AMIA member, see above for the paper. If not, see: Semantator: annotating clinical narratives with semantic web ontologies (PDF file). And the software/webpage: Semantator.

The software is a plugin for Protégé 4.1 or higher.

Looking at the extensive screen shots at the website, which has good documentation, the first question I would ask a potential user is: “Are you comfortable with Protégé?” If they aren’t, I suspect you are going to invest a lot of time in teaching them ontologies and Protégé. Just an FYI.

Complex authoring tools, particularly for newbies, seem like a non-starter to me. For example, why not have a standalone entity extractor (but don’t call it that, call it “I See You” (ISY)) that uses a preloaded entity file to recognize entities in a text? Where there is uncertainty, those entities are displayed in a different color, with drop-down options for other possible entities. Users get to pick one from the list (no write-in ballots). That performs a step toward getting clean data for a second round with another one-trick-pony tool (a rough sketch follows below). Users contribute, we all benefit.
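To make that concrete, a rough sketch of such a one-trick-pony extractor. Everything here (the entity file, its candidate identifiers, the auto-accept rule) is hypothetical, not an existing tool:

```python
import re

# Hypothetical preloaded entity file: surface form -> candidate identifiers.
ENTITIES = {
    "aspirin": ["drug:aspirin"],
    "ms":      ["disease:multiple_sclerosis", "unit:millisecond", "title:ms"],
    "stroke":  ["disease:stroke", "swimming:stroke"],
}

def annotate(text):
    accepted, needs_review = [], []
    for token in re.findall(r"[A-Za-z]+", text):
        candidates = ENTITIES.get(token.lower())
        if not candidates:
            continue
        if len(candidates) == 1:
            accepted.append((token, candidates[0]))   # unambiguous: auto-accept
        else:
            needs_review.append((token, candidates))  # ambiguous: user picks from a drop-down
    return accepted, needs_review

done, review = annotate("Aspirin after a stroke? The MS clinic disagrees.")
print(done)    # clean annotations
print(review)  # flagged for a human choice
```

The point is not the dozen lines of code but the workflow: unambiguous matches flow through, ambiguous ones get a colored flag and a short list for the user to choose from.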

Which brings me to the common shortfall of annotation solutions: the requirement that the text to be annotated be in plain text.

There are lots of “text” documents, but what of those in Word, PDF, Postscript, PPT, Excel, to say nothing of other formats?

The past will not disappear for want of a robust annotation solution.

Nor should it.

July 10, 2012

Linked Media Framework [Semantic Web vs. ROI]

Filed under: Linked Data,RDF,Semantic Web,SKOS,SPARQL — Patrick Durusau @ 11:08 am

Linked Media Framework

From the webpage:

The Linked Media Framework is an easy-to-setup server application that bundles central Semantic Web technologies to offer advanced services. The Linked Media Framework consists of LMF Core and LMF Modules.

LMF Usage Scenarios

The LMF has been designed with a number of typical use cases in mind. We currently support the following tasks out of the box:

Target groups are in particular casual users who are not experts in Semantic Web technologies but still want to publish or work with Linked Data, e.g. in the Open Government Data and Linked Enterprise Data area.

It is a bad assumption that workers in business or government have free time to add semantics to their data sets.

If adding semantics to your data, by linked data or other means, is a core value, resource the task just like any other with your internal staff or hire outside help.

A Semantic Web shortcoming is the attitude that users are interested in, and have the time for, building it, assuming the project to be worthwhile and/or doable in the first place.

Users are fully occupied with tasks of their own and don’t need a technical elite tossing more tasks onto them. You want the Semantic Web? Suggest you get on that right away.

Integrated data that meets a business need and has proven ROI isn’t the same thing as the Semantic Web. Give me a call if you are interested in the former, not the latter. (I would do the latter as well, but only on your dime.)

I first saw this at semanticweb.com, announcing version 2.2.0 of lmf – Linked Media Framework.

July 6, 2012

SparQLed…Writing SPARQL Queries [Less ZERO-result queries]

Filed under: RDF,Semantic Web,SPARQL — Patrick Durusau @ 4:36 pm

SindiceTech Releases SparQLed As Open Source Project To Simplify Writing SPARQL Queries by Jennifer Zaino.

From the post:

SindiceTech today released SparQLed, the SindiceTech Assisted SPARQL Editor, as an open source project. SindiceTech, a spinoff company from the DERI Institute, commercializes large-scale, Big Data infrastructures for enterprises dealing with semantic data. It has roots in the semantic web index Sindice, which lets users collect, search, and query semantically marked-up web data (see our story here).

SparQLed also is one of the components of the commercial Sindice Suite for helping large enterprises build private linked data clouds. It is designed to give users all the help they need to write SPARQL queries to extract information from interconnected datasets.

“SPARQL is exciting but it’s difficult to develop and work with,” says Giovanni Tummarello, who led the efforts around the Sindice search and analysis engine and is founder and CEO of SindiceTech.

SparQLed Project page.

Maybe we have become spoiled by search engines that always return results, even bad ones:

With SQL, the advantage lies in having a schema which users can look at and understand how to write a query. RDF, on the other hand, has the advantage of providing great power and freedom, because information in RDF can be interconnected freely. But, Tummarello says, “with RDF there is no schema because there is all sorts of information from everywhere.” Without knowing which properties are available specifically for a certain URI and in what context, users can wind up writing queries that return no results and get frustrated by the constant iterating needed to achieve their ends.

I am not encouraged by a features list that promises:

Less ZERO-result queries
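For the curious, the kind of help an assisted editor gives boils down to discovery queries: ask the data which predicates actually occur for a class before writing the real query. A minimal sketch with rdflib; the class URI and file name are placeholders:

```python
from rdflib import Graph

g = Graph()
g.parse("dataset.ttl", format="turtle")  # any RDF file you have on hand

# Which predicates occur on instances of a class, and how often?
DISCOVER = """
    SELECT DISTINCT ?p (COUNT(*) AS ?uses) WHERE {
        ?s a <http://example.org/Article> ;
           ?p ?o .
    }
    GROUP BY ?p
    ORDER BY DESC(?uses)
"""

for row in g.query(DISCOVER):
    print(row.p, row.uses)
```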

July 1, 2012

The observational roots of reference of the semantic web

Filed under: Identity,Semantic Web,Semantics — Patrick Durusau @ 4:42 pm

The observational roots of reference of the semantic web by Simon Scheider, Krzysztof Janowicz, and Benjamin Adams.

Abstract:

Shared reference is an essential aspect of meaning. It is also indispensable for the semantic web, since it enables to weave the global graph, i.e., it allows different users to contribute to an identical referent. For example, an essential kind of referent is a geographic place, to which users may contribute observations. We argue for a human-centric, operational approach towards reference, based on respective human competences. These competences encompass perceptual, cognitive as well as technical ones, and together they allow humans to inter-subjectively refer to a phenomenon in their environment. The technology stack of the semantic web should be extended by such operations. This would allow establishing new kinds of observation-based reference systems that help constrain and integrate the semantic web bottom-up.

In arguing for recasting the problem of semantics as one of reference, the authors say:

Reference systems. Solutions to the problem of reference should transgress syntax as well as technology. They cannot solely rely on computers but must also rely on human referential competences. This requirement is met by reference systems [22]. Reference systems are different from ontologies in that they constrain meaning bottom-up [11]. Most importantly, they are not “yet another chimera” invented by ontology engineers, but already exist in various successful variants.

I rather like the “human referential competences….”

After all, useful semantic systems are about references that we recognize.

June 24, 2012

Stardog 1.0

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 8:19 pm

Stardog 1.0 by Kendall Clark.

From the post:

Today I’m happy to announce the release of Stardog 1.0, the fastest, smartest, and easiest to use RDF database on the planet. Stardog fills a hole in the Semantic Technology (and NoSQL database) market for an RDF database that is fast, zero config, lightweight, and feature-rich.

Speed Kills

RDF and OWL are excellent technologies for building data integration and analysis apps. Those apps invariably require complex query processing, i.e., queries where there are lots of joins, complex logical conditions to evaluate, etc. Stardog is targeted at query performance for complex SPARQL queries. We publish performance data so you can see how we’re doing.

Braindead Simple Deployment

Winners ship. Period.

We care very much about simple deployments. Stardog works out-of-the-box with minimal (none, typically) configuration. You shouldn’t have to fight an RDF database for days to install or tune it for great performance. Because Stardog is pure Java, it will run anywhere. It just works and it’s damn fast. You shouldn’t need to buy and configure a cluster of machines to get blazing fast performance from an RDF database. And now you don’t have to.

One More Thing…OWL Reasoning

Finally, Stardog has the deepest, most comprehensive, and best OWL reasoning support of any commercial RDF database available.

Stardog 1.0 supports RDFS, OWL 2 QL, EL, and RL, as well as OWL 2 DL schema-reasoning. It’s also the only RDF database to support closed-world integrity constraint validation and automatic explanations of integrity constraint violations.

If you care about data quality, Stardog 1.0 is worth a hard look.
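If you have never used an OWL reasoner, here is a minimal illustration of what RDFS/OWL RL support buys you, using rdflib and the owlrl package rather than Stardog itself; the ex: vocabulary is made up:

```python
from rdflib import Graph, Namespace, RDF, RDFS
import owlrl

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Employee, RDFS.subClassOf, EX.Person))
g.add((EX.alice, RDF.type, EX.Employee))

# Before reasoning, alice is only asserted to be an Employee.
print((EX.alice, RDF.type, EX.Person) in g)   # False

# Compute the RDFS closure in place.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

# After reasoning, the subclass axiom makes alice a Person as well.
print((EX.alice, RDF.type, EX.Person) in g)   # True
```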

OK, so I have signed up for an evaluation version, key, etc. Email just arrived.

Downloaded software and license key.

With all the open data lying around, it should not be hard to find test data.

More to follow. Comments welcome.

June 13, 2012

SeRSy 2012

Filed under: Conferences,Recommendation,Semantic Web — Patrick Durusau @ 2:21 pm

SeRSy 2012: International Workshop on Semantic Technologies meet Recommender Systems & Big Data

Important Dates:

Submission of papers: July 31, 2012
Notification of acceptance: August 21, 2012
Camera-ready versions: September 10, 2012

[In connection with the 11th International Semantic Web Conference, Boston, USA, November 11-15, 2012.]

The scope statement:

People generally need more and more advanced tools that go beyond those implementing the canonical search paradigm for seeking relevant information. A new search paradigm is emerging, where the user perspective is completely reversed: from finding to being found. Recommender Systems may help to support this new perspective, because they have the effect of pushing relevant objects, selected from a large space of possible options, to potentially interested users. To achieve this result, recommendation techniques generally rely on data referring to three kinds of objects: users, items and their relations.

Recent developments of the Semantic Web community offer novel strategies to represent data about users, items and their relations that might improve the current state of the art of recommender systems, in order to move towards a new generation of recommender systems which fully understand the items they deal with.

More and more semantic data are published following the Linked Data principles, that enable to set up links between objects in different data sources, by connecting information in a single global data space: the Web of Data. Today, Web of Data includes different types of knowledge represented in a homogeneous form: sedimentary one (encyclopedic, cultural, linguistic, common-sense) and real-time one (news, data streams, …). This data might be useful to interlink diverse information about users, items, and their relations and implement reasoning mechanisms that can support and improve the recommendation process.

The challenge is to investigate whether and how this large amount of wide-coverage and linked semantic knowledge can be automatically introduced into systems that perform tasks requiring human-level intelligence. Examples of such tasks include understanding a health problem in order to make a medical decision, or simply deciding which laptop to buy. Recommender systems support users exactly in those complex tasks.

The primary goal of the workshop is to showcase cutting edge research on the intersection of Semantic Technologies and Recommender Systems, by taking the best of the two worlds. This combination may provide the Semantic Web community with important real-world scenarios where its potential can be effectively exploited into systems performing complex tasks.

Should be interesting to see whether the semantic technologies or the recommender systems or both get the “rough” or inexact edges.
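To make the intersection the workshop is after a bit more concrete, here is a toy sketch of a content-based recommender that scores items by overlap of their linked-data features. The items and feature sets are invented:

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical feature sets, e.g. dcterms:subject values pulled from a dataset.
ITEM_FEATURES = {
    "laptop_A": {"ssd", "13-inch", "linux", "long-battery"},
    "laptop_B": {"ssd", "13-inch", "windows"},
    "laptop_C": {"hdd", "17-inch", "gaming"},
}

def recommend(liked_item, k=2):
    # Rank every other item by similarity of its features to the liked item.
    scores = {
        other: jaccard(ITEM_FEATURES[liked_item], feats)
        for other, feats in ITEM_FEATURES.items() if other != liked_item
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(recommend("laptop_A"))  # laptop_B should outrank laptop_C
```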

June 11, 2012

Scale, Structure, and Semantics

Filed under: Communication,Semantic Web,Semantics — Patrick Durusau @ 4:20 pm

Scale, Structure, and Semantics by Daniel Tunkelang.

From the post:

This morning I had the pleasure to present a keynote address at the Semantic Technology & Business Conference (SemTechBiz). I’ve had a long and warm relationship with the semantic technology community — especially with Marco Neumann and the New York Semantic Web Meetup.

To give you a taste of the slides:

1. Knowledge representation is overrated.

2. Computation is underrated.

3. We have a communication problem.

I find it helpful to think of search/retrieval as asynchronous conversation.

If I can’t continue a conversation, find my place in it, or know what it is about, there is a communication problem.

June 2, 2012

dipLODocus[RDF]

Filed under: RDF,Semantic Web — Patrick Durusau @ 6:17 pm

dipLODocus[RDF]

From the webpage:

dipLODocus[RDF] is a new system for RDF data processing supporting both simple transactional queries and complex analytics efficiently. dipLODocus[RDF] is based on a novel hybrid storage model considering RDF data both from a graph perspective (by storing RDF subgraphs or RDF molecules) and from a “vertical” analytics perspective (by storing compact lists of literal values for a given attribute).

Overview

Our system is built on three main structures: RDF molecule clusters (which can be seen as hybrid structures borrowing both from property tables and RDF subgraphs), template lists (storing literals in compact lists as in a column-oriented database system) and an efficient hash-table indexing URIs and literals based on the clusters they belong to.

Figure below gives a simple example of a few molecule clusters—storing information about students—and of a template list—compactly storing lists of student IDs. Molecules can be seen as horizontal structures storing information about a given object instance in the database (like rows in relational systems). Template lists, on the other hand, store vertical lists of values corresponding to one type of object (like columns in a relational system).
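To make the two structures concrete, a toy sketch (mine, not the authors’ code) of molecules as row-style groupings and template lists as column-style value lists:

```python
from collections import defaultdict

class ToyHybridStore:
    def __init__(self):
        self.molecules = defaultdict(lambda: defaultdict(list))  # subject -> predicate -> values
        self.templates = defaultdict(list)                       # (class, predicate) -> literal values
        self.types = {}                                          # subject -> class

    def add(self, subject, predicate, value, subject_class):
        self.types[subject] = subject_class
        self.molecules[subject][predicate].append(value)              # graph-style, per-subject
        if not str(value).startswith("http"):                         # crude "is a literal" test
            self.templates[(subject_class, predicate)].append(value)  # column-style, per-attribute

store = ToyHybridStore()
store.add("stud1", "hasID", 1001, "Student")
store.add("stud2", "hasID", 1002, "Student")
store.add("stud1", "advisor", "http://example.org/profX", "Student")

# Transactional lookup: everything about one student, from its molecule.
print(dict(store.molecules["stud1"]))

# Analytic scan: all student IDs, read from one compact template list.
print(store.templates[("Student", "hasID")])
```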

Interesting performance numbers:

  • 30x faster than RDF-3X on LUBM queries
  • 350x faster than Virtuoso on analytic queries

Combines data structures as opposed to adopting one single approach.

Perhaps data structures will be explored and optimized for data, rather than the other way around?

dipLODocus[RDF] | Short and Long-Tail RDF Analytics for Massive Webs of Data by Marcin Wylot, Jigé Pont, Mariusz Wisniewski, and Philippe Cudré-Mauroux (paper – PDF).

I first saw this at the SemanticWeb.com.

June 1, 2012

Are You Going to Balisage?

Filed under: Conferences,RDF,RDFa,Semantic Web,XML,XML Database,XML Schema,XPath,XQuery,XSLT — Patrick Durusau @ 2:48 pm

To the tune of “Are You Going to Scarborough Fair:”

Are you going to Balisage?
Parsley, sage, rosemary and thyme.
Remember me to one who is there,
she once was a true love of mine.

Tell her to make me an XML shirt,
Parsley, sage, rosemary, and thyme;
Without any seam or binary code,
Then she shall be a true lover of mine.

….

Oh, sorry! There you will see:

  • higher-order functions in XSLT
  • Schematron to enforce consistency constraints
  • relation of the XML stack (the XDM data model) to JSON
  • integrating JSON support into XDM-based technologies like XPath, XQuery, and XSLT
  • XML and non-XML syntaxes for programming languages and documents
  • type introspection in XQuery
  • using XML to control processing in a document management system
  • standardizing use of XQuery to support RESTful web interfaces
  • RDF to record relations among TEI documents
  • high-performance knowledge management system using an XML database
  • a corpus of overlap samples
  • an XSLT pipeline to translate non-XML markup for overlap into XML
  • comparative entropy of various representations of XML
  • interoperability of XML in web browsers
  • XSLT extension functions to validate OCL constraints in UML models
  • ontological analysis of documents
  • statistical methods for exploring large collections of XML data

Balisage is an annual conference devoted to the theory and practice of descriptive markup and related technologies for structuring and managing information. Participants typically include XML users, librarians, archivists, computer scientists, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, Topic-Map enthusiasts, semantic-Web evangelists, members of the working groups which define the specifications, academics, industrial researchers, representatives of governmental bodies and NGOs, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Discussion is open, candid, and unashamedly technical.

The Balisage 2012 Program is now available at: http://www.balisage.net/2012/Program.html

May 19, 2012

Searching For An Honest Engineer

Filed under: Google Knowledge Graph,RDF,Semantic Web — Patrick Durusau @ 7:28 pm

Sean Golliher needs to take his lantern to search for an honest engineer at the W3C.

Sean writes in Google Just Hi-jacked the Semantic Web Vocabulary:

Google announced they’re rolling out new enhancements to their search technology and they’re calling it the “Knowledge Graph.” For those involved in the Semantic Web Google’s “Knowledge Graph” is nothing new. After watching the video, and reading through the announcements, the Google engineers are giving the impression, to those familiar with this field, that they have created something new and innovative.

While it’s commendable that Google is improving search, it’s interesting to note the direct translations of Google’s “new language” to the existing semantic web vocabulary. Normally engineers and researchers quote, or at least reference, the original sources of their ideas. One can’t help but notice that the semantic web isn’t mentioned in any of Google’s announcements. After watching the different reactions from the semantic web community I found that many took notice of the language Google used and how the ideas from the semantic web were repackaged as “new” and discovered by Google.

Did you know that the W3C invented the ideas for:

  • Knowledge Graph
  • Relationships Between things
  • Naming things Better (Taxonomy?)
  • Objects/Entities
  • Ambiguous Language (Semantics?)
  • Connecting Things
  • discover new, and relevant, things you like (Serendipity?)
  • meaning (Semantic?)
  • graph (RDF?)
  • things (URIs (Linked Data)?)
  • real-world entities and their relationships to one another: things (Linked Data?)

?

Really? Semantic, serendipity, graph, relationships between real-world entities?

All invented by the W3C and/or carefully crediting prior work.

Right.

Good luck with your search Sean.

May 17, 2012

“…Things, Not Strings”

Filed under: Google Knowledge Graph,Marketing,RDF,RDFa,Semantic Web,Topic Maps — Patrick Durusau @ 6:30 pm

The brilliance at Google spreads beyond technical chops and into their marketing department.

Effective marketing can be about what you don’t do as well as what you do.

What did Google not do with the Google Knowledge Graph?

Google Knowledge Graph does not require users to:

  • learn RDF/RDFa
  • learn OWL
  • learn various syntaxes
  • build/choose ontologies
  • use SW software
  • wait for authoritative instructions from Mount W3C

What does Google Knowledge Graph do?

It gives users information about things, things that are of interest to users. Using their web browsers.

Let’s see, we can require users to do what we want, or, we can give users what they want.

Which one do you think is the most likely to succeed? (No peeking!)

May 15, 2012

Using “Punning” to Answer httpRange-14

Filed under: Linked Data,RDF,Semantic Web — Patrick Durusau @ 6:50 pm

Using “Punning” to Answer httpRange-14

Jeni Tennison writes in her introduction:

As part of the TAG’s work on httpRange-14, Jonathan Rees has assessed how a variety of use cases could be met by various proposals put before the TAG. The results of the assessment are a matrix which shows that “punning” is the most promising method, unique in not failing on either ease of use (use case J) or HTTP consistency (use case M).

In normal use, “punning” is about making jokes based around a word that has two meanings. In this context, “punning” is about using the same URI to mean two (or more) different things. It’s most commonly used as a term of art in OWL but normal people don’t need to worry particularly about that use. Here I’ll explore what that might actually mean as an approach to the httpRange-14 issue.

Jeni writes quite well and if you are really interested in the details of this self-inflicted wound, read her post in its entirety.

The post is summarized when she says:

Thus an implication of this approach is that the people who define languages and vocabularies must specify what aspect of a resource a URI used in a particular way identifies.

Her proposal makes disambiguation explicit. A strategy that is more likely to be successful than others.

Following that statement she treats how to usefully proceed from that position. (No guarantee her position will carry the day but it would be a good thing if it does.)

May 14, 2012

Web Developers Can Now Easily “Play” with RDFa

Filed under: RDF,RDFa,Semantic Web — Patrick Durusau @ 9:16 am

Web Developers Can Now Easily “Play” with RDFa by Eric Franzon.

From the post:

Yesterday, we announced RDFa.info, a new site devoted to helping developers add RDFa (Resource Description Framework-in-attributes) to HTML.

Building on that work, the team behind RDFa.info is announcing today the release of “PLAY,” a live RDFa editor and visualization tool. This release marks a significant step in providing tools for web developers that are easy to use, even for those unaccustomed to working with RDFa.

“Play” is an effort that serves several purposes. It is an authoring environment and markup debugger for RDFa that also serves as a teaching and education tool for Web Developers. As Alex Milowski, one of the core RDFa.info team, said, “It can be used for purposes of experimentation, documentation (e.g. crafting an example that produces certain triples), and testing. If you want to know what markup will produce what kind of properties (triples), this tool is going to be great for understanding how you should be structuring your own data.”

A useful site for learning RDFa that is open for contributions, such as examples and documentation.

April 13, 2012

Seminar: Five Years On

Filed under: Library,Linked Data,Semantic Web — Patrick Durusau @ 4:45 pm

Seminar: Five Years On

British Library
April 26, 2012 – April 27, 2012

From the webpage:

April 2012 marks the fifth anniversary of the Data Model Meeting at the British Library, London attended by participants interested in the fit between RDA: Resource Description and Access and the models used in other metadata communities, especially those working in the Semantic Web environment. This meeting, informally known as the “London Meeting”, has proved to be a critical point in the trajectory of libraries from the traditional data view to linked data and the Semantic Web.

DCMI-UK in cooperation with DCMI International as well as others will co-sponsor a one-day seminar on Friday 27 April 2012 to describe progress since 2007, mark the anniversary, and look to further collaboration in the future.

Speakers will include participants at the 2007 meeting and other significant players in library data and the Semantic Web. Papers from the seminar will be published by DCMI and available freely online.

The London Meeting stimulated significant development of Semantic Web representations of the major international bibliographic metadata models, including IFLA’s Functional Requirements family and the International Standard Bibliographic Description (ISBD), and MARC as well as RDA itself. Attention is now beginning to focus on the management and sustainability of this activity, and the development of high-level semantic and data structures to support library applications.

Would appreciate a note if you are in London for this meeting. Thanks!

April 12, 2012

Is There A Dictionary In The House? (Savanna – Think Software)

Filed under: Integration,Intelligence,OWL,Semantic Web — Patrick Durusau @ 7:04 pm

I was reading a white paper on an integration solution from Thetus Corporation (on its Savanna product line) when I encountered:

Savanna supports the core architectural premise that the integration of external services and components is an essential element of any enterprise platform by providing out-of-the-box integrations with many of the technologies and programs already in use in the DI2E framework. These investments include existing programs, such as: the Intelligence Community Data Layer (ICDL), OPTIC (force protection application), WATCHDOG (Terrorist Watchlist 2.0), SERENGETI (AFRICOM socio-cultural analysis), SCAN-R (EUCOM deep futures analysis); and, in the future: TAC (tripwire search and analysis), and HSCB-funded modeling capabilities, including Signature Analyst and others. To further make use of existing external services and components, the proposed solution includes integration points for commercial and opensource software, including: SOLR (indexing), Open Sextant (geotagging), Apache OpenNLP (entity extraction), R (statistical analysis), ESRI (geo-processing), OpenSGI GeoCache (geospatial data), i2 Analyst’s Notebook (charting and analysis) and a variety of structured and unstructured data repositories.

I have to plead ignorance of the “existing program” alphabet soup but I am familiar with several of the open source packages.

I am not sure what an “integration point” for an unknown future use of any of those packages would look like. Do you? Their output can be used by any program but that hardly qualifies the other program as having an “integration point.”

I am sensitive to the use of “integration” because to me it means there is some basis for integration. So a user, having integrated data once, can re-use and possibly enhance the basis for integration of data with other data. (We call that “merging” in topic map land.)

Integration and even reuse is mentioned: “The Savanna architecture prevents creating a set of comparable reuse issues at the enterprise scale by providing a set of interconnected and flexible models that articulate how analysis assets are sourced and created and how they are used by the community.” (page 16)

But not in enough detail to really evaluate the basis for re-use of data, data structures, enrichment of the same, etc.

Looked around for an SDK or such but came up empty.

Point of amusement:

It’s official, we’re debuting our newest release of Savanna at DoDIIS (March 21, 2012) (Department of Defense Intelligence Information Systems Worldwide Conference (DoDIIS))

The next blog entry by date?

Happy Peaceful Birthday to the Peace Corps (March 1, 2012)

I would appreciate hearing from anyone with information or stories to tell about how Savanna works in practice.

In particular, I am interested in whether two distinct Savanna installations can share information in a blind interchange. That should be the test of re-use of information by another installation.

Moreover, do I have to convert data between formats or can data structures themselves be entities with properties?

PS: I am not overly impressed with the use of OWL for modeling in Savanna. The experience with “big data” has shown that starting with data first leads to different, perhaps more useful models than the other way around.

Premature modeling with OWL will result in models that are “useful” in meeting the expectations of the creating analyst. That may not be the criteria of “usefulness” that is required.

April 8, 2012

Nature Publishing Group releases linked data platform

Filed under: Linked Data,LOD,Semantic Web — Patrick Durusau @ 4:21 pm

Nature Publishing Group releases linked data platform

From the post:

Nature Publishing Group (NPG) today is pleased to join the linked data community by opening up access to its publication data via a linked data platform. NPG’s Linked Data Platform is available at http://data.nature.com.

The platform includes more than 20 million Resource Description Framework (RDF) statements, including primary metadata for more than 450,000 articles published by NPG since 1869. In this first release, the datasets include basic citation information (title, author, publication date, etc) as well as NPG specific ontologies. These datasets are being released under an open metadata license, Creative Commons Zero (CC0), which permits maximal use/re-use of this data.

NPG’s platform allows for easy querying, exploration and extraction of data and relationships about articles, contributors, publications, and subjects. Users can run web-standard SPARQL Protocol and RDF Query Language (SPARQL) queries to obtain and manipulate data stored as RDF. The platform uses standard vocabularies such as Dublin Core, FOAF, PRISM, BIBO and OWL, and the data is integrated with existing public datasets including CrossRef and PubMed.

More information about NPG’s Linked Data Platform is available at http://developers.nature.com/docs. Sample queries can be found at http://data.nature.com/query.

You may find it odd that I would cite such a resource on the same day as penning Technology speedup graph where I speak so harshly about the Semantic Web.

On the contrary, disagreement about the success/failure of the Semantic Web and its retreat to Linked Data is an example of conflicting semantics. Conflicting semantics not being a “feature” of the Semantic Web.

Besides, Nature is a major science publisher and their experience with Linked Data is instructive.

Such as the NPG specific ontologies. 😉 Not what you were expecting?

This is a very useful resource and the Nature Publishing Group is to be commended for it.

The creation of metadata about the terms used within articles, and the relationships between those terms as well as to other publications, will make it more useful still.
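If you want to poke at the platform yourself, here is a hedged sketch of a query against it. The endpoint URL and the use of dc:title are my guesses based on the announcement above (Dublin Core is listed among the vocabularies); check http://developers.nature.com/docs for the actual endpoint and schema before relying on this:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://data.nature.com/sparql")  # assumed endpoint, verify first
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?article ?title WHERE {
        ?article dc:title ?title .
    } LIMIT 10
""")

for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["article"]["value"], "-", binding["title"]["value"])
```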

Technology speedup graph

Filed under: Semantic Diversity,Semantic Web — Patrick Durusau @ 4:21 pm

Technology speedup graph

Andrew Gelman posts an interesting graphic showing the adoption of various technologies from 1900 forward. See the post for the lineage on the graph and the details. Good graphic.

What caught my eye for topic maps was the rapid adoption of the Internet/WWW and the now well recognized failure of the Semantic Web.

You may feel like disputing my evaluation of the Semantic Web. Recall that agents were predicted to be roaming the Semantic Web by this point in Tim Berners-Lee’s first puff piece in Scientific American. After a few heady years of announcements that realization was just around the corner came the 21st century technology equivalent of the long retreat (think Napoleon).

Now the last gasp is Linked Data, with the “meaning” of URIs to be determined on Mount W3C and then imposed on the rest of us.

Make no mistake, I think the WWW was a truly great technological achievement.

But the technological progress graph prompted me to wonder, yet again, how is the WWW different from the Semantic Web?

Not sure this is helpful but consider the level of agreement on semantics required by the WWW versus the Semantic Web.

For the WWW, there are a handful of RFCs that specify the treatment of syntax, that is, addresses and the composition of the resources you find at those addresses. Users may attach semantics to those resources, but none of those semantics are required for processing or delivery of the resources.

That is, for the WWW to succeed, all we need is agreement on the addressing and processing of resources, not on their semantics.

A resource can have a crazy quilt of semantics attached to it by users, diverse, inconsistent, contradictory, because its addressing and processing is independent of those semantics and those who would impose them.

Resources on the WWW certainly have semantics, but processing those resources doesn’t depend on our agreement on those semantics.

So, the semantic agreement of the WWW = ~ 0. (Leaving aside the certainly true contention that protocols have semantics.)

The semantic agreement required by the Semantic Web is “web scale agreement.” That is, everyone who encounters a semantic has to either honor it or break that part of the Semantic Web.

Wait until after you watch the BBC News or Al Jazeera (English), الجزيرة.نت, before you suggest universal semantics are just around the corner.

April 6, 2012

“Give me your tired, your poor, your huddled identifiers yearning to be used.”

Filed under: Identifiers,RDF,Semantic Web — Patrick Durusau @ 6:52 pm

I was reminded of the title quote when I read Richard Wallis’s: A Fundamental Linked Data Debate.

Contrary to Richard’s imaginings, the vast majority of people on and off the Web are not waiting for the debates on the W3C’s Technical Architecture Group (TAG) or Linked Open Data (public-lod) mailing lists to be resolved.

Why?

They had identifiers for subjects long before the WWW, Semantic Web, Linked Data or whatever and will have identifiers for subjects long after those efforts and their successors are long forgotten.

Some of those identifiers are still in use today and will survive well into the future. Others are historical curiosities.

Moreover, when it was necessary to distinguish between identifiers and the things identified, that need was met.

Enter the WWW and its poster child, Tim Berners-Lee.

It was Tim Berners-Lee who created the problem Richard frames as: “the difference between a thing and a description of that thing.”

Amazing how much fog of discussion there has been to cover up that amateurish mistake.

The problem isn’t one of conflicting world views (a la Jeni Tennison) but rather, given a bare URI, how to interpret it, given the bad choices made in the Garden of the Web, as it were.

That we simply abandon bare URIs as a solution has never darkened their counsel. They would rather impose the 303/TBL burden on everyone than admit to fundamental error.

I have a better solution.

The rest of us should carry on with the identifiers that we want to use, whether they be URIs or not. Whether they are prior identifiers or new ones. And we should put forth statements/standards/documents to establish how, in our contexts, those identifiers should be used.

If IBM, Oracle, Microsoft and a few other adventurers decide that IT can benefit from some standard terminology, I am sure they can influence others to use it. Whether composed of URIs or not. And the same can be said for many other domains, most of whom will do far better than the W3C at fashioning identifiers for themselves.

Take heart TAG and LOD advocates.

As the poem says: “Give me your tired, your poor, your huddled identifiers yearning to be used.”

Someday your identifiers will be preserved as well.

April 5, 2012

All Aboard for Quasi-Productive Stemming

Filed under: RDF,Semantic Web — Patrick Durusau @ 3:35 pm

All Aboard for Quasi-Productive Stemming by Bob Carpenter.

From the post:

One of the words Becky and I are having annotated for word sense (collecting 25 non-spam Mechanical Turk responses per word) is the nominal (noun) use of “board”.

One of the examples was drawn from a text with a typo where “aboard” was broken into two words, “a board”. I looked at the example, and being a huge fan of nautical fiction, said “board is very productive — we should have the nautical sense”. Then I thought a bit longer and had to admit I didn’t know what “board” meant all by itself. I did know a whole bunch of terms that involved “board” as a stem:

Highly entertaining post by Bob on the meanings of “board.”

I have a question: Which sense of board gets the URL: http://w3.org/people/TBL/OneWorldMeaning/board?

Just curious.

April 4, 2012

Linked Data Basic Profile 1.0

Filed under: Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 3:33 pm

Linked Data Basic Profile 1.0

A group of W3C members (IBM, DERI, EMC, Oracle, Red Hat, Tasktop and SemanticWeb.com) has made a submission to the W3C with the title: Linked Data Basic Profile 1.0.

The submission consists of:

Linked Data Basic Profile 1.0

Linked Data Basic Profile Use Cases and Requirements

Linked Data Basic Profile RDF Schema

Interesting proposal.

Doesn’t try to do everything. The old 303/TBL is relegated to pagination. Probably a good use for it.

Comments?

New Paper: Linked Data Strategy for Global Identity

Filed under: Identity,RDF,Semantic Web — Patrick Durusau @ 3:32 pm

New Paper: Linked Data Strategy for Global Identity

Angela Guess writes:

Hugh Glaser and Harry Halpin have published a new PhD thesis for the University of Southampton Research Repository entitled “The Linked Data Strategy for Global Identity” (2012). The paper was published by the IEEE Computer Society. It is available for download here for non-commercial research purposes only. The abstract states, “The Web’s promise for planet-scale data integration depends on solving the thorny problem of identity: given one or more possible identifiers, how can we determine whether they refer to the same or different things? Here, the authors discuss various ways to deal with the identity problem in the context of linked data.”

At first I was hurt that I didn’t see a copy of Harry’s dissertation before it was published. I don’t always agree with him (see below) but I do like keeping up with his writing.

Then I discovered this is a four page dissertation. I guess Angela never got past the cover page. It is an article in the IEEE zine, IEEE Internet Computing.

Harry fails to mention that the HTTP 303 “trick” was made necessary by Tim Berners-Lee’s failure to understand the necessity to distinguish identifiers from addresses. Rather than admit to or correct that failure, the solution being pushed is to create web traffic overhead in the form of 303 “tricks.” “303” should be re-named “TBL”, so we are reminded with each invocation who made it necessary. (lower middle column, page 3)

I partially agree with:

We’re only just beginning to explore the vast field of identity, and more work is needed before linked data can fulfill its full potential.(on page 5)

The “just beginning” part is true enough. But therein lies the rub. Rather than first explore the “…vast field of identity…,” which changes from domain to domain, and then propose a solution, the Linked Data proponents took the other path.

They proposed a solution and, in the face of its failure to work, are now inching towards the “…vast field of identity….” Seems a mite late for that.

Harry concludes:

The entire bet of the linked data enterprise critically rests on using URIs to create identities for everything. Whether this succeeds might very well determine whether information integration will be trapped in centralized proprietary databases or integrated globally in a decentralized manner with open standards. Given the tremendous amount of data being created and the Web’s ubiquitous nature, URIs and equivalence links might be the best chance we have of solving the identity problem, transforming a profoundly difficult philosophical issue into a concrete engineering project.

The first line, “The entire bet….” omits to say that we need the same URIs for everything. That is called the perfect language project, which has a very long history of consistent failure. Recent attempts include Esperanto and Loglan.

The second line, “Whether this succeeds…trapped in centralized proprietary databases…” is fear mongering. “If you don’t support linked data, (insert your nightmare scenario).”

The final line, “…transforming a profoundly difficult philosophical issue into a concrete engineering project” is magical thinking.

Identity is a very troubled philosophical issue but proposing a solution without understanding the problem doesn’t sound like a high percentage shot to me. You?

The Problem With Names (and the W3C)

Filed under: RDF,Semantic Web — Patrick Durusau @ 3:30 pm

The Problem With Names by Paul Miller.

Paul details the struggle of museums to make their holdings web accessible.

The problem isn’t reluctance or a host of other issues that Paul points out.

The problem is one of identifiers, that is, names.

Museums have crafted complex identifiers for their holdings and not unreasonably expect to continue to use them.

But all they are being offered are links.

The Rijksmuseum is one of several museums around the world that is actively and enthusiastically working to open up its data, so that it may be used, enjoyed, and enriched by a whole new audience. But until some of the core infrastructure — the names, the identifiers, the terminologies, and the concepts — upon which this and other museums depend becomes truly part of the web, far too much of the opportunity created by big data releases such as the Rijksmuseum’s will be wasted.

When is the W3C going to admit that subjects can have complex names/identifiers? Not just simple links?

That would be a game changer. For everyone.

March 23, 2012

A new RDFa Test Harness

Filed under: RDFa,Semantic Web — Patrick Durusau @ 7:24 pm

A new RDFa Test Harness by Gregg Kellogg.

From the post:

This is an introductory blog post on the creation of a new RDFa Test Suite. Here we discuss the use of Sinatra, Backbone.js and Bootstrap.js to run the test suite. Later will come articles on the usefulness of JSON-LD as a means of driving a test harness, generating test reports, and the use of BrowserID to deal with Distributed Denial of Service attacks that cropped up overnight.

Interesting, but it strikes me as a formal/syntax validation of the RDFa in question. Useful, but only up to a point. Yes?

Can you point me to an RDFa or RDF test harness that tests the semantic “soundness” of the claims made in RDFa or RDF?

It quite easily may exist and I have just not seen it.

Thanks!

March 22, 2012

VIVO – An interdisciplinary national network

Filed under: Semantic Web,VIVO — Patrick Durusau @ 7:42 pm

VIVO – An interdisciplinary national network

From the “about” page:

VIVO enables the discovery of researchers across institutions. Participants in the network include institutions with local installations of VIVO or those with research discovery and profiling applications that can provide semantic web-compliant data. The information accessible through VIVO’s search and browse capability will reside and be controlled locally, within institutional VIVOs or other semantic web-compliant applications.

VIVO is an open source semantic web application originally developed and implemented at Cornell. When installed and populated with researcher interests, activities, and accomplishments, it enables the discovery of research and scholarship across disciplines at that institution and beyond. VIVO supports browsing and a search function which returns faceted results for rapid retrieval of desired information. Content in any local VIVO installation may be maintained manually, brought into VIVO in automated ways from local systems of record, such as HR, grants, course, and faculty activity databases, or from database providers such as publication aggregators and funding agencies.

The rich semantically structured data in VIVO support and facilitate research discovery. Examples of applications that consume these rich data include: visualizations, enhanced multi-site search through VIVO Search, and applications such as VIVO Searchlight, a browser bookmarklet which uses text content of any webpage to search for relevant VIVO profiles, and the Inter-Institutional Collaboration Explorer, an application which allows visualization of collaborative institutional partners, among others.

Download the VIVO flyer.

Would be very interested to hear from adopters outside of the current “collaborative institutional partners.”

I don’t doubt that VIVO will prove to be useful but as you know, I am interested in collaborations that lie just beyond the reach of any particular framework.
