Archive for the ‘Schema.org’ Category

Black Friday Dreaming with Bob DuCharme

Saturday, November 15th, 2014

Querying aggregated Walmart and BestBuy data with SPARQL by Bob DuCharme.

From the post:

The combination of microdata and schema.org seems to have hit a sweet spot that has helped both to get a lot of traction. I’ve been learning more about microdata recently, but even before I did, I found that the W3C’s Microdata to RDF Distiller written by Ivan Herman would convert microdata stored in web pages into RDF triples, making it possible to query this data with SPARQL. With major retailers such as Walmart and BestBuy making such data available on—as far as I can tell—every single product’s web page, this makes some interesting queries possible to compare prices and other information from the two vendors.

Bob’s use of SPARQL won’t be ready for this coming Black Friday but some Black Friday in the future?

One can imagine “blue light specials” being input by shoppers on location and driving traffic patterns at the larger malls.

Well worth your time to see where Bob was able to get using public tools.

I first saw this in a tweet by Ivan Herman.

Analyzing Schema.org

Thursday, October 23rd, 2014

Analyzing Schema.org by Peter F. Patel-Schneider.

Abstract:

Schema.org is a way to add machine-understandable information to web pages that is processed by the major search engines to improve search performance. The definition of schema.org is provided as a set of web pages plus a partial mapping into RDF triples with unusual properties, and is incomplete in a number of places. This analysis of and formal semantics for schema.org provides a complete basis for a plausible version of what schema.org should be.

Peter’s analysis is summarized when he says:

The lack of a complete definition of schema.org limits the possibility of extracting the correct information from web pages that have schema.org markup.

Ah, yes, “…the correct information from web pages….”

I suspect the lack of semantic precision has powered the success of schema.org. Each user of schema.org markup has their private notion of the meaning of their use of the markup and there is no formal definition to disabuse them of that notion. Not that formal definitions were enough to save owl:sameAs from varying interpretations.

Schema.org empowers varying interpretations without requiring users to ignore OWL or description logic.

For the domains that schema.org covers, eateries, movies, bars, whore houses, etc., the semantic slippage permitted by schema.org lowers the bar to usage of its markup. Which has resulted in its adoption more widely than other proposals.

The lesson of schema.org is the degree of semantic slippage you can tolerate depends upon your domain. For pharmaceuticals, I would assume that degree of slippage is as close to zero as possible. For movie reviews, not so much.

Any effort to impose the same degree of semantic slippage across all domains is doomed to failure.

I first saw this in a tweet by Bob DuCharme.

How To Build Linked Data APIs…

Wednesday, October 15th, 2014

This is the second high signal-to-noise presentation I have seen this week! I am sure that streak won’t last but I will enjoy it as long as it does.

Resources for after you see the presentation: Hydra: Hypermedia-Driven Web APIs, JSON for Linking Data, and, JSON-LD 1.0.

Near the end of the presentation, Marcus quotes Phil Archer, W3C Data Activity Lead:

Archer on Semantic Web

Which is an odd statement considering that JSON-LD 1.0 Section 7 Data Model, reads in part:

JSON-LD is a serialization format for Linked Data based on JSON. It is therefore important to distinguish between the syntax, which is defined by JSON in [RFC4627], and the data model which is an extension of the RDF data model [RDF11-CONCEPTS]. The precise details of how JSON-LD relates to the RDF data model are given in section 9. Relationship to RDF.

And section 9. Relationship to RDF reads in part:

JSON-LD is a concrete RDF syntax as described in [RDF11-CONCEPTS]. Hence, a JSON-LD document is both an RDF document and a JSON document and correspondingly represents an instance of an RDF data model. However, JSON-LD also extends the RDF data model to optionally allow JSON-LD to serialize Generalized RDF Datasets. The JSON-LD extensions to the RDF data model are:…

Is JSON-LD “…a concrete RDF syntax…” where you can ignore RDF?

Not that I was ever a fan of RDF but standards should be fish or fowl and not attempt to be something in between.

Want to see how #SchemaOrg #Dbpedia and #SKOS taxonomies can be seamlessly integrated?

Friday, September 12th, 2014

Want to see how #SchemaOrg #Dbpedia and #SKOS taxonomies can be seamlessly integrated? Register for our webinar: http://www.poolparty.biz/webinar-taxonomy-management-content-management-well-integrated/

is how the tweet read.

From the seminar registration page:

With the arrival of semantic web standards and linked data technologies, new options for smarter content management and semantic search have become available. Taxonomies and metadata management shall play a central role in your content management system: By combining text mining algorithms with taxonomies and knowledge graphs from the web a more accurate annotation and categorization of documents and more complex queries over text-oriented repositories like SharePoint, Drupal, or Confluence are now possible.

Nevertheless, the predominant opinion that taxonomy management is a tedious process currently impedes a widespread implementation of professional metadata strategies.

In this webinar, key people from the Semantic Web Company will describe how content management and collaboration systems like SharePoint, Drupal or Confluence can benefit from professional taxonomy management. We will also discuss why taxonomy management is not necessarily a tedious process when well integrated into content management workflows.

I’ve had mixed luck with webinars this year. Some were quite good and others were equally bad.

I have fairly firm opinions about #Schema.org, #Dbpedia and #SKOS taxonomies but tedium isn’t one of them. 😉

You can register for free for: Webinar “Taxonomy management & content management – well integrated!”, October 8th, 2014.

Usual marketing harvesting of contact information. Linux users will have to use VMs for PCs or Mac.

If you attend, be sure to look for my post reviewing the webinar and post your comments there.

Announcing Schema.org Actions

Thursday, April 17th, 2014

Announcing Schema.org Actions

From the post:

When we launched schema.org almost 3 years ago, our main focus was on providing vocabularies for describing entities — people, places, movies, restaurants, … But the Web is not just about static descriptions of entities. It is about taking action on these entities — from making a reservation to watching a movie to commenting on a post.

Today, we are excited to start the next chapter of schema.org and structured data on the Web by introducing vocabulary that enables websites to describe the actions they enable and how these actions can be invoked.

The new actions vocabulary is the result of over two years of intense collaboration and debate amongst the schema.org partners and the larger Web community. Many thanks to all those who participated in these discussions, in particular to members of the Web Schemas and Hydra groups at W3C. We are hopeful that these additions to schema.org will help unleash new categories of applications.

From http://schema.org/Action:

Thing > Action

An action performed by a direct agent and indirect participants upon a direct object. Optionally happens at a location with the help of an inanimate instrument. The execution of the action may produce a result. Specific action sub-type documentation specifies the exact expectation of each argument/role.

Fairly coarse but I can see how it would be useful.

BTW, the examples are only available in JSON-LD. Just in case you were wondering.

Given the coarseness of schema.org and its success, due consideration should be given to semantics of “appropriate” coarseness for any particular task.

Is That An “Entity” On Your Webpage?

Sunday, March 30th, 2014

How To Tell Search Engines What “Entities” Are On Your Web Pages by Barbara Starr.

From the post:

Search engines have increasingly been incorporating elements of semantic search to improve some aspect of the search experience — for example, using schema.org markup to create enhanced displays in SERPs (as in Google’s rich snippets).

Elements of semantic search are now present at almost all stages of the search process, and the Semantic Web has played a key role. Read on for more detail and to learn how to take advantage of this opportunity to make your web pages more visible in this evolution of search.

semantic search

The identifications are fairly coarse, that is you get a pointer (URL) that identifies a subject but no idea why someone picked that URL.

But, we all know how well coarse pointers, document level pointers, have worked for the WWW.

Kinda surprising because we have had sub-document indexing for centuries.

Odd how simply pointing to a text blob suddenly became acceptable.

Think of the efforts by Google and schema.org as an attempt to recover indexing as it existed in the centuries before the advent of the WWW.

Vocabularies at W3C

Wednesday, January 8th, 2014

Vocabularies at W3C by Phil Archer.

From the post:

In my opening post on this blog I hinted that another would follow concerning vocabularies. Here it is.

When the Semantic Web first began, the expectation was that people would create their own vocabularies/schemas as required – it was all part of the open world (free love, do what you feel, dude) Zeitgeist. Over time, however, and with the benefit of a large measure of hindsight, it’s become clear that this is not what’s required.

The success of Linked Open Vocabularies as a central information point about vocabularies is symptomatic of a need, or at least a desire, for an authoritative reference point to aid the encoding and publication of data. This need/desire is expressed even more forcefully in the rapid success and adoption of schema.org. The large and growing set of terms in the schema.org namespace includes many established terms defined elsewhere, such as in vCard, FOAF, Good Relations and rNews. I’m delighted that Dan Brickley has indicated that schema.org will reference what one might call ‘source vocabularies’ in the near future, I hope with assertions like owl:equivalentClass, owl:equivalentProperty etc.

Designed and promoted as a means of helping search engines make sense of unstructured data (i.e. text), schema.org terms are being adopted in other contexts, for example in the ADMS. The Data Activity supports the schema.org effort as an important component and we’re delighted that the partners (Google, Microsoft, Yahoo! and Yandex) develop the vocabulary through the Web Schemas Task Force, part of the W3C Semantic Web Interest Group of which Dan Brickley is chair.

Phil then makes a pitch for doing vocabulary work at the W3C but you can see his post for the details.

I think the success of schema.org is a flashing pointer to a semantic sweet spot.

It isn’t nearly everything that you could do with RDF/OWL or with topic maps, but it’s enough to show immediate ROI for a minimum of investment.

Make no mistake, people will develop different vocabularies for the same activities. Not a problem. Topic maps will be able to help you robustly map between different vocabularies.

The Correct End Of Your Telescope – Viewing Schema.org Adoption

Sunday, November 4th, 2012

The Correct End Of Your Telescope – Viewing Schema.org Adoption by Richard Wallis.

telescope graphic

I have been banging on about Schema.org for a while.  For those that have been lurking under a structured data rock for the last year, it is an initiative of cooperation between Google, Bing, Yahoo!, and Yandex to establish a vocabulary for embedding structured data in web pages to describe ‘things’ on the web.  Apart from the simple significance of having those four names in the same sentence as the word cooperation, this initiative is starting to have some impact.  As I reported back in June, the search engines are already seeing some 7%-10% of pages they crawl containing Schema.org markup.  Like it or not, it is clear that Schema.org is rapidly becoming a de facto way of marking up your data if you want it to be shared on the web and have it recognised by the major search engines.

It is no coincidence then, at OCLC we chose Schema.org as the way to expose linked data in WorldCat.  If you haven’t seen it, just search for any item at worldcat.org, scroll to the bottom of the page and open up the Linked Data tab and there you will see the [not very pretty, but hay it’s really designed for systems not humans] Schema.org marked up linked data for the item, with links out to other data sources such as VIAF, LCSH, FAST, and Dewey.

Schema.org has much to recommend itself but I suspect that HTML remains the “…de facto way of marking up your data if you want it to be shared on the web and have it recognised by the major search engines.”

Ten percent is no mean feat but it is still ten percent.

New UMBEL Release Gains schema.org, GeoNames Capabilities

Wednesday, May 23rd, 2012

New UMBEL Release Gains schema.org, GeoNames Capabilities by Mike Bergman.

From the post:

We are pleased to announce the release of version 1.05 of UMBEL, which now has linkages to schema.org [6] and GeoNames [1]. UMBEL has also been split into ‘core’ and ‘geo’ modules. The resulting smaller size of UMBEL ‘core’ — now some 26,000 reference concepts — has also enabled us to create a full visualization of UMBEL’s content graph.

Mapping to schema.org

The first notable change in UMBEL v. 1.05 is its mapping to schema.org. schema.org is a collection of schema (usable as HTML tags) that webmasters can use to markup their pages in ways recognized by major search providers. schema.org was first developed and organized by the major search engines of Bing, Google and Yahoo!; later Yandex joined as a sponsor. Now many groups are supporting schema.org and contributing vocabularies and schema.

You will appreciate the details of the writeup and like the visualization. Quite impressive!

PS: As if you didn’t know:

http://umbel.org/

This is the official Web site for the UMBEL Vocabulary and Reference Concept Ontology (namespace: umbel). UMBEL is the Upper Mapping and Binding Exchange Layer, designed to help content interoperate on the Web.