Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 13, 2012

W3C HTML Data Task Force Publishes 2 Notes

Filed under: HTML Data,Microdata,RDF,Semantic Web — Patrick Durusau @ 8:16 pm

W3C HTML Data Task Force Publishes 2 Notes

From the post:

The W3C HTML Data Task Force has published two notes, the HTML Data Guide and Microdata to RDF. According to the abstract of the former, “This guide aims to help publishers and consumers of HTML data use it well. With several syntaxes and vocabularies to choose from, it provides guidance about how to decide which meets the publisher’s or consumer’s needs. It discusses when it is necessary to mix syntaxes and vocabularies and how to publish and consume data that uses multiple formats. It describes how to create vocabularies that can be used in multiple syntaxes and general best practices about the publication and consumption of HTML data.”

One can only hope that the W3C will eventually sanctify industry standard practices for metadata. Perhaps they will call it RDF-NG. Whatever.

Keep the web weird

Filed under: Semantic Web,Semantics — Patrick Durusau @ 8:14 pm

Keep the web weird

Pete Warden writes:

I’m doing a short talk at SXSW tomorrow, as part of a panel on Creating the Internet of Entities. Preparing is tough because I don’t believe it’s possible, and even if it was I wouldn’t like it. Opposing better semantic tagging feels like hating on Girl Scout cookies, but I’ve realized that I like an internet full of messy, redundant, ambiguous data.

The stated goal of an Internet of Entities is a web where “real-world people, places, and things can be referenced unambiguously“. We already have that. Most pages give enough context and attributes for a person to figure out which real world entity it’s talking about. What the definition is trying to get at is a reference that a machine can understand.

The implicit goal of this and similar initiatives like Stephen Wolfram’s .data proposal is to make a web that’s more computable. Right now, the pages that make up the web are a soup of human-readable text, a long way from the structured numbers and canonical identifiers that programs need to calculate with. I often feel frustrated as I try to divine answers from chaotic, unstructured text, but I’ve also learned to appreciate the advantages of the current state of things.

Now there is a position that merits cheerful support!

You need to read what comes in between but Pete concludes:

The web is literature; sprawling, ambiguous, contradictory, and weird. Let’s preserve those as virtues, and write better code to cope with the resulting mess.

I remember Bible society staffers who were concerned that if non-professionals could publish their own annotations attached to the biblical text, the text might suffer as a result. I tried to assure them that despite years, centuries, if not longer, of massed lay and professional effort, the biblical text has resisted all attempts to tame it. I see no reason to think that will change now or in the future.

March 3, 2012

Bipartite Graphs as Intermediate Model for RDF

Filed under: Hypergraphs,RDF,Semantic Web — Patrick Durusau @ 7:28 pm

Bipartite Graphs as Intermediate Model for RDF by Jonathan Hayes and Claudio Gutierrez.

Abstract:

RDF Graphs are sets of assertions in the form of subject-predicate-object triples of information resources. Although for simple examples they can be understood intuitively as directed labeled graphs, this representation does not scale well for more complex cases, particularly regarding the central notion of connectivity of resources.

We argue in this paper that there is need for an intermediate representation of RDF to enable the application of well-established methods from Graph Theory. We introduce the concept of Bipartite Statement-Value Graph and show its advantages as intermediate model between the abstract triple syntax and data structures used by applications. In the light of this model we explore issues like transformation costs, data/schema structure, the notion of connectivity, and database mappings.

A quite different take on the representation of RDF than in Is That A Graph In Your Cray? Here we encounter hypergraphs for modeling RDF.
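To make the paper’s central idea concrete, here is a rough sketch (my own reading of it, not the authors’ code) of a bipartite statement-value graph: every triple becomes a statement node with three labelled edges pointing at value nodes, so a term that plays different roles in different triples is still a single node.

```python
# A minimal sketch of the bipartite statement-value idea: one node per
# statement, one node per value, and three labelled edges (subject,
# predicate, object) joining them. Illustrative only.

from collections import defaultdict

def bipartite_statement_value_graph(triples):
    """Return adjacency maps for a bipartite graph with statement nodes
    on one side and value nodes (resources/literals) on the other."""
    stmt_edges = {}                      # statement id -> {role: value}
    value_edges = defaultdict(set)       # value -> set of (statement id, role)
    for i, (s, p, o) in enumerate(triples):
        sid = f"stmt{i}"
        stmt_edges[sid] = {"subject": s, "predicate": p, "object": o}
        for role, value in (("subject", s), ("predicate", p), ("object", o)):
            value_edges[value].add((sid, role))
    return stmt_edges, value_edges

# Example: a predicate (ex:knows) that is itself described by another triple.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:knows", "rdfs:label", '"knows"'),
]
stmts, values = bipartite_statement_value_graph(triples)
# 'ex:knows' is a single value node reachable from both statements,
# regardless of whether it played the predicate or subject role.
print(values["ex:knows"])
```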

Suggestions on how to rank graph representations of RDF?

Or perhaps better, suggestions on how to rank graph representations for use cases?

Putting the question of what (connections/properties) we want to model before the question of how (RDF, etc.) we intend to model it.

Isn’t that the right order?

Comments?

Populating the Semantic Web…

Filed under: Data Mining,Entity Extraction,Entity Resolution,RDF,Semantic Web — Patrick Durusau @ 7:28 pm

Populating the Semantic Web – Combining Text and Relational Databases as RDF Graphs by Kate Bryne.

I ran across this while looking for RDF graph material today. Delighted to find someone interested in the problem of what we do with existing data, even if new data arrives in some Semantic Web format.

Abstract:

The Semantic Web promises a way of linking distributed information at a granular level by interconnecting compact data items instead of complete HTML pages. New data is gradually being added to the Semantic Web but there is a need to incorporate existing knowledge. This thesis explores ways to convert a coherent body of information from various structured and unstructured formats into the necessary graph form. The transformation work crosses several currently active disciplines, and there are further research questions that can be addressed once the graph has been built.

Hybrid databases, such as the cultural heritage one used here, consist of structured relational tables associated with free text documents. Access to the data is hampered by complex schemas, confusing terminology and difficulties in searching the text effectively. This thesis describes how hybrid data can be unified by assembly into a graph. A major component task is the conversion of relational database content to RDF. This is an active research field, to which this work contributes by examining weaknesses in some existing methods and proposing alternatives.

The next significant element of the work is an attempt to extract structure automatically from English text using natural language processing methods. The first claim made is that the semantic content of the text documents can be adequately captured as a set of binary relations forming a directed graph. It is shown that the data can then be grounded using existing domain thesauri, by building an upper ontology structure from these. A schema for cultural heritage data is proposed, intended to be generic for that domain and as compact as possible.

Another hypothesis is that use of a graph will assist retrieval. The structure is uniform and very simple, and the graph can be queried even if the predicates (or edge labels) are unknown. Additional benefits of the graph structure are examined, such as using path length between nodes as a measure of relatedness (unavailable in a relational database where there is no equivalent concept of locality), and building information summaries by grouping the attributes of nodes that share predicates.

These claims are tested by comparing queries across the original and the new data structures. The graph must be able to answer correctly queries that the original database dealt with, and should also demonstrate valid answers to queries that could not previously be answered or where the results were incomplete.

This will take some time to read but it looks quite enjoyable.

Is That A Graph In Your Cray?

Filed under: Cray,Graphs,Neo4j,RDF,Semantic Web — Patrick Durusau @ 7:27 pm

If you want more information about graph processing in Cray’s uRIKA (I did), try: High-performance Computing Applied to Semantic Databases by Eric L. Goodman, Edward Jimenez, David Mizell, Sinan al-Saffar, Bob Adolf, and David Haglin.

Abstract:

To-date, the application of high-performance computing resources to Semantic Web data has largely focused on commodity hardware and distributed memory platforms. In this paper we make the case that more specialized hardware can offer superior scaling and close to an order of magnitude improvement in performance. In particular we examine the Cray XMT. Its key characteristics, a large, global shared memory, and processors with a memory-latency tolerant design, offer an environment conducive to programming for the Semantic Web and have engendered results that far surpass current state of the art. We examine three fundamental pieces requisite for a fully functioning semantic database: dictionary encoding, RDFS inference, and query processing. We show scaling up to 512 processors (the largest configuration we had available), and the ability to process 20 billion triples completely in memory.

Unusual to see someone apologize for only having “…512 processors (the largest configuration we had available)…,” but that isn’t why I am citing the paper. 😉

The “dictionary encoding” (think indexing) techniques may prove instructive, even if you don’t have time on a Cray XMT. The techniques presented compress the raw data by a factor between 3.2 and 4.4.
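If “dictionary encoding” is unfamiliar, here is a minimal single-threaded sketch of the idea (mine, not the paper’s parallel XMT implementation): each distinct term is interned once as an integer, and triples are stored as integer 3-tuples, which is where the compression comes from.

```python
# A rough sketch of dictionary encoding for N-Triples-style data: every
# distinct term gets an integer ID once, and triples become integer
# 3-tuples. Illustration only.

def dictionary_encode(triples):
    term_to_id, id_to_term = {}, []
    def encode(term):
        if term not in term_to_id:
            term_to_id[term] = len(id_to_term)
            id_to_term.append(term)
        return term_to_id[term]
    encoded = [(encode(s), encode(p), encode(o)) for s, p, o in triples]
    return encoded, id_to_term

triples = [
    ("<http://example.org/alice>", "<http://xmlns.com/foaf/0.1/knows>",
     "<http://example.org/bob>"),
    ("<http://example.org/bob>", "<http://xmlns.com/foaf/0.1/knows>",
     "<http://example.org/carol>"),
]
encoded, dictionary = dictionary_encode(triples)
print(encoded)      # [(0, 1, 2), (2, 1, 3)] -- long IRIs are stored only once
```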

Take special note of the statement: “To simplify the discussion, we consider only semantic web data represented in N-Triples.” Actually the system presented processes only subject, edge, object triples. Unlike Neo4j, for instance, it isn’t a generalized graph engine.

Specialized hardware/software is great but let’s be clear about that upfront. You may need more than RDF graphs can offer. Like edges with properties.

Other specializations include a “closure” process with several simplifications that enable a single pass through the RDFS rule set, and query processing that does not allow a variable in the predicate position.

Granted, this results in a hardware/software combination that can claim “interactivity” on large data sets, but what is the cost of making that a requirement?

Take the best known “connect the dots” problem of this century, 9/11. Analysts did not need “interactivity” with large data sets measured in nanoseconds. Batch processing that lasted for a week or more would have been more than sufficient. Most of the information that was known was “known” by various parties for months.

More than that, the amount of relevant information was quite small when compared to the “Semantic Web.” There were known suspects (as there are now), with known associates, with known travel patterns, so eliminating all the business/frequent flyers from travel data is a one-time filter, plus any > 40 females traveling on US passports (grandmothers). Similar criteria can reduce information clutter, allowing analysts to focus on important data, as opposed to paging through “hits” in a simulation of useful activity.

I would put batch processing of graphs of relevant information against interactive churning of big data in a restricted graph model any day. How about you?

March 2, 2012

Breaking into the NoSQL Conversation

Filed under: NoSQL,RDF,Semantic Web — Patrick Durusau @ 8:05 pm

Breaking into the NoSQL Conversation by Rob Gonzalez.

Semantic Web Community: I’m disappointed in us! Or at least in our group marketing prowess. We have been failing to capitalize on two major trends that everyone has been talking about and that are directly addressable by Semantic Web technologies! For shame.

I’m talking of course about Big Data and NoSQL. Given that I’ve already given my take on how Semantic Web technology can help with the Big Data problem on SemanticWeb.com, this time around I’ll tackle NoSQL and the Semantic Web.

After all, we gave up SQL more than a decade ago. We should be part of the discussion. Heck, even the XQuery guys got in on the action early!

(much content omitted, read at your leisure)

AllegroGraph, Virtuoso, and Systap can all scale, and can all shard like Mongo. We have more mature, feature rich, and robust APIs via Sesame and others to interact with the data in these stores. So why aren’t we in the conversation? Is there something really obvious that I’m missing?

Let’s make it happen. For more than a decade our community has had a vision for how to build a better web. In the past, traditional tools and inertia have kept developers from trying new databases. Today, there are no rules. It’s high time we stepped it up. On the web we can compete with MongoDB directly on those use cases. In the enterprise we can combine the best of SQL and NoSQL for a new class of flexible, robust data management tools. The conversation should not continue to move so quickly without our voice.

I hate to disappoint but the reason the conversation is moving so quickly is the absence of the Semantic Web voice.

Consider my post earlier today about the new hardware/software release by Cray, A Computer More Powerful Than Watson. The release refers to RDF as a “graph format.”

With good reason. The uRIKA system doesn’t use RDF for reasoning at all. It materializes all the implied nodes and searches the materialized graph. Impressive numbers but reasoning it isn’t.
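For readers unfamiliar with materialization, here is a toy sketch (mine, not anything from Cray) of the idea: compute the RDFS closure up front by applying a couple of rules to a fixpoint, after which “reasoning” at query time is just lookup over the enlarged triple set.

```python
# Toy illustration of materialization: apply two RDFS rules (rdfs9, rdfs11)
# until nothing new is derived, then answer queries by plain membership tests.

RDF_TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def materialize(triples):
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in triples:
            if p == SUBCLASS:
                for (s2, p2, o2) in triples:
                    # rdfs11: subClassOf is transitive
                    if p2 == SUBCLASS and s2 == o:
                        new.add((s, SUBCLASS, o2))
                    # rdfs9: instances inherit superclasses
                    if p2 == RDF_TYPE and o2 == s:
                        new.add((s2, RDF_TYPE, o))
        if not new <= triples:
            triples |= new
            changed = True
    return triples

facts = {
    ("ex:Tabby", RDF_TYPE, "ex:Cat"),
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
}
closed = materialize(facts)
print(("ex:Tabby", RDF_TYPE, "ex:Animal") in closed)   # True, by lookup alone
```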

Inertia did not stop developers from trying new databases. New databases that met no viable (commercially that is) use cases went unused. What’s so hard to understand about that?

February 26, 2012

Where to Publish and Find Ontologies? A Survey of Ontology Libraries

Filed under: Interoperability,Ontology,Semantic Colonialism,Semantic Web — Patrick Durusau @ 8:27 pm

Where to Publish and Find Ontologies? A Survey of Ontology Libraries by Natasha F. Noy and Mathieu d’Aquin.

Abstract:

One of the key promises of the Semantic Web is its potential to enable and facilitate data interoperability. The ability of data providers and application developers to share and reuse ontologies is a critical component of this data interoperability: if different applications and data sources use the same set of well defined terms for describing their domain and data, it will be much easier for them to “talk” to one another. Ontology libraries are the systems that collect ontologies from different sources and facilitate the tasks of finding, exploring, and using these ontologies. Thus ontology libraries can serve as a link in enabling diverse users and applications to discover, evaluate, use, and publish ontologies. In this paper, we provide a survey of the growing—and surprisingly diverse—landscape of ontology libraries. We highlight how the varying scope and intended use of the libraries affects their features, content, and potential exploitation in applications. From reviewing eleven ontology libraries, we identify a core set of questions that ontology practitioners and users should consider in choosing an ontology library for finding ontologies or publishing their own. We also discuss the research challenges that emerge from this survey, for the developers of ontology libraries to address.

Speaking of semantic colonialism, this survey is an accounting of the continuing failure of that program. The examples cited as “ontology libraries” are for the most part not interoperable with each other.

Not that greater data interoperability would be a bad thing; it would be a very good thing, for some issues. The problem, as I see it, is the fixation of the Semantic Web community on a winner-takes-all model of semantics. Could well be (warning, heresy ahead) that RDF and OWL aren’t the most effective ways to represent or “reason” about data. Just saying, no proof, formal or otherwise, to be offered.

And certainly there is a lack of data written using RDF (or even linked data) or annotated using OWL. I don’t think there is a good estimate of all available data, so it is difficult to say exactly how little of the overall amount of data is in any of the Semantic Web formats.

Any new format will only be applied to the creation of new data, which leaves us with ever-increasing mountains of legacy data that lack the new format.

Rather than seeking to reduce semantic diversity, what appears to be a losing bet, we should explore mechanisms to manage semantic diversity.

Semantic Colonialism

Filed under: Semantic Colonialism,Semantic Web — Patrick Durusau @ 8:27 pm

Here is a good example of semantic colonialism: UM Linguist Studies the Anumeric Language of an Amazonian Tribe. Not obvious from the title, is it?

Two studies of the Piraha people of the Amazon, who lack words for numbers, produced different results when they were tested with simple numeric problems with more than three items. One set of results said they could perform them, the other, not.

The explanation for the difference?

The study provides a simple explanation for the controversy. Unbeknown to other researchers, the villagers that participated in one of the previous studies had received basic numerical training by Keren Madora, an American missionary that has worked with the indigenous people of the Amazon for 33 years, and co-author of this study. “Her knowledge of what had happened in that village was crucial. I understood then why they got the results that they did,” Everett says.

Madora used the Piraha language to create number words. For instance she used the words “all the sons of the hand,” to indicate the number four. The introduction of number words into the village provides a reasonable explanation for the disagreement in the previous studies.

If you think that the Piraha are “better off” having number words, put yourself down as a semantic colonialist.

You will have no reason to complain when terms used by Amazon, Google, Nike, Starbucks, etc., start to displace your native terminology.

Even less reason to complain if some Semantic Web ontology displaces yours in the race to become the common ontology for some subject area.

After all, one semantic colonialist is much like any other. (Ask any former/current colony if you don’t believe me.)

February 15, 2012

OWL: Yet to Arrive on the Web of Data?

Filed under: Linked Data,OWL,Semantic Web — Patrick Durusau @ 8:33 pm

OWL: Yet to Arrive on the Web of Data? by Angela Guess.

From the post:

A new paper is currently available for download entitled OWL: Yet to arrive on the Web of Data? The paper was written by Birte Glimm, Aidan Hogan, Markus Krötzsch, and Axel Polleres. The abstract states, “Seven years on from OWL becoming a W3C recommendation, and two years on from the more recent OWL 2 W3C recommendation, OWL has still experienced only patchy uptake on the Web. Although certain OWL features (like owl:sameAs) are very popular, other features of OWL are largely neglected by publishers in the Linked Data world.”

It continues, “This may suggest that despite the promise of easy implementations and the proposal of tractable profiles suggested in OWL’s second version, there is still no “right” standard fragment for the Linked Data community. In this paper, we (1) analyse uptake of OWL on the Web of Data, (2) gain insights into the OWL fragment that is actually used/usable on the Web, where we arrive at the conclusion that this fragment is likely to be a simplified profile based on OWL RL, (3) propose and discuss such a new fragment, which we call OWL LD (for Linked Data).”

Interesting and perhaps valuable data about the use of RDFS/OWL primitives on the Web.

I find it curious that the authors don’t survey users about what OWL capabilities they would find compelling. It could be that users are interested in and willing to support some subset of OWL that hasn’t been considered by the authors or others.

It might not be the Semantic Web as the authors envision it, but without broad user support, the authors’ Semantic Web will never come to pass.

February 6, 2012

Wikimeta Project’s Evolution…

Filed under: Annotation,Data Mining,Semantic Annotation,Semantic Web — Patrick Durusau @ 6:58 pm

Wikimeta Project’s Evolution Includes Commercial Ambitions and Focus On Text-Mining, Semantic Annotation Robustness by Jennifer Zaino.

From the post:

Wikimeta, the semantic tagging and annotation architecture for incorporating semantic knowledge within documents, websites, content management systems, blogs and applications, this month is incorporating itself as a company called Wikimeta Technologies. Wikimeta, which has a heritage linked with the NLGbAse project, last year was provided as its own web service.

The Semantic Web Blog interviews Dr. Eric Charton about Wikimeta and its future plans.

More interesting than the average interview piece. I have a weakness for academic projects and Wikimeta certainly has the credentials in that regard.

On the other hand, when I read statements like:

So when we said Wikimeta makes over 94 percent of good semantic annotation in the three first ranked suggested annotations, this is tested, evaluated, published, peer-reviewed and reproducible by third parties.

I have to wonder what standard for “…good semantic annotation…” was in play and for what application would 94 percent be acceptable?

Annotation of nuclear power plant documentation? Drug interaction documentation? Jet engine repair manual? Chemical reaction warning on product? None of those sound like 94% right situations.

That isn’t a criticism of this project but of the notion that “correctness” of semantic annotation can be measured separate and apart from some particular use case.

It could be the case that even 94% correct is more precision than necessary if we are talking about the content of Access Hollywood.

And your particular use case may lie somewhere in between those two extremes.

Do read the interview as this sounds like it will be an interesting project, whatever your thoughts on “correctness.”

Introduction to: RDFa

Filed under: RDFa,Semantic Web — Patrick Durusau @ 6:58 pm

Introduction to: RDFa by Juan Sequeda.

From the post:

Simply put, RDFa is another syntax for RDF. The interesting aspect of RDFa is that it is embedded in HTML. This means that you can state what things on your HTML page actually mean. For example, you can specify that a certain text is the title of a blog post or it’s the name of a product or it’s the price for a certain product. This is starting to be commonly known as “adding semantic markup”.

Historically, RDFa was specified only for XHTML. Currently, RDFa 1.1 is specified for XHTML and HTML5. Additionally, RDFa 1.1 works for any XML-based language such as SVG. Recently, RDFa Lite was introduced as “a small subset of RDFa consisting of a few attributes that may be applied to most simple to moderate structured data markup tasks.” It is important to note that RDFa is not the only way to add semantics to your webpages. Microdata and Microformats are other options, and I will discuss this later on. As a reminder, you can publish your data as Linked Data through RDFa. Inside your markup, you can link to other URIs or others can link to your HTML+RDFa webpages.
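To underline the “another syntax for RDF” point, here is a small sketch of the triples a typical piece of RDFa Lite product markup would express, built directly with rdflib. The markup in the comment and the URIs are my own illustrative assumptions, not taken from the post.

```python
# Rough sketch: the triples that a small piece of RDFa Lite markup such as
#
#   <div vocab="http://schema.org/" typeof="Product" resource="#widget">
#     <span property="name">Widget</span>
#     <span property="price">9.99</span>
#   </div>
#
# would express, built with rdflib. (The markup above is an illustrative
# assumption, not an example from the post.)

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SCHEMA = Namespace("http://schema.org/")

g = Graph()
g.bind("schema", SCHEMA)

product = URIRef("http://example.org/page#widget")
g.add((product, RDF.type, SCHEMA.Product))
g.add((product, SCHEMA.name, Literal("Widget")))
g.add((product, SCHEMA.price, Literal("9.99")))

print(g.serialize(format="turtle"))   # same statements, different syntax
```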

A bit later in the post the author discusses Jeni Tennison’s comparison of RDFa and microformats.

If you are fond of inline markup, which limits you to creating new documents or editing old ones, RDFa or microformats may be of interest.

On the other hand, if you think about transient nodes such as are described in A transient hypergraph-based model for data access, then you have to wonder why you are being limited to new documents or edited old ones?

One assumes that if your application can read a document, you have access to its contents. If you have access to its contents, then a part of that content, either its underlying representation or the content itself, can trigger the creation of a transient node or edge (or permanent ones).

As I will discuss in a post later today, RDF conflates the tasks of identification, assignment of semantics and reasoning (at least). Which may account for it doing all three poorly. (There are other explanations but I am trying to be generous.)

February 4, 2012

Cry Me A River, But First Let’s Agree About What A River Is

Filed under: Ontology,Semantic Diversity,Semantic Web — Patrick Durusau @ 3:34 pm

Cry Me A River, But First Let’s Agree About What A River Is

The post starts off well enough:

How do you define a forest? How about deforestation? It sounds like it would be fairly easy to get agreement on those terms. But beyond the basics – that a definition for the first would reflect that a forest is a place with lots of trees and the second would reflect that it’s a place where there used to be lots of trees – it’s not so simple.

And that has consequences for everything from academic and scientific research to government programs. As explained by Krzysztof Janowicz, perfectly valid definitions for these and other geographic terms exist by the hundreds, in legal texts and government documents and elsewhere, and most of them don’t agree with each other. So, how can one draw good conclusions or make important decisions when the data informing those is all over the map, so to speak.

….

Having enough data isn’t the problem – there’s official data from the government, volunteer data, private organization data, and so on – but if you want to do a SPARQL query of it to discover all towns in the U.S., you’re going to wind up with results that include the places in Utah with populations of less than 5,000, and Los Angeles too – since California legally defines cities and towns as the same thing.

“So this clearly blows up your data, because your analysis is you thinking that you are looking at small rural places,” he says.

This Big Data challenge is not a new problem for the geographic-information sciences community. But it is one that’s getting even more complicated, given the tremendous influx of more and more data from more and more sources: Satellite data, rich data in the form of audio and video, smart sensor network data, volunteer location data from efforts like the Citizen Science Project and services like Facebook Places and Foursquare. “The heterogeneity of data is still increasing. Semantic web tools would help you if you had the ontologies but we don’t have them,” he says. People have been trying to build top-level global ontologies for a couple of decades, but that approach hasn’t yet paid off, he thinks. There needs to be more of a bottom-up take: “The biggest challenge from my perspective is coming up with the rules systems and ontologies from the data.”

All true, and much of it is what objectors to the current Semantic Web approach have been saying for a very long time.

I am not sure about the line: “The heterogeneity of data is still increasing.”

In part because I don’t know of any reliable measure of heterogeneity by which a comparison could be made. True, there is more data now than at some point in the past, but that isn’t necessarily an indication of increased heterogeneity. But that is a minor point.

More serious is the “a miracle occurs” statement that follows:

How to do it, he thinks, is to make very small and really local ontologies directly mined with the help of data mining or machine learning techniques, and then interlink them and use new kinds of reasoning to see how to reason in the presence of inconsistencies. “That approach is local ontologies that arrive from real application needs,” he says. “So we need ontologies and semantic web reasoning to have neater data that is human and also machine readable. And more effective querying based on analogy or similarity reasoning to find data sets that are relevant to our work and exclude data that may use the same terms but has different ontological assumptions underlying it.”

Doesn’t that have the same feel as the original Semantic Web proposals that were going to eliminate semantic ambiguity from the top down? The very approach that is panned in this article?

And “new kinds of reasoning,” ones I assume have not been invented yet, are going “to reason in the presence of inconsistencies.” And excluding data that “…has different ontological assumptions underlying it.”

Since we are the source of the ontological assumptions that underlie the use of terms, I am really curious how those assumptions are going to become available to these yet-to-be-invented reasoning techniques.

Oh, that’s right, we are all going to specify our ontological assumptions at the bottom to percolate up. Except that to be useful for machine reasoning, they will have to be as crude as the ones that were going to be imposed from the top down.

I wonder why the indeterminate nature of semantics continues to elude Semantic Web researchers. A snapshot of semantics today may be slightly incorrect tomorrow, probably incorrect in some respect in a month and almost surely incorrect in a year or more.

Take Saddam Hussein for example. One-time friend and confidant of Donald Rumsfeld (there are pictures). But over time those semantics changed, largely because Hussein slipped the leash and was no longer a proper vassal to the US. Suddenly, the weapons of mass destruction, in part nerve gas we caused to be sold to him, became a concern. And so Hussein became an enemy of the US. Same person, same facts. Different semantics.

There are less dramatic examples but you get the idea.

We can capture even changing semantics but we need to decide what semantics we want to capture and at what cost? Perhaps that is a better way to frame my objection to most Semantic Web activities, they are not properly scoped. Yes?

February 2, 2012

Introducing Hypernotation, an alternative to Linked Data

Filed under: Hypernotation,Linked Data,Semantic Web — Patrick Durusau @ 3:49 pm

Introducing Hypernotation, an alternative to Linked Data

A competing notation to Linked Data:

From the post:

URL, URI, IRI, URIref, CURIE, QName, slash URIs, hash URIs, bnodes, information resources, non-information resources, dereferencability, HTTP 303, redirection, content-negotiation, RDF model, RDF syntax, RDFa core, RDFa lite, Microdata, Turtle, N3, RDF/XML, JSON-LD, RDF/JSON…

Want to publish some data? Well, these are some of the things you will have to learn and understand to do so. Is the concept of data really so hard that you can’t publish it without understanding the concepts of information and non-information resources? Do you really need to deal with the HTTP 303 redirection and a number of different syntaxes? It’s just data, damn it!

Really, how have we got to this?

I did a detailed analysis on the problems of Linked Data, but it seems that I missed the most important thing. It’s not about the Web technologies but about economics. The key Linked Data problem is that it holds a monopoly in the market. One can’t compare it to anything else, and thus one can’t be objective about it. There is no competition, and without competition, there is no real progress. Without competition, it’s possible for many odd ideas to survive, such as requiring people to implement HTTP 303 redirection.

As a competitor to Linked Data, this proposal should lead to a re-examination of many of the decisions that have led to and sustain Linked Data. I say “should,” not that it will lead to such a re-examination. At least not now. Perhaps when the next “universal” semantic syntax comes along.

You may find An example of Hypernotation useful in reading the Hypernotation post.

January 29, 2012

Semantic Enterprise Wiki (SMW)

Filed under: Semantic Web,Wiki — Patrick Durusau @ 9:15 pm

Semantic Enterprise Wiki (SMW)

When I ran across:

Run gardening bots to detect inconsistencies in your wiki and continuously improve the quality of the authored knowledge

I thought of Jack Park and his interest in “knowledge gardening.”

But I don’t think Jack was interested only in weeding; he was also interested in cultivating diverse ideas, even if those were inconsistent with existing ideas.

I think of it as being the difference between a vibrant heritage garden versus a mono-culture Monsanto field.

Are you using this software? Thoughts/comments?

I haven’t installed it yet but am interested in the vocabulary and annotation features.

January 25, 2012

Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine

Filed under: Linked Data,RDF,Search Engines,Semantic Web — Patrick Durusau @ 3:30 pm

Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine by Aidan Hogan, Andreas Harth, Jürgen Umbrich, Sheila Kinsella, Axel Polleres and Stefan Decker.

Abstract:

In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional search engines, SWSE operates over RDF Web data (loosely also known as Linked Data), which implies unique challenges for the system design, architecture, algorithms, implementation and user interface. In particular, many challenges exist in adopting Semantic Web technologies for Web data: the unique challenges of the Web (in terms of scale, unreliability, inconsistency and noise) are largely overlooked by the current Semantic Web standards. Herein, we describe the current SWSE system, initially detailing the architecture and later elaborating upon the function, design, implementation and performance of each individual component. In so doing, we also give an insight into how current Semantic Web standards can be tailored, in a best-effort manner, for use on Web data. Throughout, we offer evaluation and complementary argumentation to support our design choices, and also offer discussion on future directions and open research questions. Later, we also provide candid discussion relating to the difficulties currently faced in bringing such a search engine into the mainstream, and lessons learnt from roughly six years working on the Semantic Web Search Engine project.

This is the paper that Ivan Herman mentions at Nice reading on Semantic Search.

It covers a lot of ground in fifty-five (55) pages but it doesn’t take long to hit an issue I wanted to ask you about.

At page 2, Google is described as follows:

In the general case, Google is not suitable for complex information gathering tasks requiring aggregation from multiple indexed documents: for such tasks, users must manually aggregate tidbits of pertinent information from various recommended heterogeneous sites, each such site presenting information in its own formatting and using its own navigation system. In effect, Google’s limitations are predicated on the lack of structure in HTML documents, whose machine interpretability is limited to the use of generic markup-tags mainly concerned with document rendering and linking. Although Google arguably makes the best of the limited structure available in such documents, most of the real content is contained in prose text which is inherently difficult for machines to interpret. Addressing this inherent problem with HTML Web data, the Semantic Web movement provides a stack of technologies for publishing machine-readable data on the Web, the core of the stack being the Resource Description Framework (RDF).

A couple of observations:

Although Google needs no defense from me, I would argue that Google never set itself the task of aggregating information from indexed documents. Historically speaking, IR has always been concerned with returning relevant documents and not returning irrelevant documents.

Second, the lack of structure in HTML documents (although the article mixes in sites with different formatting) is no deterrent to a human reader aggregating “tidbits of pertinent information….” I rather doubt that writing all the documents in valid Springer LaTeX would make that much difference on the “tidbits of pertinent information” score.

This is my first pass through the article and I suspect it will take three or more to become comfortable with it.

Do you agree/disagree that the task of IR is to retrieve documents, not “tidbits of pertinent information?”

Do you agree/disagree that HTML structure (or lack thereof) is that much of an issue for interpretation of documents?

Thanks!

Nice reading on Semantic Search

Filed under: Linked Data,RDF,Semantic Web — Patrick Durusau @ 3:25 pm

Nice reading on Semantic Search by Ivan Herman.

From the post:

I had a great time reading a paper on Semantic Search[1]. Although the paper is on the details of a specific Semantic Web search engine (DERI’s SWSE), I was reading it as somebody not really familiar with all the intricate details of such a search engine setup and operation (i.e., I would not dare to give an opinion on whether the choice taken by this group is better or worse than the ones taken by the developers of other engines) and wanting to gain a good image of what is happening in general. And, for that purpose, this paper was really interesting and instructive. It is long (cca. 50 pages), i.e., I did not even try to understand everything at my first reading, but it did give a great overall impression of what is going on.

Interested to hear your take on Ivan’s comments on owl:sameAs.

The semantics of words, terms, ontology classes are not stable over time and/or users. If you doubt that statement, leaf through the Oxford English Dictionary for ten (10) minutes.

Moreover, the only semantics we “see” in words, terms or ontology classes are those we assign them. We can discuss the semantics of Hebrew words in the Dead Sea Scrolls but those are our semantics, not those of the original users of those words. May be close to what they meant, may not. Can’t say for sure because we can’t ask and would lack the context to understand the answer if we could.

Adding more terms to use as supplements to owl:sameAs just increases the chances for variation. And error if anyone is going to enforce their vision of broadMatch on usages of that term by others.
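To make the concern concrete, here is a toy sketch (mine, not from the SWSE paper) of what naive owl:sameAs “smushing” does: merge every statement made about co-identified resources. One mistaken link and the error propagates through the whole merged cluster, and looser terms such as broadMatch only multiply the ways to get it wrong.

```python
# Toy sketch of owl:sameAs "smushing": statements about resources linked by
# sameAs are pooled onto a single representative. All URIs are invented.

from collections import defaultdict

def smush(statements, same_as_pairs):
    parent = {}                       # union-find over the sameAs pairs
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    for a, b in same_as_pairs:
        union(a, b)
    merged = defaultdict(set)
    for s, p, o in statements:
        merged[find(s)].add((p, o))
    return merged

statements = [
    ("ex:Paris_FR", "ex:country", "ex:France"),
    ("ex:Paris_TX", "ex:state", "ex:Texas"),
]
same_as = [("ex:Paris_FR", "ex:Paris_TX")]     # one bad owl:sameAs link...
for resource, facts in smush(statements, same_as).items():
    print(resource, sorted(facts))
# ...and the merged "Paris" is now both in France and in Texas.
```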

January 23, 2012

Degrees of Semantic Precision

Filed under: Semantic Web,Semantics — Patrick Durusau @ 7:43 pm

Reading Mike Bergman’s posts on making the Semantic Web work tripped a realization that Linked Data and other Semantic Web proposals are about creating a particular degree of semantic precision.

And I suspect that is the key to the lack of adoption of Linked Data, etc.

Think about the levels of semantic precision that you use during the day. With family and children, one level of semantic precision; another level of precision with your co-workers; yet another level as you deal with merchants, public servants and others during the day. And you can switch in conversation from one level to another, such as when your child interrupts a conversation with your spouse.

To say nothing of the levels of semantic precision that vary from occupation to occupation, with ontologists/logicians at the top, followed closely by computer scientists, and then doctors, lawyers, computer programmers and a host of others. All of whom also use varying degrees of semantic precision during the course of a day.

We communicate with varying degrees of semantic precision and the “semantic” Web reflects that practice.

I say lower-case “semantic” Web because the web had semantics long before current efforts to prescribe only one level of semantic precision.

January 17, 2012

Why Semantic Web Software Must Be Easy(er) to Use

Filed under: Semantic Web — Patrick Durusau @ 8:23 pm

Why Semantic Web Software Must Be Easy(er) to Use

Lee Feigenbaum of Cambridge Semantics writes:

Over on my personal blog, I’ve written a couple of posts that outline two key thoughts on the transformative effects that Semantic Web technologies can have in the enterprise:

There’s a key corollary of these two observations that you need to keep in mind when building, browsing, or buying Semantic Web software. Semantic Web software must be easy to use.

On the surface, this sounds a bit trite. Surely we should demand that all software be easy to use, right? While ease of use is clearly an important goal in software design in general, I’d argue that it’s absolutely crucial to successfully realizing the value from Semantic Web software….

I think Lee has a point that software, in this case Semantic Web software, needs to be easy to use.

It isn’t that hard to come up with parallel examples from W3C specs. Take XML for example. Sure, there are DocBook users, but compare the number of XML users you get by counting DocBook users versus the number you get by counting users of OpenOffice, LibreOffice, KOffice, and MS Word. Several orders of magnitude in favor of the latter. Why? Because it is easier to author XML using better interfaces than exist for DocBook.

Where I disagree with Lee is where he claims:

The point of semantic web tech is not that it’s revolutionary – it’s not cold fusion, interstellar flight, quantum computing – it’s an evolutionary advantage – you could do these projects with traditional techs but they’re just hard enough to be impractical, so IT shops don’t – that’s what’s changing here. Once the technologies and tools are good enough to turn “no-go” into “go”, you can start pulling together the data in your department’s 3 key databases; you can start automating data exchange between your group and a key supply-chain partner; you can start letting your line-of-business managers define their own visualizations, reports, and alerts that change on a daily basis. And when you start solving enough of these sorts of problems, you derive value that can fundamentally affect the way your company does business. (from Asking the Wrong Question)

and

Calendar time is what matters. If my relational database application renders a sales forecast report in 500 milliseconds while my Semantic Web application takes 5 seconds, you might hear people say that the relational approach is 10 times faster than the Semantic Web approach. But if it took six months to design and build the relational solution versus two weeks for the Semantic Web solution, Semantic Sam will be adjusting his supply chain and improving his efficiencies long before Relational Randy has even seen his first report. The Semantic Web lets you do things fast, in calendar time. (from Saving Months, Not Milliseconds)

First, you will notice that Lee doesn’t cite any examples in either case. Which would be the first thing you would expect to see from a marketing document. “Our foobar is quicker, faster, better at X that its competitors.” Even if the test results are cooked, they still give concrete examples.

Second, the truth is that for the Semantic Web (original recipe) or Semantic Web (linked data special blend) or topic maps or conceptual graphs or whatever, semantic integration is hard. If it were easy, do you think we would have witnessed the ten-year slide from the original Scientific American Semantic Web to the current-day linked data version?

Third, semantic diversity has existed for the length of recorded language, depending on whose estimates you accept, 4,000 to 5,000 years. And there has been no shortage of people with a plan to eliminate semantic diversity all that time. Semantic diversity persists to this day. If people haven’t been able to eliminate semantic diversity in 4,000 to 5,000 years, what chance does an automated abacus have?

The ClioPatria Semantic Web server

Filed under: Prolog,RDF,Semantic Web — Patrick Durusau @ 8:18 pm

The ClioPatria Semantic Web server

I ran across this whitepaper about the ClioPatria Semantic Web server that reads in part:

What is ClioPatria?

ClioPatria is a (SWI-)Prolog hosted HTTP application-server with libraries for Semantic Web reasoning and a set of JavaScript libraries for presenting results in a browser. Another way to describe ClioPatria is as “Tomcat+Sesame (or Jena) with additional reasoning libraries in Prolog, completed by JavaScript presentation components”.

Why is ClioPatria based on Prolog?

Prolog is a logic-based language using a simple depth-first resolution strategy (SLD resolution). This gives two readings to the same piece of code: the declarative reading and the procedural reading. The declarative reading facilitates understanding of the code and allows for reasoning about it. The procedural reading allows for specifying algorithms and sequential aspects of the code, something which we often need to describe interaction. In addition, Prolog is reflexive: it can reason about Prolog programs and construct them at runtime. Finally, Prolog is, like the RDF triple-model, relational. This match of paradigms avoids the complications involved with using Object Oriented languages for handling RDF (see below). We illustrate the fit between RDF and Prolog by translating an example query from the official SPARQL document:…

Just in case you are interested in RDF or Prolog or both.

January 16, 2012

Introducing Meronymy SPARQL Database Server

Filed under: RDF,Semantic Web,SPARQL — Patrick Durusau @ 2:33 pm

Introducing Meronymy SPARQL Database Server

Inge Henriksen writes:

I am pleased to announce today that the Meronymy SPARQL Database Server is ready for release later in 2012. Meronymy SPARQL Database Server is a high performance RDF Enterprise Database Management System (DBMS).

Our goal has been to make a really fast, ACID, OS portable, user friendly, secure, SPARQL-driven RDF database server usable with most programming languages.

Let’s not start any language wars about Meronymy being written in C++/assembly, 😉 , and concentrate on its performance in actual use.

Suggested RDF data sets to use to test that performance? (Knowing Inge, I trust it is fast, but the question is how fast and under what circumstances.)

Or other RDF engines to test along side of it?

PS: If you don’t know SPARQL, check out Learning SPARQL by Bob DuCharme.
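For the “how fast under what circumstances” question, a rough timing harness along these lines is one place to start once the server ships. This is a sketch only: it assumes the SPARQLWrapper library, the endpoint URL shown is a public stand-in (DBpedia), and network latency is counted along with query evaluation.

```python
# Minimal harness sketch: time the same SPARQL query against whichever
# endpoints you want to compare. Endpoint URLs are placeholders; substitute
# the engines you actually want to benchmark.

import time
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 100"

def time_endpoint(url, query=QUERY, runs=3):
    endpoint = SPARQLWrapper(url)
    endpoint.setQuery(query)
    endpoint.setReturnFormat(JSON)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        endpoint.query().convert()        # network + evaluation time
        timings.append(time.perf_counter() - start)
    return min(timings)

for url in ["https://dbpedia.org/sparql"]:   # add other endpoints here
    print(url, f"{time_endpoint(url):.3f}s best of {3}")
```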

January 12, 2012

Call for Papers on Big Data: Theory and Practice

Filed under: BigData,Heterogeneous Data,Semantic Web — Patrick Durusau @ 7:27 pm

SWJ 2012 : Semantic Web Journal Call for Papers on Big Data: Theory and Practice

Dates:

Manuscript submission due: 13. February 2012
First notification: 26. March 2012
Issue publication: Summer 2012

From the post:

The Semantic Web journal calls for innovative and high-quality papers describing theory and practice of storing, accessing, searching, mining, processing, and visualizing big data. We especially invite papers that describe or demonstrate how ontologies, Linked Data, and Semantic Web technologies can handle the problems arising when integrating massive amounts of multi-thematic and multi-perspective information from heterogeneous sources to answer complex questions that cut through domain boundaries.

We welcome all paper categories, i.e., full research papers, application reports, systems and tools, ontology papers, as well as surveys, as long as they clearly relate to challenges and opportunities arising from processing big data – see our listing of paper types in the author guidelines. In other words, we expect all submitted manuscripts to address how the presented work can exploit massive and/or heterogeneous data.

Semantic Web technologies represent subjects as well as being subjects themselves. That should enable demonstrations of integrating diverse Semantic Web approaches to the same data, where the underlying data is heterogeneous as well. Now that would be an interesting paper.

January 10, 2012

The Semantic Web & the Right to be Forgotten (there is a business case – read on)

Filed under: Document Retention,Semantic Web — Patrick Durusau @ 8:10 pm

The Semantic Web & the Right to be Forgotten by Angela Guess.

From the post:

Dr. Kieron O’Hara has examined how the semantic web might be used to implement a so-called ‘right to be forgotten.’ O’Hara writes, “During the revision of the EU’s data protection directive, attention has focused on a ‘right to be forgotten’. Though the discussion has been largely confined to the legal profession, and has been overlooked by technologists, it does raise technical issues – UK minister Ed Vaizey, and the UK’s Information Commissioner’s Office have pointed out that rights are only meaningful when they can be enforced and implemented (Out-law.com 2011, ICO 2011). In this article, I look at how such a right might be interpreted and whether it could be enforced using the specific technology of the Semantic Web or the Linked Data Web.”

O’Hara continues, “Currently, the Semantic Web and the Linked Data Web approach access control via licences and waivers. In many cases, those who wish to gain the benefits of linking are keen for their data to be used and linked, and so are happy to invite access. Copyrightable content can be governed by Creative Commons licences, requiring the addition of a single RDF triple to the metadata. With other types of data, controllers use waivers, and for that purpose a waiver vocabulary, http://vocab.org/waiver/terms/.html, has been created.”
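For readers who have not seen the pattern O’Hara mentions, here is a minimal sketch of that “single RDF triple” using rdflib and the Creative Commons (ccREL) vocabulary. The document URI is invented for illustration.

```python
# Sketch: attaching a Creative Commons licence to a document's metadata
# with a single triple. The document URI is a made-up example.

from rdflib import Graph, Namespace, URIRef

CC = Namespace("http://creativecommons.org/ns#")

g = Graph()
g.bind("cc", CC)

work = URIRef("http://example.org/reports/2012/annual.html")
licence = URIRef("http://creativecommons.org/licenses/by/3.0/")
g.add((work, CC.license, licence))   # the one triple in question

print(g.serialize(format="turtle"))
```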

The case Dr. O’Hara is concerned with is:

the right of individuals to have their data no longer processed and deleted when they are no longer needed for legitimate purposes. This is the case, for example, when processing is based on the person’s consent and when he or she withdraws consent or when the storage period has expired.

As you would expect, the EU completely overlooks the business case for forgetting, it’s called document retention. Major corporations have established policies for how long materials have to be retained and procedures to be followed for their destruction.

Transpose that into a topic maps or semantic web context where you have links into those materials. Perhaps links that you don’t control.

So, what is your policy about “forgetting” by erasure of links to documents that no longer exist?

Or do you have a policy about the creation of links to documents? And if so, how do you track them? Or even monitor the enforcement of the policy?

It occurs to me that if you used enterprise search software, you could create topics that represent documents that are being linked to by other documents. Topics that could carry the same destruction date information as your other information systems.

Interesting uses suggest themselves. Upon destruction of some documents you could visualize whether odd or inconvenient holes in the document record will be created by a widely linked record’s destruction.

Depending on the skill of your indexing and document recognition, you could even uncover non-hyperlink references between documents. And perform the same analysis.
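A back-of-the-envelope sketch of that analysis, with invented document names and dates: record destruction dates and inter-document links, then ask which surviving documents will point at destroyed ones after a given date.

```python
# Sketch: documents carry a destruction date, links between them are
# recorded, and we ask which surviving documents will cite destroyed ones
# after a given date. All names and dates are invented.

from datetime import date

destruction = {
    "contract-2004.pdf": date(2012, 6, 30),
    "audit-2006.pdf":    date(2015, 1, 1),
    "memo-2011.pdf":     date(2018, 1, 1),
}
links = [                      # (linking document, linked-to document)
    ("audit-2006.pdf", "contract-2004.pdf"),
    ("memo-2011.pdf",  "contract-2004.pdf"),
    ("memo-2011.pdf",  "audit-2006.pdf"),
]

def holes_after(as_of):
    destroyed = {doc for doc, d in destruction.items() if d <= as_of}
    return [(src, dst) for src, dst in links
            if dst in destroyed and src not in destroyed]

# Which links will dangle once the 2012 destruction schedule has run?
for src, dst in holes_after(date(2013, 1, 1)):
    print(f"{src} still cites destroyed document {dst}")
```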

Or, you could wait until some enterprising lawyer who isn’t representing your interest decides to perform the same analysis.

Your call.

The (Real) Semantic Web Requires Machine Learning

Filed under: Machine Learning,Semantic Web — Patrick Durusau @ 7:57 pm

The (Real) Semantic Web Requires Machine Learning by John O’Neil.

From the longer quote below:

…different people will almost inevitably create knowledge encodings that can’t easily be compared, because they use different — sometimes subtly, maddeningly different — basic definitions and concepts. Another difficult problem is to decide when entity names refer to the “same” real-world thing. Even worse, if the entity names are defined in two separate places, when and how should they be merged?

And the same is true for relationships between entities. (Full stop.)

The author thinks statistical analysis will be able to distinguish both entities and relationships between them, which I am sure will be true to some degree.

I would characterize that as a topic map authoring aid but it would also be possible to simply accept the statistical results.

It is refreshing to see someone recognize the “semantic web” is the one created by users and not as dictated by other authorities.

From the post:

We think about the semantic web in two complementary (and equivalent) ways. It can be viewed as:

  • A large set of subject-verb-object triples, where the verb is a relation and the subject and object are entities

OR

  • As a large graph or network, where the nodes of the graph are entities and the graph’s directed edges or arrows are the relations between nodes.

As a reminder, entities are proper names, like people, places, companies, and so on. Relations are meaningful events, outcomes or states, like BORN-IN, WORKS-FOR, MARRIED-TO, and so on. Each entity (like “John O’Neil”, “Attivio” or “Newton, MA”) has a type (like “PERSON”, “COMPANY” or “LOCATION”) and each relation is constrained to only accept certain types of entities. For example, WORKS-FOR may require a PERSON as the subject and a COMPANY as the object.

How semantic web information is organized and transmitted is described by a blizzard of technical standards and XML namespaces. Once you escape from that, the basic goals of the semantic web are (1) to allow a lot of useful information about the world to be simply expressed, in a way that (2) allows computers to do useful things with it.

Almost immediately, some problems crop up. As generations of artificial intelligence researchers have learned, it can be really difficult to encode real-world knowledge into predicate logic, which is more-or-less what the semantic web is. The same AI researchers also learned that different people will almost inevitably create knowledge encodings that can’t easily be compared, because they use different — sometimes subtly, maddeningly different — basic definitions and concepts. Another difficult problem is to decide when entity names refer to the “same” real-world thing. Even worse, if the entity names are defined in two separate places, when and how should they be merged? For example, do an Internet search for “John O’Neil”, and try to decide which of the results refer to how many different people. Believe me, all the results are not for the same person.

As for relations, it’s difficult to tell when they really mean the same thing across different knowledge encodings. No matter how careful you are, if you want to use relations to infer new facts, you have few resources to check to see if the combined information is valid.

So, when each web site can define its own entities and relations, independently of any other web site, how do you reconcile entities and relations defined by different people?
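As a small illustration of the typed subject-verb-object model the post describes, here is a sketch in which entities carry types and each relation constrains the types it accepts. The entity names and relation names come from the post’s examples; the rest is my own scaffolding, not Attivio’s system.

```python
# Sketch of typed entities and type-constrained relations, as described in
# the post: WORKS-FOR requires a PERSON subject and a COMPANY object.

ENTITY_TYPES = {
    "John O'Neil": "PERSON",
    "Attivio": "COMPANY",
    "Newton, MA": "LOCATION",
}

RELATION_SIGNATURES = {
    "WORKS-FOR":  ("PERSON", "COMPANY"),
    "MARRIED-TO": ("PERSON", "PERSON"),
    "BORN-IN":    ("PERSON", "LOCATION"),
}

def check_triple(subject, relation, obj):
    """Accept a triple only if its entity types match the relation's signature."""
    want = RELATION_SIGNATURES.get(relation)
    have = (ENTITY_TYPES.get(subject), ENTITY_TYPES.get(obj))
    return want is not None and want == have

print(check_triple("John O'Neil", "WORKS-FOR", "Attivio"))   # True
print(check_triple("Attivio", "WORKS-FOR", "John O'Neil"))   # False: types reversed
```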

January 6, 2012

I-CHALLENGE 2012 : Linked Data Cup

Filed under: Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 11:39 am

I-CHALLENGE 2012 : Linked Data Cup

Dates:

When Sep 5, 2012 – Sep 7, 2012
Where Graz, Austria
Submission Deadline Apr 2, 2012
Notification Due May 7, 2012
Final Version Due Jun 4, 2012

From the call for submissions:

The yearly organised Linked Data Cup (formerly Triplification Challenge) awards prizes to the most promising innovation involving linked data. Four different technological topics are addressed: triplification, interlinking, cleansing, and application mash-ups. The Linked Data Cup invites scientists and practitioners to submit novel and innovative (5 star) linked data sets and applications built on linked data technology.

Although more and more data is triplified and published as RDF and linked data, the question arises how to evaluate the usefulness of such approaches. The Linked Data Cup therefore requires all submissions to include a concrete use case and problem statement alongside a solution (triplified data set, interlinking/cleansing approach, linked data application) that showcases the usefulness of linked data. Submissions that can provide measurable benefits of employing linked data over traditional methods are preferred.
Note that the call is not limited to any domain or target group. We accept submissions ranging from value-added business intelligence use cases to scientific networks to the longest tail of information domains. The only strict requirement is that the employment of linked data is very well motivated and also justified (i.e. we rank approaches higher that provide solutions, which could not have been realised without linked data, even if they lack technical or scientific brilliance). (emphasis added)

I don’t know what the submissions are going to look like but the conference organizers should get high marks for academic honesty. I don’t think I have ever seen anyone say:

we rank approaches higher that provide solutions, which could not have been realised without linked data, even if they lack technical or scientific brilliance

We have all seen challenges with qualifying requirements but I don’t recall any that would privilege lesser work because of a greater dependence on a requirement. Or at least that would publicly claim that was the contest policy. Have there been complaints from technically or scientifically brilliant approaches about judging in the past?

I will have to watch the submissions and results to see if technically or scientifically brilliant approaches get passed over in favor of lesser approaches. If so, that will be a signal to first-rate competitors to seek recognition elsewhere.

December 22, 2011

Drs. Wood & Seuss Explain RDF in Two Minutes

Filed under: RDF,Semantic Web,Semantics — Patrick Durusau @ 7:38 pm

Drs. Wood & Seuss Explain RDF in Two Minutes by Eric Franzon.

From the post:

“How would you explain RDF to my grandmother? I still don’t get it…” a student recently asked of David Wood, CTO of 3Roundstones. Wood was speaking to a class called “Linked Data Ventures,” which was made up of students from the MIT Computer Science Department and the Sloan School of Business. He responded by creating a slide deck and subsequent video explaining the Resource Description Framework using the classic Dr. Seuss style of rhyming couplets and the characters Thing 1 and Thing 2.

I hope this student’s grandmother found this as enjoyable as I did. (Video after the jump).

This is a great explanation of RDF. You won’t be authoring RDF after the video, but you will have the basics.
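If you want to poke at those basics yourself after the video, here is a minimal sketch using the Python rdflib library. The example.org URIs are made up for illustration; the only point is that RDF boils down to subject, predicate, object.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF

# Everything in RDF is a subject-predicate-object triple.
# The example.org URIs are invented for illustration.
EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.thing1, FOAF.name, Literal("Thing 1")))   # Thing 1 has a name
g.add((EX.thing1, FOAF.knows, EX.thing2))           # Thing 1 knows Thing 2

for subject, predicate, obj in g:
    print(subject, predicate, obj)
```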

Take this as a goad to come up with something similar for topic maps and other semantic technologies.

December 21, 2011

Semantic Web Technologies and Social Searching for Librarians – No Buy

Filed under: Searching,Semantic Web,Social Media — Patrick Durusau @ 7:26 pm

Semantic Web Technologies and Social Searching for Librarians By Robin Fay and Michael Sauers.

I don’t remember recommending a no buy on any book on this blog, particularly one I haven’t read, but there is a first time for everything.

Yes, I haven’t read the book because it isn’t available yet.

How do I know to recommend no buy on Robin Fay and Michael Sauers’ “Semantic Web Technologies and Social Searching for Librarians”?

Let’s look at the evidence, starting with the overview:

There are trillions of bytes of information within the web, all of it driven by behind-the-scenes data. Vast quantities of information make it hard to find what’s really important. Here’s a practical guide to the future of web-based technology, especially search. It provides the knowledge and skills necessary to implement semantic web technology. You’ll learn how to start and track trends using social media, find hidden content online, and search for reusable online content, crucial skills for those looking to be better searchers. The authors explain how to explore data and statistics through WolframAlpha, create searchable metadata in Flickr, and give meaning to data and information on the web with Google’s Rich Snippets. Let Robin Fay and Michael Sauers show you how to use tools that will awe your users with your new searching skills.

So, having read this book, you will know:

  • the future of web-based technology, especially search
  • [the] knowledge and skills necessary to implement semantic web technology
  • [how to] start and track trends using social media
  • [how to] find hidden content online
  • [how to] search for reusable online content
  • [how to] explore data and statistics through WolframAlpha
  • [how to] create searchable metadata in Flickr
  • [how to] give meaning to data and information on the web with Google’s Rich Snippets

The other facts you need to consider?

6 x 9 | 125 pp. | $59.95

So, in 125 pages, call it 105, allowing for title page, table of contents and some sort of index, you are going to learn all those skills?

For about the same amount of money, you can get a copy of Modern Information Retrieval: The Concepts and Technology behind Search by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, which covers only search, in 944 pages.

I read a lot of discussion about teaching students to critically evaluate information that they read on the WWW.

Any institution that buys this book needs to implement training in the critical evaluation of information for its staff and faculty.

December 17, 2011

SDShare

Filed under: RDF,SDShare,Semantic Web,Semantics — Patrick Durusau @ 7:54 pm

SDShare (PDF file)

According to the blog entry dated 16 December 2011, with a pointer to this presentation, this is a “recent” presentation. But the presentation has a copyright claim dated 2010. So it is either nearly a year old or it is one of those timeless artifacts on the web.

The ones that have no reliable indication of a date of composition or publishing. Appropriate for the ephemera that make up the eternal “now” of the WWW. Less appropriate for important technology documents, particularly ones that aspire to be ISO standards in the near future.

The slide deck is a good overview of the goals of SDShare, if a bit short on actual details. I would suggest using the slide deck to interest others in learning more and then passing on to them the original SDShare document.

I would quibble with the claim at slide 34 that RDF data makes “…merging simple.” So far as I know, RDF never specifies what happens when you have multiple distinct and perhaps inconsistent values for the same property. Perhaps I have overlooked that in the plethora of RDF standards, revisions and retreats.
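To illustrate the quibble, here is a minimal sketch with the Python rdflib library. The data is invented; the point is that merging RDF graphs is just a set union of triples, and both of two conflicting values for the same property survive the merge without complaint.

```python
from rdflib import Graph
from rdflib.namespace import FOAF

# Two sources describe the same resource with different names.
# The URIs and values are invented for illustration.
source_a = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/people/jsmith> foaf:name "John Smith" .
"""

source_b = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/people/jsmith> foaf:name "J. Smith" .
"""

merged = Graph()
merged.parse(data=source_a, format="turtle")
merged.parse(data=source_b, format="turtle")  # "merging" is just a union of triples

# Both names survive; RDF itself says nothing about whether this is a
# conflict, which value is right, or what a consumer should do about it.
for name in merged.objects(predicate=FOAF.name):
    print(name)
```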

December 8, 2011

RDFa 1.1

Filed under: RDFa,Semantic Web — Patrick Durusau @ 8:01 pm

RDFa 1.1

From the draft:

The last couple of years have witnessed a fascinating evolution: while the Web was initially built predominantly for human consumption, web content is increasingly consumed by machines which expect some amount of structured data. Sites have started to identify a page’s title, content type, and preview image to provide appropriate information in a user’s newsfeed when she clicks the “Like” button. Search engines have started to provide richer search results by extracting fine-grained structured details from the Web pages they crawl. In turn, web publishers are producing increasing amounts of structured data within their Web content to improve their standing with search engines.

A key enabling technology behind these developments is the ability to add structured data to HTML pages directly. RDFa (Resource Description Framework in Attributes) is a technique that allows just that: it provides a set of markup attributes to augment the visual information on the Web with machine-readable hints. In this Primer, we show how to express data using RDFa in HTML, and in particular how to mark up existing human-readable Web page content to express machine-readable data.

This document provides only a Primer to RDFa. The complete specification of RDFa, with further examples, can be found in the RDFa 1.1 Core [RDFA-CORE], the XHTML+RDFa 1.1 [XHTML-RDFA], and the HTML5+RDFa 1.1 [HTML-RDFA] specifications.

I am sure this wasn’t an intentional contrast, but compare this release with that of RDFa Lite 1.1.

Which one would you rather teach a room full of newbie (or even experienced) HTML hackers?

Don’t be shy, keep your hands up!

I don’t know that RDFa Lite 1.1 is “lite” enough but I think it is getting closer to a syntax that might actually be used.

RDFa Lite 1.1

Filed under: RDFa,Semantic Web — Patrick Durusau @ 8:00 pm

RDFa Lite 1.1 (new draft)

From the W3C:

One critique of RDFa is that is has too much functionality, leaving first-time authors confused about the more advanced features. RDFa Lite is a minimalist version of RDFa that helps authors easily jump into the structured data world. The goal was to outline a small subset of RDFa that will work for 80% of the Web authors out there doing simple data markup.

Well, it’s short enough.

Comments are being solicited so here’s your chance.

Still using simple identifiers for subjects, which may be sufficient in some cases. Depends. The bad part is that this doesn’t improve as you go up the chain to more complex forms of RDFa/RDF.
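For a sense of what that looks like in practice, here is a rough sketch. The markup lives in a Python string only so the snippet stays self-contained; the vocabulary, values, and the “#patrick” identifier are all invented for illustration. RDFa Lite amounts to five attributes (vocab, typeof, property, resource, and prefix), and resource here is exactly the kind of simple identifier mentioned above.

```python
# A rough sketch of RDFa Lite 1.1 markup, kept in a Python string so the
# snippet stays runnable. The vocabulary, values, and "#patrick" identifier
# are invented for illustration.
rdfa_lite_example = """
<div vocab="http://schema.org/" typeof="Person" resource="#patrick">
  <span property="name">Patrick Durusau</span>,
  <span property="jobTitle">standards editor</span>
</div>
"""

print(rdfa_lite_example)
```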

BTW, does anyone have a good reference for what it means to have a web of things?

Just curious what is going to be left on the cutting room floor from the Semantic Web and its “web of things”?

Will the Semantic Web be the Advertising Web that pushes content at me, whether I am interested or not?
