Provenance « Another Word For It

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 20, 2014

PLUS

Filed under: Data,Neo4j,Provenance — Patrick Durusau @ 7:36 pm

From the webpage:

PLUS is a system for capturing and managing provenance information, originally created at the MITRE Corporation.

Data provenance is “information that helps determine the derivation history of a data product…[It includes] the ancestral data product(s) from which this data product evolved, and the process of transformation of these ancestral data product(s).”

Uses Neo4j for storage.

Includes an academic bibliography of related papers.

Provenance answers the questions: where has your data been, what has happened to it, and with whom?
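The definition quoted above has two parts: the ancestral data products and the process that transformed them. A minimal Python sketch of that idea follows. The `DataProduct` class, the file names, and the process descriptions are all hypothetical; none of them comes from PLUS or its Neo4j schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A data product plus the step that produced it (hypothetical model)."""
    name: str
    process: str = "original source"  # transformation that produced this product
    ancestors: list[DataProduct] = field(default_factory=list)


def derivation_history(product: DataProduct) -> list[str]:
    """Walk back through the ancestors, depth first, one line per product."""
    lines = [f"{product.name} <- {product.process}"]
    for ancestor in product.ancestors:
        lines.extend(derivation_history(ancestor))
    return lines


# Hypothetical three-step chain: raw log -> cleaned file -> report.
raw = DataProduct("sensor_log.csv")
cleaned = DataProduct("cleaned.csv", "drop malformed rows", [raw])
report = DataProduct("report.pdf", "summarize and plot", [cleaned])

for line in derivation_history(report):
    print(line)
```

Each printed line names a product and the process that produced it, which is the "where has it been and what happened to it" half of the question. Recording who performed each step would need an extra field.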


January 23, 2014

Provenance Reconstruction Challenge 2014

Filed under: Provenance,Semantic Web,W3C — Patrick Durusau @ 12:06 pm

Provenance Reconstruction Challenge 2014

Schedule

February 17, 2014 Test Data released
May 18, 2014 Last day to register for participation
May 19, 2014 Challenge Data released
June 13, 2014 Provenance Reconstruction Challenge Event at Provenance Week – Cologne, Germany

From the post:

While the use of version control systems, workflow engines, and provenance-aware filesystems and databases is growing, there is still a plethora of data that lacks associated data provenance. To help solve this problem, a number of research groups have been looking at reconstructing the provenance of data using the computational environment in which it resides. This research, however, is still very new in the community. Thus, the aim of the Provenance Reconstruction Challenge is to help spur research into the reconstruction of provenance by providing a common task and datasets for experimentation.

The Challenge

Challenge participants will receive an open data set and corresponding provenance graphs (in W3C PROV format). They will then have several months to work with the data, trying to reconstruct the provenance graphs from the open data set. Three weeks before the challenge face-to-face event, the participants will receive a new data set and a gold standard provenance graph. Participants are asked to register before the challenge dataset is released and to prepare a short description of their system to be placed online after the event.

The Event

At the event, we will have presentations of the results and the systems as well as a group conversation around the techniques used. The event will result in a joint report about techniques for reproducing provenance and paths forward.
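The announcement does not say how a reconstructed graph will be compared to the gold standard. One plausible approach, offered here only as an assumption, is to treat each graph as a set of (subject, relation, object) edges and compute precision and recall. The edge data below is invented; `wasDerivedFrom` and `wasAttributedTo` are real PROV relation names.

```python
from __future__ import annotations

Edge = tuple[str, str, str]  # (subject, relation, object)


def score_reconstruction(predicted: set[Edge], gold: set[Edge]) -> dict[str, float]:
    """Edge-level precision and recall of a reconstructed provenance graph."""
    correct = predicted & gold
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical gold standard graph.
gold = {
    ("draft_v2", "wasDerivedFrom", "draft_v1"),
    ("draft_v1", "wasAttributedTo", "alice"),
    ("draft_v2", "wasAttributedTo", "bob"),
    ("summary", "wasDerivedFrom", "draft_v2"),
}

# Hypothetical reconstruction: two edges right, one wrong, two missing.
predicted = {
    ("draft_v2", "wasDerivedFrom", "draft_v1"),
    ("draft_v2", "wasAttributedTo", "bob"),
    ("summary", "wasDerivedFrom", "draft_v1"),
}

scores = score_reconstruction(predicted, gold)
print(scores)
```

Exact edge matching is strict. A real evaluation might give partial credit for an edge that points to the wrong version of the right document.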

For further information on the W3C PROV format:

Provenance Working Group

PROV at Semantic Web Wiki.

PROV Implementation Report (60 implementations as of 30 April 2013)

I first saw this in a tweet by Paul Groth.


April 5, 2013

Successful PROV Tutorial at EDBT

Filed under: Design,Modeling,Provenance — Patrick Durusau @ 1:13 pm

Successful PROV Tutorial at EDBT by Paul Groth.

From the post:

On March 20th, 2013 members of the Provenance Working Group gave a tutorial on the PROV family of specifications at the EDBT conference in Genova, Italy. EDBT (“Extending Database Technology”) is widely regarded as one of the prime venues in Europe for dissemination of data management research.

…

The 1.5-hour tutorial was attended by about 26 participants, mostly from academia. It was structured into three parts of approximately the same length. The first two parts introduced PROV as a relational data model with constraints and inference rules, supported by a (nearly) relational notation (PROV-N). The third part presented known extensions and applications of PROV, based on the extensive PROV implementation report and implementations known to the presenter at the time.

All the presentation material is available here.

As the first part of the tutorial notes:

Provenance is not a new subject:
  workflow systems
  databases
  knowledge representation
  information retrieval

Existing community-grown vocabularies:
  Open Provenance Model (OPM)
  Dublin Core
  Provenir ontology
  Provenance vocabulary
  SWAN provenance ontology
  etc.

The existence of “other” vocabularies isn’t an issue for topic maps.

You can query on “your” vocabulary and obtain results from “other” vocabularies.

Enriches your information and that of others.

You will need to know about the vocabularies of others and their oddities.
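A rough Python sketch of querying on "your" vocabulary and getting answers recorded in "other" vocabularies. The `SAME_SUBJECT` groupings are assumptions made for illustration, not an official mapping, and `local:author` is a made-up property standing in for "your" vocabulary. The records are invented as well.

```python
from __future__ import annotations

# Illustrative only: property names treated as identifying the same subject.
# These groupings are assumptions for this sketch, not an official mapping.
SAME_SUBJECT = {
    "attribution": {"dcterms:creator", "prov:wasAttributedTo", "local:author"},
    "derivation": {"dcterms:source", "prov:wasDerivedFrom"},
}

# Hypothetical records, each written in a different vocabulary.
RECORDS = [
    {"id": "doc1", "dcterms:creator": "alice"},
    {"id": "doc2", "prov:wasAttributedTo": "alice"},
    {"id": "doc3", "local:author": "bob"},
    {"id": "doc4", "dcterms:source": "doc1"},
]


def query(prop: str, value: str) -> list[str]:
    """Find records matching prop=value under any equivalent property name."""
    equivalents = {prop}
    for names in SAME_SUBJECT.values():
        if prop in names:
            equivalents |= names
    return [
        record["id"]
        for record in RECORDS
        if any(record.get(name) == value for name in equivalents)
    ]


print(query("local:author", "alice"))
```

The query is phrased in the local vocabulary, yet it returns the records written with Dublin Core and PROV terms. The hard part is the table itself: deciding which terms really identify the same subject is where knowing the oddities of other vocabularies comes in.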

For the W3C work on provenance, follow this tutorial and the others it mentions.


March 18, 2013

Dublin Core Mapping Comments [by 7 April 2013]

Filed under: Dublin Core,Provenance — Patrick Durusau @ 4:21 am

Stuart Sutton, Managing Director, DCMI, calls on the Dublin Core community to comment on a mapping from Dublin Core terms to the PROV provenance ontology.

His call reads:

The DCMI Metadata Provenance Task Group [1] is collaborating with the W3C Provenance Working Group [2] on a mapping from Dublin Core terms to the PROV provenance ontology [3], currently a W3C Proposed Recommendation. More precisely, the document describes a partial mapping from DCMI Metadata Terms [4] to the PROV-O OWL2 ontology [5] — a set of classes and properties usable for representing and interchanging information about provenance. Numerous terms in the DCMI vocabulary provide information about the provenance of a resource. Translating these terms into PROV relates this information explicitly to the W3C provenance model.

The mapping is currently a W3C Working Draft. The final state of the document will be that of a W3C Note, to be published as part of a suite of documents in support of a W3C Recommendation for provenance interchange [6].

DCMI would like to point to the W3C Note as a DCMI Recommended Resource and therefore encourages the Dublin Core community to provide feedback and take part in the finalization of the mapping.

The deadline for all comments is 7 April 2013. We recommend that comments be provided directly to the public W3C list for comments: public-prov-comments@w3.org [7], ideally with a Cc: to DCMI’s dc-provenance list [8]. Comments sent only to the dc-provenance list will be summarized on the W3C list and addressed, and discussions on the W3C list will be summarized back on the dc-provenance list when appropriate.

Stuart Sutton, Managing Director, DCMI

[1] http://dublincore.org/groups/provenance/
[2] http://www.w3.org/2011/prov/wiki/Main_Page
[3] http://www.w3.org/TR/2013/WD-prov-dc-20130312/
[4] http://dublincore.org/documents/dcmi-terms/
[5] http://www.w3.org/TR/prov-o/
[6] http://www.w3.org/TR/prov-overview/
[7] http://lists.w3.org/Archives/Public/public-prov-comments/
[8] https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=dc-provenance


March 10, 2013

Why Data Lineage is Your Secret … Weapon [Auditing Topic Maps]

Filed under: Data Quality,Merging,Provenance — Patrick Durusau @ 8:42 pm

Why Data Lineage is Your Secret Data Quality Weapon by Dylan Jones.

From the post:

Data lineage means many things to many people but it essentially refers to provenance – how do you prove where your data comes from?

It’s really a simple exercise. Just pull an imaginary string of data from where the information presents itself, back through the labyrinth of data stores and processing chains, until you can go no further.

I’m constantly amazed by why so few organisations practice sound data lineage management despite having fairly mature data quality or even data governance programs. On a side note, if ever there was a justification for the importance of data lineage management then just take a look at the brand damage caused by the recent European horse meat scandal.

But I digress. Why is data lineage your secret data quality weapon?

The simple answer is that data lineage forces your organisation to address two big issues that become all too apparent:

Lack of ownership

Lack of formal information chain design

Or to put it into a topic map context, can you trace what topics merged to create the topic you are now viewing?

And if you can’t trace, how can you audit the merging of topics?

And if you can’t audit, how do you determine the reliability of your topic map?

That is reliability in terms of date (freshness), source (reliable or not), evaluation (by screeners), comparison (to other sources), etc.

Same questions apply to all data aggregation systems.
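A minimal sketch of what tracing a merge could look like, assuming each merged topic simply keeps references to the topics it was built from. The `Topic` class, `merge`, and `audit` are hypothetical names, not part of any topic map standard or engine, and the sources are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    """A topic that remembers its sources and what it was merged from."""
    names: set[str]
    sources: list[str] = field(default_factory=list)
    merged_from: list[Topic] = field(default_factory=list)


def merge(a: Topic, b: Topic) -> Topic:
    """Merge two topics, keeping both inputs so the merge can be audited."""
    return Topic(a.names | b.names, a.sources + b.sources, [a, b])


def audit(topic: Topic, depth: int = 0) -> list[str]:
    """List a topic and, indented beneath it, every topic merged into it."""
    lines = ["  " * depth + f"{sorted(topic.names)} from {topic.sources}"]
    for parent in topic.merged_from:
        lines.extend(audit(parent, depth + 1))
    return lines


# Hypothetical inputs from two different feeds.
t1 = Topic({"W3C PROV"}, ["feed-a"])
t2 = Topic({"PROV"}, ["feed-b"])
merged = merge(t1, t2)

for line in audit(merged):
    print(line)
```

With the inputs retained, questions about date, source, or screening become questions about the recorded parents rather than guesses about the merged result.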

Or as Mr. Weasley tells Ginny:

“Never trust anything that can think for itself if you can’t see where it keeps its brain.”

Correction: Wesley -> Weasley. We had a minister friend over Sunday and were discussing the former, not the latter.


October 3, 2012

At or Near Final Calls on W3C Provenance

Filed under: HTML,Provenance — Patrick Durusau @ 7:48 pm

I saw a notice today about the ontology part of the W3C work on provenance. Some of it is at final call or nearly so. If you are interested, see:

PROV-DM, the PROV data model for provenance;
PROV-CONSTRAINTS, a set of constraints applying to the PROV data model;
PROV-N, a notation for provenance aimed at human consumption;
PROV-O, the PROV ontology, an OWL2 ontology allowing the mapping of PROV to RDF;
PROV-AQ, the mechanisms for accessing and querying provenance;
PROV-PRIMER, a primer for the PROV data model.

My first impression is the provenance work is more complex than HTML 3.2 and therefore unlikely to see widespread adoption. (You may want to bookmark that link. It isn’t listed on the HTML page at the W3C, even under obsolete versions.)


July 26, 2012

How to Track Your Data: Rule-Based Data Provenance Tracing Algorithms

Filed under: Data,Provenance — Patrick Durusau @ 3:43 pm

How to Track Your Data: Rule-Based Data Provenance Tracing Algorithms by Zhang, Qing Olive; Ko, Ryan K L; Kirchberg, Markus; Suen, Chun-Hui; Jagadpramana, Peter; Lee, Bu Sung.

Abstract:

As cloud computing and virtualization technologies become mainstream, the need to be able to track data has grown in importance. Having the ability to track data from its creation to its current state or its end state will enable the full transparency and accountability in cloud computing environments. In this paper, we showcase a novel technique for tracking end-to-end data provenance, a meta-data describing the derivation history of data. This breakthrough is crucial as it enhances trust and security for complex computer systems and communication networks. By analyzing and utilizing provenance, it is possible to detect various data leakage threats and alert data administrators and owners; thereby addressing the increasing needs of trust and security for customers’ data. We also present our rule-based data provenance tracing algorithms, which trace data provenance to detect actual operations that have been performed on files, especially those under the threat of leaking customers’ data. We implemented the cloud data provenance algorithms into an existing software with a rule correlation engine, show the performance of the algorithms in detecting various data leakage threats, and discuss technically its capabilities and limitations.

Interesting work, but data provenance isn’t solely a cloud computing or virtualization issue.

Consider the ongoing complaints in Washington, D.C. about who leaked what to whom, and why.

Posturing aside, that is an issue of data provenance and subject identity.

The sort of thing where a topic map application could excel.


December 15, 2011

5 Simple Provenance Statements

Filed under: Metadata,Provenance,Semantic Web,Semantics — Patrick Durusau @ 7:47 pm

5 Simple Provenance Statements

From the webpage:

Providing easily processable information about the provenance or origins of Web pages and data is important. It lets us give credit where it’s due and it helps others trust the information we publish on the Web.

Here are some simple provenance statements one can make using PROV-DM, the recently released working draft of a data model for provenance from the W3C.
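To give a feel for how small such statements can be, here is a Python sketch that assembles a few of them in PROV-N, the notation later standardized alongside PROV-DM. They are not necessarily the five statements from the linked page, and the 2011 working draft used a somewhat different syntax. The `ex:` identifiers, the example.org namespace, and the `prov_n_document` helper are all invented for illustration.

```python
from __future__ import annotations


def prov_n_document(statements: list[str], prefix: str, iri: str) -> str:
    """Wrap PROV-N statements in a document with one namespace prefix."""
    lines = ["document", f"  prefix {prefix} <{iri}>"]
    lines += [f"  {statement}" for statement in statements]
    lines.append("endDocument")
    return "\n".join(lines)


# Hypothetical statements: a page, the dataset it draws on, and its author.
statements = [
    "entity(ex:page)",
    "entity(ex:dataset)",
    "agent(ex:alice)",
    "wasAttributedTo(ex:page, ex:alice)",
    "wasDerivedFrom(ex:page, ex:dataset)",
]

doc = prov_n_document(statements, "ex", "http://example.org/")
print(doc)
```

The helper only formats text. It does not check the statements against the PROV constraints.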

Evaluate PROV-DM in light of two concerns:

1) Does it allow for different ways of expressing provenance? Consider the differing museum metadata standards for provenance. As just a tiny corner of that world, see: Introduction to Controlled Vocabularies by Patricia Harpring (online version).

2) On the other hand, is it too restrictive and complex for simple provenance statements by the average user?

Hard to fail by being too general (#1) and too restrictive (#2) at the same time, but odder things have happened in discussions of semantics.
