Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 16, 2012

Identifying And Weighting Integration Hypotheses On Open Data Platforms

Filed under: Crowd Sourcing,Data Integration,Integration,Open Data — Patrick Durusau @ 12:58 pm

Identifying And Weighting Integration Hypotheses On Open Data Platforms by Julian Eberius, Katrin Braunschweig, Maik Thiele, and Wolfgang Lehner.

Abstract:

Open data platforms such as data.gov or opendata.socrata.com provide a huge amount of valuable information. Their free-for-all nature, the lack of publishing standards and the multitude of domains and authors represented on these platforms lead to new integration and standardization problems. At the same time, crowd-based data integration techniques are emerging as new way of dealing with these problems. However, these methods still require input in form of specific questions or tasks that can be passed to the crowd. This paper discusses integration problems on Open Data Platforms, and proposes a method for identifying and ranking integration hypotheses in this context. We will evaluate our findings by conducting a comprehensive evaluation using on one of the largest Open Data platforms.

This is interesting work on Open Data platforms but it is marred by claims such as:

Open Data Platforms have some unique integration problems that do not appear in classical integration scenarios and which can only be identi ed using a global view on the level of datasets. These problems include partial- or duplicated datasets, partitioned datasets, versioned datasets and others, which will be described in detail in Section 4.

Really?

Would come as a surprise to the World Data Centre for Aerosols which had Synthesis and INtegration of Global Aerosol Data Sets. Contract No. ENV4-CT98-0780 (DG 12 –EHKN) produced on data sets from 1999 to 2001. One of the specific issues they addressed were duplicate data sets.

More than a decade ago counts for a “classical integration scenario” I think.

Another quibble. Cited sources do not support the text.

New forms of data management such as dataspaces and pay-as-you-go data integration [2, 6] are a hot topic in database research. They are strongly related to Open Data Platforms in that they assume large sets of heterogeneous data sources lacking a global or mediated schemata, which still should be queried uniformly.

2 M. Franklin, A. Halevy, and D. Maier. From databases to dataspaces: a new abstraction for information management. SIGMOD Rec., 34:27{33, December 2005.

6 J. Madhavan, S. R. Je ery, S. Cohen, X. . Dong, D. Ko, C. Yu, A. Halevy, and G. Inc. Web-scale Data Integration: You Can Only A fford to Pay As You Go. In Proc. of CIDR-07, 2007.

Articles written seven (7) and five (5) years ago, do not justify a “hot topic(s) in database research.” claim today.

There are other issues, major and minor but for all that, this is important work.

I want to see reports that do justice to its importance.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress