Another Word For It
Patrick Durusau on Topic Maps and Semantic Diversity

July 6, 2012

Apache Camel at 5 [2.10 release]

Filed under: Apache Camel,Integration,Tweets — Patrick Durusau @ 4:54 pm

Apache Camel celebrates 5 years in development with 2.10 release by Chris Mayer.

Chris writes:

Off the back of celebrating its fifth birthday at CamelOne 2012, the Apache Camel team have put the finishing touches to their next release, Apache Camel 2.10, adding in an array of new components to the Apache enterprise application integration platform.

No less than 483 issues have been resolved this time round, but the real draw is the 18 components added to the package, including Websocket and Twitter, allowing for deeper cohesive messaging for users. With the Twitter component, based on the Twitter4J library, users may obtain direct, polling, or event-driven consumption of timelines, users, trends, and direct messages. An example of combining the two can be found here.

Other additions to the component catalogue include support for HBase, CDI, MongoDB, Apache Avro, DynamoDB on AWS, Google GSON and Guava. Java 7 support is much more thorough now, as is support for Spring 3.1.x and Netty. A full list of all resolved issues can be found here.
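As a sketch of how the two new components might combine (my reading of the Camel 2.10 twitter and websocket URIs; the credentials are placeholders and camel-twitter and camel-websocket would need to be on the classpath, so treat this as untested):

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.main.Main;

/**
 * Sketch: poll Twitter for a keyword and push each status to all
 * connected browsers over a websocket. Keys below are placeholders.
 */
public class TweetsToWebsocket extends RouteBuilder {

    @Override
    public void configure() {
        from("twitter://search?type=polling&delay=60&keywords=camel"
                + "&consumerKey=KEY&consumerSecret=SECRET"
                + "&accessToken=TOKEN&accessTokenSecret=TOKEN_SECRET")
            // the body of each exchange is a Twitter4J Status
            .setBody(simple("${body.user.screenName}: ${body.text}"))
            // broadcast to every connected websocket client
            .to("websocket://0.0.0.0:9292/camel-tweet?sendToAll=true");
    }

    public static void main(String[] args) throws Exception {
        Main main = new Main();
        main.addRouteBuilder(new TweetsToWebsocket());
        main.run(); // runs until stopped
    }
}
```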

The Twitter Websocket example reminds me of something I have been meaning to write about Twitter, topic maps and public data streams.

But more on that next week.

May 16, 2012

Identifying And Weighting Integration Hypotheses On Open Data Platforms

Filed under: Crowd Sourcing,Data Integration,Integration,Open Data — Patrick Durusau @ 12:58 pm

Identifying And Weighting Integration Hypotheses On Open Data Platforms by Julian Eberius, Katrin Braunschweig, Maik Thiele, and Wolfgang Lehner.

Abstract:

Open data platforms such as data.gov or opendata.socrata.com provide a huge amount of valuable information. Their free-for-all nature, the lack of publishing standards and the multitude of domains and authors represented on these platforms lead to new integration and standardization problems. At the same time, crowd-based data integration techniques are emerging as a new way of dealing with these problems. However, these methods still require input in the form of specific questions or tasks that can be passed to the crowd. This paper discusses integration problems on Open Data Platforms, and proposes a method for identifying and ranking integration hypotheses in this context. We will evaluate our findings by conducting a comprehensive evaluation on one of the largest Open Data platforms.
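To make “integration hypothesis” concrete before I quibble: here is a toy sketch of my own (not the authors’ method), which weights a duplicate-dataset hypothesis by the overlap of two datasets’ column names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy illustration (mine, not the paper's algorithm): weight the
 * hypothesis "dataset A duplicates dataset B" by the Jaccard
 * similarity of their column-name sets.
 */
public class DuplicateHypothesis {

    static double weight(List<String> colsA, List<String> colsB) {
        Set<String> intersection = new HashSet<String>(colsA);
        intersection.retainAll(colsB);
        Set<String> union = new HashSet<String>(colsA);
        union.addAll(colsB);
        return union.isEmpty() ? 0.0
                : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        double w = weight(
            Arrays.asList("state", "year", "budget"),
            Arrays.asList("state", "year", "budget", "notes"));
        System.out.printf("duplicate-dataset hypothesis weight: %.2f%n", w); // 0.75
    }
}
```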

This is interesting work on Open Data platforms but it is marred by claims such as:

Open Data Platforms have some unique integration problems that do not appear in classical integration scenarios and which can only be identified using a global view on the level of datasets. These problems include partial or duplicated datasets, partitioned datasets, versioned datasets and others, which will be described in detail in Section 4.

Really?

That would come as a surprise to the World Data Centre for Aerosols, whose Synthesis and INtegration of Global Aerosol Data Sets project (Contract No. ENV4-CT98-0780 (DG 12 EHKN)) worked on data sets from 1999 to 2001. One of the specific issues it addressed was duplicate data sets.

More than a decade ago counts as a “classical integration scenario,” I think.

Another quibble: the cited sources do not support the text.

New forms of data management such as dataspaces and pay-as-you-go data integration [2, 6] are a hot topic in database research. They are strongly related to Open Data Platforms in that they assume large sets of heterogeneous data sources lacking a global or mediated schemata, which still should be queried uniformly.

[2] M. Franklin, A. Halevy, and D. Maier. From Databases to Dataspaces: A New Abstraction for Information Management. SIGMOD Record, 34(4):27-33, December 2005.

[6] J. Madhavan, S. R. Jeffery, S. Cohen, X. Dong, D. Ko, C. Yu, and A. Halevy. Web-scale Data Integration: You Can Only Afford to Pay As You Go. In Proc. of CIDR-07, 2007.

Articles written seven (7) and five (5) years ago do not justify a claim of “hot topic(s) in database research” today.

There are other issues, major and minor, but for all that, this is important work.

I want to see reports that do justice to its importance.

April 12, 2012

Is There A Dictionary In The House? (Savanna – Think Software)

Filed under: Integration,Intelligence,OWL,Semantic Web — Patrick Durusau @ 7:04 pm

Reading a white paper on an integration solution from Thetus Corporation (on its Savanna product line) when I encountered:

Savanna supports the core architectural premise that the integration of external services and components is an essential element of any enterprise platform by providing out-of-the-box integrations with many of the technologies and programs already in use in the DI2E framework. These investments include existing programs, such as: the Intelligence Community Data Layer (ICDL), OPTIC (force protection application), WATCHDOG (Terrorist Watchlist 2.0), SERENGETI (AFRICOM socio-cultural analysis), SCAN-R (EUCOM deep futures analysis); and, in the future: TAC (tripwire search and analysis), and HSCB-funded modeling capabilities, including Signature Analyst and others. To further make use of existing external services and components, the proposed solution includes integration points for commercial and open source software, including: SOLR (indexing), Open Sextant (geotagging), Apache OpenNLP (entity extraction), R (statistical analysis), ESRI (geo-processing), OpenSGI GeoCache (geospatial data), i2 Analyst’s Notebook (charting and analysis) and a variety of structured and unstructured data repositories.

I have to plead ignorance of the “existing program” alphabet soup but I am familiar with several of the open source packages.

I am not sure what an “integration point” for an unknown future use of any of those packages would look like. Do you? Their output can be used by any program, but that hardly qualifies the other program as having an “integration point.”

I am sensitive to the use of “integration” because to me it means there is some basis for integration. So a user, having integrated data once, can re-use and possibly enhance the basis for integration of that data with other data. (We call that “merging” in topic map land.)
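For the curious, here is a toy sketch (my own illustration, topic map flavored, not anything in Savanna) of what a reusable basis for integration can look like: proxies merge when they share a subject identifier, and the merged result keeps the identifiers and properties of both.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Toy illustration of a reusable basis for integration: two subject
 * proxies merge when they share a subject identifier, and the merged
 * proxy keeps the identifiers and properties of both.
 */
public class SubjectProxy {
    final Set<String> subjectIdentifiers = new HashSet<String>();
    final Map<String, String> properties = new HashMap<String, String>();

    boolean sharesSubjectWith(SubjectProxy other) {
        Set<String> common = new HashSet<String>(subjectIdentifiers);
        common.retainAll(other.subjectIdentifiers);
        return !common.isEmpty();
    }

    SubjectProxy mergeWith(SubjectProxy other) {
        SubjectProxy merged = new SubjectProxy();
        merged.subjectIdentifiers.addAll(subjectIdentifiers);
        merged.subjectIdentifiers.addAll(other.subjectIdentifiers);
        merged.properties.putAll(properties);
        merged.properties.putAll(other.properties); // last writer wins
        return merged;
    }
}
```

Blind interchange between two installations then reduces to the question of whether they assign the same subject identifiers.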

Integration and even reuse are mentioned: “The Savanna architecture prevents creating a set of comparable reuse issues at the enterprise scale by providing a set of interconnected and flexible models that articulate how analysis assets are sourced and created and how they are used by the community.” (page 16)

But not in enough detail to really evaluate the basis for re-use of data, data structures, enrichment of the same, etc.

Looked around for an SDK or such but came up empty.

Point of amusement:

It’s official, we’re debuting our newest release of Savanna at DoDIIS (March 21, 2012). (DoDIIS: Department of Defense Intelligence Information Systems Worldwide Conference.)

The next blog entry by date?

Happy Peaceful Birthday to the Peace Corps (March 1, 2012)

I would appreciate hearing from anyone with information or stories to tell about how Savanna works in practice.

In particular I am interested in whether two distinct Savanna installations can share information in a blind interchange. That should be the test of re-use of information by another installation.

Moreover, do I have to convert data between formats or can data structures themselves be entities with properties?

PS: I am not overly impressed with the use of OWL for modeling in Savanna. The experience with “big data” has shown that starting with data first leads to different, perhaps more useful models than the other way around.

Premature modeling with OWL will result in models that are “useful” in meeting the expectations of the creating analyst. That may not be the criterion of “usefulness” that is required.

December 24, 2011

Development Life Cycle and Tools for Data Exchange Specification

Filed under: Integration,XML,XML Schema — Patrick Durusau @ 4:42 pm

Development Life Cycle and Tools for Data Exchange Specification (2008) by KC Morris and Puja Goyal.

Abstract:

In enterprise integration, a data exchange specification is an architectural artifact that evolves along with the business. Developing and maintaining a coherent semantic model for data exchange is an important, yet non-trivial, task. A coherent semantic model of data exchange specifications supports reuse, promotes interoperability, and, consequently, reduces integration costs. Components of data exchange specifications must be consistent and valid in terms of agreed upon standards and guidelines. In this paper, we describe an activity model and NIST-developed tools for the creation, test, and maintenance of a shared semantic model that is coherent and supports scalable, standards-based enterprise integration. The activity model frames our research and helps define tools to support the development of data exchange specifications implemented using XML (Extensible Markup Language) Schema.

A paper that makes it clear that interoperability is not a trivial task. It could be helpful in convincing the “powers that be” that projects on semantic integration or interoperability have to be properly resourced in order to produce a useful result.

Manufacturing System Integration Division – MSID XML Testbed (NIST)

Filed under: Integration,XML,XML Schema — Patrick Durusau @ 4:42 pm

Manufacturing System Integration Division – MSID XML Testbed (NIST)

From the website:

NIST’s efforts to define methods and tools for developing XML Schemas to support systems integration will help you effectively build and deploy XML Schemas amongst partners in integration projects. Through the Manufacturing Interoperability Program (MIP) XML Testbed, NIST provides guidance on how to build XML Schemas as well as a collection of tools that will help with the process, allowing projects to more quickly and efficiently meet their goals.

The NIST XML Schema development and testing process is documented as the Model Development Life Cycle, which is an activity model for the creation, use, and maintenance of shared semantic models, and has been used to frame our research and development tools. We have worked with a number of industries on refining and automating the specification process and provide a wealth of information on how to use XML to address your integration needs.

On this site you will find a collection of tools and ideas to help you in developing high quality XML schemas. The tools available on this site are offered to the general public free of charge. They have been developed by the United States Government and as such are not subject to copyright or other restrictions.

If you are interested in seeing the tools extended or having some of your work included in the service please contact us.

The thought did occur to me that you could write an XML schema that governs the documentation of the subjects, their properties, and merging conditions in your information systems. Perhaps even to the point of running XSLT against the resulting documentation to create SQL statements for the integration of information resources held in your database (or accessible from it).
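A sketch of that idea using nothing beyond standard JAXP (both input files are hypothetical, stand-ins for a documentation format and stylesheet you would write yourself):

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * Sketch: run an XSLT stylesheet over schema-valid documentation of
 * subjects, properties and merging conditions, emitting SQL that
 * integrates the corresponding database resources.
 * Both input files are hypothetical.
 */
public class DocToSql {
    public static void main(String[] args) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("doc-to-sql.xsl")));
        t.transform(new StreamSource(new File("subjects-doc.xml")),
                    new StreamResult(new File("integrate.sql")));
    }
}
```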
