From the email announcement:
The LDIF – Linked Data Integration Framework can be used within Linked Data applications to translate heterogeneous data from the Web of Linked Data into a clean local target representation while keeping track of data provenance. LDIF provides an expressive mapping language for translating data from the various vocabularies that are used on the Web into a consistent, local target vocabulary. LDIF includes an identity resolution component which discovers URI aliases in the input data and replaces them with a single target URI based on user-provided matching heuristics. For provenance tracking, the LDIF framework employs the Named Graphs data model.
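The URI-alias replacement described above can be illustrated with a minimal sketch. This is not LDIF's implementation: the function name `resolve_aliases`, the tuple-based quad layout (subject, predicate, object, graph) and the example URIs are all assumptions made for illustration. It takes pairs of URIs that a matching step has already judged to be aliases, groups them with a union-find structure, and rewrites every occurrence to one representative URI, leaving the named-graph component untouched so provenance is preserved.

```python
def resolve_aliases(quads, alias_pairs):
    """Rewrite URI aliases in (s, p, o, g) quads to one URI per alias cluster.

    Hypothetical helper for illustration; terms are plain strings.
    """
    parent = {}

    def find(uri):
        # Follow parent links to the cluster representative (with path halving).
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: the lexicographically smallest URI becomes the target.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    for a, b in alias_pairs:
        union(a, b)

    def canonical(term):
        # Only terms that took part in a match are rewritten.
        return find(term) if term in parent else term

    # The graph name g is kept as-is, so each statement still points to its source.
    return [(canonical(s), p, canonical(o), g) for s, p, o, g in quads]


quads = [
    ("http://a.example/Berlin", "rdfs:label", "Berlin", "http://graphs.example/src1"),
    ("http://b.example/berlin", "ex:population", "3500000", "http://graphs.example/src2"),
    ("http://c.example/Paris", "ex:twin", "http://b.example/berlin", "http://graphs.example/src2"),
]
alias_pairs = [("http://b.example/berlin", "http://a.example/Berlin")]
resolved = resolve_aliases(quads, alias_pairs)
```

In this sketch the choice of representative URI is arbitrary but deterministic; a real deployment would more likely prefer URIs from a designated target namespace.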
Compared to the previous release 0.2, the new LDIF release provides:
- data access modules for gathering data from the Web via file download, crawling and accessing SPARQL endpoints. Web data is cached locally for further processing.
- a scheduler for launching data import and integration jobs as well as for regularly updating the local cache with data from remote sources.
- a second use case that shows how LDIF is used to gather and integrate data from several music-related Web data sources.
More information about LDIF, concrete usage examples and performance details are available at http://www4.wiwiss.fu-berlin.de/bizer/ldif/
Over the next months, we plan to extend LDIF along the following lines:
- Implement a Hadoop Version of the Runtime Environment in order to be able to scale to really large amounts of input data. Processes and data will be distributed over a cluster of machines.
- Add a Data Quality Evaluation and Data Fusion Module which allows Web data to be filtered according to different data quality assessment policies and provides for fusing Web data according to different conflict resolution methods.
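To make the idea of quality-based filtering and conflict resolution concrete, here is a small sketch. It does not describe the planned LDIF module, whose design is not given in the announcement; the function names, the per-source quality scores and the two policies ("majority" and "best_source") are hypothetical choices for illustration only.

```python
from collections import Counter


def filter_by_quality(values, min_score):
    """Keep only (value, source_score) pairs whose score meets the threshold."""
    return [(value, score) for value, score in values if score >= min_score]


def fuse(values, policy="majority"):
    """Pick one value from conflicting (value, source_score) pairs.

    Both policies are illustrative assumptions, not LDIF's actual methods.
    """
    if not values:
        return None
    if policy == "majority":
        # The value asserted by the most sources wins.
        counts = Counter(value for value, _ in values)
        return counts.most_common(1)[0][0]
    if policy == "best_source":
        # The value from the highest-scored source wins.
        return max(values, key=lambda pair: pair[1])[0]
    raise ValueError(f"unknown policy: {policy}")


# Three sources disagree on one property value; scores are made up.
population = [("3500000", 0.9), ("3400000", 0.4), ("3400000", 0.5)]
by_majority = fuse(population, "majority")
by_best_source = fuse(population, "best_source")
trusted_only = filter_by_quality(population, 0.5)
```

The two policies can disagree on the same input, which is why the choice of conflict resolution method would need to be configurable per use case.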
LDIF uses the identity resolution semantics of SILK (SILK – Link Discovery Framework Version 2.5).