Archive for the ‘Crosswalk’ Category

Building Attribute and Value Crosswalks… [Please Verify]

Friday, April 5th, 2013

Building Attribute and Value Crosswalks Using Esri’s Data Interoperability Extension by Nathan Lebel.

From the post:

The Esri Data Interoperability Extension gives GIS professionals the ability to build complex spatial extraction, transformation, and loading (ETL) tools. Traditionally the crosswalking of feature classes and attributes is done prior to setting up the migration tools and is used only as a guide. The drawback to this method is that it takes a considerable amount of time to build the crosswalks and then to build the ETL tools.

GISI’s article, “Building Attribute and Value Crosswalks in ESRI Data Interoperability Extension the Scalable/Dynamic Way” outlines the use of the SchemaMapper transformer within Data Interoperability Extension which can pull crosswalk information directly from properly formatted tables. For large projects this means you can store crosswalk information in a single repository and point each ETL tool to that repository without needing to manage multiple crosswalk documents. For projects that might change during the lifecycle of the project the use of SchemaMapper means that changes can be made to the repository without requiring any additional changes to the ETL tool. There are three examples used in this article which encompasses a majority of crosswalking tasks; feature class to feature class, attribute to attribute, and attribute value to attribute value crosswalking. All of the examples use CSV files to store the crosswalk information; however the transformer can pull directly from RDBMS tables as well which gives you the ability to build a user interface to create and update crosswalks which is recommended for large scale projects.

The full article can be accessed on GISI’s blog or as a PDF or Ebook in either EPUB or Kindle format.
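To make the table-driven idea concrete, here is a minimal Python sketch of attribute and value crosswalking driven by CSV tables. It is not the SchemaMapper transformer and does not use any Esri API; the column names, attribute names, and sample values are all hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical crosswalk tables. In a real project these would be CSV files
# or RDBMS tables kept in a single repository.
ATTRIBUTE_CROSSWALK_CSV = """source_attribute,target_attribute
RD_NAME,road_name
RD_TYPE,road_class
"""

VALUE_CROSSWALK_CSV = """attribute,source_value,target_value
road_class,HWY,Highway
road_class,LOC,Local
"""


def load_attribute_map(text):
    """Return {source_attribute: target_attribute} from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["source_attribute"]: row["target_attribute"] for row in reader}


def load_value_map(text):
    """Return {(target_attribute, source_value): target_value} from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        (row["attribute"], row["source_value"]): row["target_value"]
        for row in reader
    }


def apply_crosswalk(record, attribute_map, value_map):
    """Rename attributes, then translate values; unmapped items pass through."""
    out = {}
    for name, value in record.items():
        target = attribute_map.get(name, name)
        out[target] = value_map.get((target, value), value)
    return out


attribute_map = load_attribute_map(ATTRIBUTE_CROSSWALK_CSV)
value_map = load_value_map(VALUE_CROSSWALK_CSV)

source = {"RD_NAME": "Main St", "RD_TYPE": "HWY"}
result = apply_crosswalk(source, attribute_map, value_map)
print(result)  # {'road_name': 'Main St', 'road_class': 'Highway'}
```

Because the mappings live in the tables rather than in the code, changing a crosswalk means editing a row, not the tool.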

If you have time, please read the original article. Obtain it from the links listed in the final paragraph.

I need you to verify my reading of the process described in that article.

As far as I can tell, the author never says “why” or on what basis the various mappings are being made.

I would be hard pressed to duplicate the mapping based on the information given about the original data sources.

Having an opaque mapping can be useful, as the article says, but what if I stumble upon the mapping five years from now? Or two years? Or perhaps even six months from now?

Specifying the “why” of a mapping is something topic maps are uniquely qualified to do.

You can define merging rules that require the basis for mapping to be specified.

If that basis is absent, no merging occurs.
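A loose Python sketch of that rule follows. It is not a topic map engine and does not implement any topic map standard; the `Mapping` class, the `merge` function, and the sample mappings are hypothetical. The only point is that a mapping without a stated basis is refused.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    source: str
    target: str
    basis: Optional[str] = None  # why source and target are taken to be the same


def merge(mappings):
    """Accept only mappings that state a basis; collect the rest as skipped."""
    merged = {}
    skipped = []
    for m in mappings:
        if m.basis and m.basis.strip():
            merged[m.source] = (m.target, m.basis)
        else:
            skipped.append(m)
    return merged, skipped


# Hypothetical mappings: the first states its basis, the second does not.
candidates = [
    Mapping("RD_TYPE", "road_class",
            basis="both fields hold the functional class of the road segment"),
    Mapping("RD_NAME", "road_name"),
]

merged, skipped = merge(candidates)
print(sorted(merged))               # ['RD_TYPE']
print([m.source for m in skipped])  # ['RD_NAME']
```

Whoever finds the mapping six months later can read the basis alongside it, and the mappings that lack one are listed rather than silently applied.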

Factual’s Crosswalk API

Saturday, August 27th, 2011

Factual’s Crosswalk API by Matthew Hurst.

From the post:

Factual, which is mining the web for knowledge using a variety of web mining methods, has released an API in the local space which aims to expose, for a specific local entity (e.g. a restaurant) the places on the web that it is mentioned. For example, you might find for a restaurant its homepage, its listing on Yelp, its listing on UrbanSpoon, etc.

This mapping between entities and mentions is potentially a powerful utility. Given all these mentions, if some of the data changes (e.g. via a user update on a Yelp page) then the central knowledge base information for that entity can be updated.

When I looked, the crosswalk API was still limited to the US. Matthew uncovers the mapping accuracy issues known all too well to topic mappers.

From the Factual site:

Factual Crosswalk does four things:

  1. Converts a Factual ID into 3rd party identifiers and URLs
  2. Converts a 3rd party URL into a Factual canonical record
  3. Converts a 3rd party namespace and ID into a Factual canonical record
  4. Provides a list of URLs where a given Factual entity is found on the Internet
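The four operations above can be pictured as lookups over one small index. The sketch below is an in-memory toy in Python, not Factual's API: the class, its method names, the entity ID, the namespaces, and the URLs are all made up for illustration.

```python
class CrosswalkIndex:
    """Toy index linking one canonical entity ID to third-party identifiers."""

    def __init__(self):
        self._by_entity = {}        # entity_id -> list of identifier records
        self._by_url = {}           # url -> entity_id
        self._by_namespace_id = {}  # (namespace, namespace_id) -> entity_id

    def add(self, entity_id, namespace, namespace_id, url):
        record = {"namespace": namespace, "namespace_id": namespace_id, "url": url}
        self._by_entity.setdefault(entity_id, []).append(record)
        self._by_url[url] = entity_id
        self._by_namespace_id[(namespace, namespace_id)] = entity_id

    def identifiers_for(self, entity_id):
        """1. Canonical ID -> third-party identifiers and URLs."""
        return list(self._by_entity.get(entity_id, []))

    def entity_for_url(self, url):
        """2. Third-party URL -> canonical ID (None if unknown)."""
        return self._by_url.get(url)

    def entity_for_namespace_id(self, namespace, namespace_id):
        """3. Third-party namespace and ID -> canonical ID (None if unknown)."""
        return self._by_namespace_id.get((namespace, namespace_id))

    def urls_for(self, entity_id):
        """4. Canonical ID -> every URL where the entity is listed."""
        return [r["url"] for r in self._by_entity.get(entity_id, [])]


# Hypothetical data.
index = CrosswalkIndex()
index.add("entity-001", "site_a", "123", "https://example.com/site_a/123")
index.add("entity-001", "site_b", "xyz", "https://example.com/site_b/xyz")

print(index.entity_for_url("https://example.com/site_a/123"))  # entity-001
print(index.entity_for_namespace_id("site_b", "xyz"))          # entity-001
print(index.urls_for("entity-001"))
```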

Don’t know about you but I am unimpressed.

In part because of the flatland mapping approach to identification. If all I know is Identifier1 was mapped to Identifier2, that is better than a poke with a sharp stick for identification purposes, but only barely. How do I discover what entity you thought was represented by Identifier1 or Identifier2?

I suppose piling up identifiers is one approach but we can do better than that.

PS: I am adding Crosswalk as a category so I can cover traditional crosswalks as developed by librarians. I am interested in what implicit parts of crosswalks should become explicit in a topic map. Pointers and suggestions welcome. Or conversions of crosswalks into topic maps.