Archive for the ‘Geographic Information Retrieval’ Category

gvSIG

Saturday, March 30th, 2013

gvSIG

I encountered the gvSIG site while tracking down the latest release of i3Geo.

From its mission statement:

The gvSIG project was born in 2004 within a project that consisted in a full migration of the information technology systems of the Regional Ministry of Infrastructure and Transport of Valencia (Spain), henceforth CIT, to free software. Initially, it was born with some objectives according to CIT needs. These objectives were expanded rapidly because of two reasons principally: on the one hand, the nature of free software, which greatly enables the expansion of technology, knowledge, and lays down the bases on which to establish a community, and, on the other hand, a project vision embodied in some guidelines and a plan appropriate to implement it.

Some of the software projects you will find at gvSIG are:

gvSIG Desktop

gvSIG is a Geographic Information System (GIS), that is, a desktop application designed for capturing, storing, handling, analyzing and deploying any kind of referenced geographic information in order to solve complex management and planning problems. gvSIG is known for having a user-friendly interface, being able to access the most common formats, both vector and raster ones. It features a wide range of tools for working with geographic-like information (query tools, layout creation, geoprocessing, networks, etc.), which turns gvSIG into the ideal tool for users working in the land realm.

gvSIG Mobile

gvSIG Mobile is a Geographic Information System (GIS) aimed at mobile devices, ideal for projects that capture and update data in the field. It’s known for having a user-friendly interface, being able to access the most common formats and a wide range of GIS and GPS tools which are ideal for working with geographic information.

gvSIG Mobile aims at broadening gvSIG Desktop execution platforms to a range of mobile devices, in order to give an answer to the needs of a growing number of mobile solutions users, who wish to use a GIS on different types of devices.

So far, gvSIG Mobile is a Geographic Information System, as well as a Spatial Data Infrastructures client for mobile devices. Such a client is also the first one licensed under open source.

i3Geo

i3Geo is an application for the development of interactive web maps. It integrates several open source applications into a single development platform, mainly Mapserver and OpenLayers. Developed in PHP and Javascript, it has functionalities that allow the user to have better control over the map output, allowing to modify the legend of layers, to apply filters, to perform analysis, etc.

i3Geo is completely customizable and can be tailored to the different users using the interactive map. Furthermore, the spatial data is organized in a catalogue that offers online access services such as WMS, WFS, KML or the download of files.

i3Geo was developed by the Ministry of the Environment of Brazil and it is actually part of the Brazilian Public Software Portal.

gvSIG Educa

What is gvSIG Educa?

“If I can’t picture it, I can’t understand it (A. Einstein)”

gvSIG Educa is a customization of the gvSIG Desktop Open Source GIS, adapted as a tool for the education of issues that have a geographic component.

The aim of gvSIG Educa is to provide educators with a tool that helps students to analyse and understand space, and which can be adapted to different levels or education systems.

gvSIG Educa is not only useful for the teaching of geographic material, but can also be used for learning any subject that contains a spatial component such as history, economics, natural science, sociology…

gvSIG Educa facilitates learning by letting students interact with the information, by adding a spatial component to the study of the material, and by facilitating the assimilation of concepts through visual tools such as thematic maps.

gvSIG Educa provides analysis tools that help to understand spatial relationships.

Definitely a site to visit if you are interested in open source GIS software and/or projects.

i3Geo

Saturday, March 30th, 2013

i3Geo

From the homepage:

i3Geo is an application for the development of interactive web maps. It integrates several open source applications into a single development platform, mainly Mapserver and OpenLayers. Developed in PHP and Javascript, it has functionalities that allow the user to have better control over the map output, allowing to modify the legend of layers, to apply filters, to perform analysis, etc.

i3Geo is completely customizable and can be tailored to the different users using the interactive map. Furthermore, the spatial data is organized in a catalogue that offers online access services such as WMS, WFS, KML or the download of files.

i3Geo was developed by the Ministry of the Environment of Brazil and it is actually part of the Brazilian Public Software Portal.

I followed an announcement about i3Geo 4.7 being available when the line “…an application for the development of interactive web maps,” caught my eye.

Features include:

  • Basic display: fix zoom, zoom by rectangle, panning, etc.
  • Advanced display: locator by attribute, zoom to point, zoom by geographical area, zoom by selection, zoom to layer
  • Integrated display: Wikipedia, GoogleMaps, Panoramio and Confluence
  • Integration with the OpenLayers, GoogleMaps and GoogleEarth APIs
  • Loading of WMS, KML, GeoRSS, shapefile, GPX and CSV layers
  • Management of independent databases
  • Layer catalog management system
  • Management of layers in maps: Change of the layers order, opacity change, title change, filters, thematic classification, legend and symbology changing
  • Analysis tools: buffers, regular grids, points distribution analysis, layer intersection, centroid calculation, etc.
  • Digitalization: vector editing that allows to create new geometries or edit existing data.
  • Superposition of existing data at the data of the Google Maps and GoogleEarth catalogs.
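Since WMS comes up in both the catalogue description and the feature list, here is a minimal sketch of what a WMS 1.1.1 GetMap request looks like when built by hand. The endpoint and layer name are hypothetical placeholders, not i3Geo addresses; only the query parameters follow the WMS convention.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=800, height=600):
    """Build a WMS 1.1.1 GetMap URL for a single layer.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value asks for the server's default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
url = wms_getmap_url("http://example.org/wms", "biomes", (-74.0, -34.0, -34.0, 6.0))
print(url)
```

Any WMS client (OpenLayers included) ends up issuing requests of this shape, which is why a catalogue that exposes WMS is easy to reuse elsewhere.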

Unless you want to re-invent mapping software, this could be quite useful for location relevant topic map data.

I first saw this at New final version of i3Geo available: i3Geo 4.7.

User evaluation of automatically generated keywords and toponyms… [of semantic gaps]

Tuesday, January 22nd, 2013

User evaluation of automatically generated keywords and toponyms for geo-referenced images by Frank O. Ostermann, Martin Tomko, Ross Purves. (Ostermann, F. O., Tomko, M. and Purves, R. (2013), User evaluation of automatically generated keywords and toponyms for geo-referenced images. J. Am. Soc. Inf. Sci. doi: 10.1002/asi.22738)

Abstract:

This article presents the results of a user evaluation of automatically generated concept keywords and place names (toponyms) for geo-referenced images. Automatically annotating images is becoming indispensable for effective information retrieval, since the number of geo-referenced images available online is growing, yet many images are insufficiently tagged or captioned to be efficiently searchable by standard information retrieval procedures. The Tripod project developed original methods for automatically annotating geo-referenced images by generating representations of the likely visible footprint of a geo-referenced image, and using this footprint to query spatial databases and web resources. These queries return raw lists of potential keywords and toponyms, which are subsequently filtered and ranked. This article reports on user experiments designed to evaluate the quality of the generated annotations. The experiments combined quantitative and qualitative approaches: To retrieve a large number of responses, participants rated the annotations in standardized online questionnaires that showed an image and its corresponding keywords. In addition, several focus groups provided rich qualitative information in open discussions. The results of the evaluation show that currently the annotation method performs better on rural images than on urban ones. Further, for each image at least one suitable keyword could be generated. The integration of heterogeneous data sources resulted in some images having a high level of noise in the form of obviously wrong or spurious keywords. The article discusses the evaluation itself and methods to improve the automatic generation of annotations.

An echo of Steve Newcomb’s semantic impedance appears at:

Despite many advances since Smeulders et al.’s (2002) classic paper that set out challenges in content-based image retrieval, the quality of both nonspecialist text-based and content-based image retrieval still appears to lag behind the quality of specialist text retrieval, and the semantic gap, identified by Smeulders et al. as a fundamental issue in content-based image retrieval, remains to be bridged. Smeulders defined the semantic gap as

the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation. (p. 1353)

In fact, text-based systems that attempt to index images based on text thought to be relevant to an image, for example, by using image captions, tags, or text found near an image in a document, suffer from an identical problem. Since text is being used as a proxy by an individual in annotating image content, those querying a system may or may not have similar worldviews or conceptualizations as the annotator. (emphasis added)

That last sentence could have come out of a topic map book.

Curious what you make of the authors’ claim that spatial locations provide an “external context” that bridges the “semantic gap”?

If we all use the same map of spatial locations, are you surprised by the lack of a “semantic gap”?

Sitegeist:…

Friday, December 14th, 2012

Sitegeist: A mobile app that tells you about your data surroundings by Nathan Yau.

Nathan writes:

From businesses to demographics, there’s data for just about anywhere you are. Sitegeist, a mobile application by the Sunlight Foundation, puts the sources into perspective.

App is free and the Sunlight site lists the following data for a geographic location:

  • Age Distribution
  • Political Contributions
  • Average Rent
  • Popular Local Spots
  • Recommended Restaurants
  • How People Commute
  • Record Temperatures
  • Housing Units Over Time

If you have an iPhone or Android phone, can you report if other data is available?

I was thinking along the lines of:

  • # of drug arrests
  • # type of drug arrests
  • # of arrests for soliciting (graphed by day/time)
  • # location of bail bond agencies

More tourist type information. ;-)

How would you enhance this data flow with a topic map?

Towards a Scalable Dynamic Spatial Database System [Watching Watchers]

Tuesday, November 20th, 2012

Towards a Scalable Dynamic Spatial Database System by Joaquín Keller, Raluca Diaconu, Mathieu Valero.

Abstract:

With the rise of GPS-enabled smartphones and other similar mobile devices, massive amounts of location data are available. However, no scalable solutions for soft real-time spatial queries on large sets of moving objects have yet emerged. In this paper we explore and measure the limits of actual algorithms and implementations regarding different application scenarios. And finally we propose a novel distributed architecture to solve the scalability issues.

At least in this version, you will find two copies of the same paper, the second copy sans the footnotes. So read the first twenty (20) pages and ignore the second eighteen (18) pages.

I thought the limitation of location to two dimensions understandable, for the use cases given, but am less convinced that treating a third dimension as an extra attribute is always going to be suitable.

Still, the results here are impressive as compared to current solutions so an additional dimension can be a future improvement.
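To make the "third dimension as an extra attribute" point concrete, here is a minimal sketch of a uniform 2D grid index over moving objects. This is not the architecture proposed in the paper; the class and method names are my own, and altitude is stored but only used as a post-filter, which is exactly the treatment I am unsure about.

```python
from collections import defaultdict


class GridIndex:
    """Uniform 2D grid over moving objects; altitude is only an attribute."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (col, row) -> set of object ids
        self.objects = {}              # object id -> (x, y, altitude)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, obj_id, x, y, altitude=0.0):
        """Insert an object, or move it if it is already indexed."""
        if obj_id in self.objects:
            old_x, old_y, _ = self.objects[obj_id]
            self.cells[self._cell(old_x, old_y)].discard(obj_id)
        self.objects[obj_id] = (x, y, altitude)
        self.cells[self._cell(x, y)].add(obj_id)

    def query(self, min_x, min_y, max_x, max_y, max_altitude=None):
        """Return sorted ids inside the rectangle, optionally capped by altitude."""
        c0, r0 = self._cell(min_x, min_y)
        c1, r1 = self._cell(max_x, max_y)
        hits = []
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                for obj_id in self.cells.get((col, row), ()):
                    x, y, alt = self.objects[obj_id]
                    if min_x <= x <= max_x and min_y <= y <= max_y:
                        if max_altitude is None or alt <= max_altitude:
                            hits.append(obj_id)
        return sorted(hits)


index = GridIndex(cell_size=10.0)
index.update("a", 5, 5, altitude=100)
index.update("b", 25, 5)
index.update("a", 26, 6, altitude=100)  # "a" moves into the same cell as "b"
print(index.query(20, 0, 30, 10))
print(index.query(20, 0, 30, 10, max_altitude=50))
```

The grid never narrows the search by altitude: every object in the 2D cell is fetched and then filtered. That is harmless for cars and pedestrians, less so for anything stacked vertically.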

The use case that I see missing is an ad hoc network of users feeding geo-based information back to a collection point.

While the watchers are certainly watching us, technology may be on the cusp of answering the question: “Who watches the watchers?” (The answer may be us.)

I first saw this in a tweet by Stefano Bertolo.

Georeferencer: Crowdsourced Georeferencing for Map Library Collections

Monday, November 19th, 2012

Georeferencer: Crowdsourced Georeferencing for Map Library Collections by Christopher Fleet, Kimberly C. Kowal and Petr Přidal.

Abstract:

Georeferencing of historical maps offers a number of important advantages for libraries: improved retrieval and user interfaces, better understanding of maps, and comparison/overlay with other maps and spatial data. Until recently, georeferencing has involved various relatively time-consuming and costly processes using conventional geographic information system software, and has been infrequently employed by map libraries. The Georeferencer application is a collaborative online project allowing crowdsourced georeferencing of map images. It builds upon a number of related technologies that use existing zoomable images from library web servers. Following a brief review of other approaches and georeferencing software, we describe Georeferencer through its five separate implementations to date: the Moravian Library (Brno), the Nationaal Archief (The Hague), the National Library of Scotland (Edinburgh), the British Library (London), and the Institut Cartografic de Catalunya (Barcelona). The key success factors behind crowdsourcing georeferencing are presented. We then describe future developments and improvements to the Georeferencer technology.
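For readers new to georeferencing, the core idea is that a few control points tie pixel positions on the scanned map to real coordinates. Here is a minimal sketch, assuming exactly three control points and a plain affine transform. It is not Georeferencer's method (real tools take more points and other transformation models), and the function names and sample coordinates are mine.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m * u = v using Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    result = []
    for i in range(3):
        mi = [row[:] for row in m]
        for r in range(3):
            mi[r][i] = v[r]  # replace column i with the right-hand side
        result.append(det(mi) / d)
    return result


def fit_affine(control_points):
    """control_points: three ((px, py), (lon, lat)) pairs.

    Returns a function mapping pixel coordinates to (lon, lat).
    """
    m = [[px, py, 1.0] for (px, py), _world in control_points]
    a, b, c = solve3(m, [lon for _pixel, (lon, _lat) in control_points])
    d, e, f = solve3(m, [lat for _pixel, (_lon, lat) in control_points])

    def to_world(px, py):
        return (a * px + b * py + c, d * px + e * py + f)

    return to_world


# Made-up control points: a 1000 x 1000 pixel scan covering one degree each way.
to_world = fit_affine([
    ((0, 0), (-3.0, 56.0)),
    ((1000, 0), (-2.0, 56.0)),
    ((0, 1000), (-3.0, 55.0)),
])
lon, lat = to_world(500, 500)
print(lon, lat)
```

The crowdsourcing part is the clicking of those control points, which is the slow, human step the article is about.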

If your institution has a map collection or if you are interested in maps at all, you need to read this article.

There is an introduction video if you prefer: http://www.klokantech.com/georeferencer/.

Either way, you will be deeply impressed by this project.

And wondering: Can the same lessons be applied to crowd source the creation of topic maps?

zip-code-data-hacking

Saturday, October 27th, 2012

zip-code-data-hacking by Neil Kodner.

From the readme file:

sourcing publicly available files, generate useful zip code-county data.

My goal is to be able to map zip codes to county FIPS codes, without paying. So far, I’m able to produce county fips codes for 41456 counties out of a list of 42523 zip codes.

I was able to find a zip code database from unitedstateszipcodes.org, each zip code had a county name but not a county FIPS code. I was able to find County FIPS codes on the census.gov site through some google hacking.

The data files are in the data directory – I’ll eventually add code to make sure the latest data files are retrieved at runtime. I didn’t do this yet because I didn’t want to hammer the sites while I was quickly iterating – a local copy did just fine.

In case you are wondering why this mapping from zip codes to county FIPS codes is important:

Federal information processing standards codes (FIPS codes) are a standardized set of numeric or alphabetic codes issued by the National Institute of Standards and Technology (NIST) to ensure uniform identification of geographic entities through all federal government agencies. The entities covered include: states and statistically equivalent entities, counties and statistically equivalent entities, named populated and related location entities (such as, places and county subdivisions), and American Indian and Alaska Native areas. (From: Federal Information Processing Standard (FIPS))

Using zip code based data against federal agency data (FIPS) requires this mapping.
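The join itself is simple once both files are in hand. Here is a minimal sketch, assuming a zip code file with a state and county name per zip, and a FIPS file keyed by state and county name. The column names and the tiny inline samples are my own stand-ins, not the layout of Neil's data files.

```python
import csv
import io

# Small inline samples standing in for the two public files (assumed layout).
ZIP_CSV = """zip,state,county
30301,GA,Fulton County
10001,NY,New York  County
99999,ZZ,Nowhere County
"""

FIPS_CSV = """state,fips,county
GA,13121,Fulton County
NY,36061,New York County
"""


def normalize(name):
    """Lower-case and collapse whitespace so near-identical names match."""
    return " ".join(name.lower().split())


def zip_to_fips(zip_file, fips_file):
    """Join zip codes to county FIPS codes on (state, county name)."""
    by_county = {}
    for row in csv.DictReader(fips_file):
        by_county[(row["state"], normalize(row["county"]))] = row["fips"]

    matched, unmatched = {}, []
    for row in csv.DictReader(zip_file):
        key = (row["state"], normalize(row["county"]))
        if key in by_county:
            matched[row["zip"]] = by_county[key]
        else:
            unmatched.append(row["zip"])
    return matched, unmatched


matched, unmatched = zip_to_fips(io.StringIO(ZIP_CSV), io.StringIO(FIPS_CSV))
print(matched)
print(unmatched)
```

The unmatched list is where the real work is: county names that differ in spelling or suffix between the two sources are the likely reason some zip codes are still missing a FIPS code.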

I suspect Neil would appreciate your assistance.

I first saw this at Pete Warden’s Five Short Links.

…[A] Common Operational Picture with Google Earth (webcast)

Thursday, October 11th, 2012

Joint Task Force – Homeland Defense Builds a Common Operational Picture with Google Earth

October 25, 2012 at 02:00 PM Eastern Daylight Time

The security for the Asia-Pacific Economic Cooperation summit in 2011 in Honolulu, Hawaii involved many federal, state & local agencies. The complex task of coordinating information sharing among agencies was the responsibility of Joint Task Force – Homeland Defense (JTF-HD). JTF-HD turned to Google Earth technology to build a visualization capability that enabled all agencies to share information easily & ensure a safe and secure summit.

What you will learn:

  • Best practices for sharing geospatial information among federal, state & local agencies
  • How to incorporate data from many sources into your own Google Earth globe
  • How to get accurate maps with limited bandwidth or no connection at all.

Speaker: Marie Kennedy, Joint Task Force – Homeland Defense

Sponsored by Google.

In addition to the techniques demonstrated, I suspect the main lesson will be leveraging information/services that already exist.

Or information integration if you prefer a simpler description.

Information can be integrated by conversion or mapping.

Which one you choose depends upon your requirements and the information.

Reusable information integration (RI2), where you leverage your own investment, well, that’s another topic altogether. ;-)

Ask: Are you spending money to be effective or spending money to maintain your budget relative to other departments?

If the former, consider topic maps. If the latter, carry on.

Research Data Australia down to Earth

Friday, June 22nd, 2012

Research Data Australia down to Earth

From the post:

Context: free cloud servers, a workshop and an idea

In this post I look at some work we’ve been doing at the University of Western Sydney eResearch group on visualizing metadata about research data, in a geographical context. The goal is to build a data discovery service; one interface we’re exploring is the ability to ‘fly’ around Google Earth looking for data, from Research Data Australia (RDA). For example, a researcher could follow a major river and see what data collections there are along its course that might be of (re-)use. True, you can search the RDA site by dragging a marker on a map but this experiment is a more immersive approach to exploring the same data.

The post is a quick update on a work in progress, with some not very original reflections on the use of cloud servers. I am putting it here on my own blog first, will do a human-readable summary over at UWS soon, any suggestions or questions welcome.

You can try this out if you have Google Earth by downloading a KML file. This is a demo service only – let us know how you go.

This work was inspired by a workshop on cloud computing: this week Andrew (Alf) Leahy and I attended a NeCTAR and Australian National Data Service (ANDS) one day event, along with several UWS staff. The unstoppable David Flanders from ANDS asked us to run a ‘dojo’, giving technically proficient researchers and eResearch collaborators a hands-on experience with the NeCTAR research cloud, where all Australian University researchers with access to the Australian Access Federation login system are entitled to run free cloud-hosted virtual servers. Free servers! Not to mention post-workshop beer. So senseis Alf and PT worked with a small group of ‘black belts’ in a workshop loosely focused on geo-spatial data. Our idea was “Visualizing the distribution of data collections in Research Data Australia using Google Earth”. We’d been working on a demo of how this might be done for a few days, which we more-or-less got running on the train from the Blue Mountains in to Sydney Uni in the morning.

When you read about “exploring” the data, bear in mind the question of how to record that “exploration.” Explorers used to keep journals, ships’ logs, etc. to record their explorations.

How do you record (if you do), your explorations of data? How do you share them if you do?

Given the ease of recording our explorations, no more long hand with a quill pen, is it odd that we don’t record our intellectual explorations?

Or do we want others to see a result that makes us look more clever than we are?

Gisgraphy

Sunday, March 18th, 2012

Gisgraphy

From the website:

Gisgraphy is a free, open source framework that offers the possibility to do geolocalisation and geocoding via Java APIs or REST webservices. Because geocoding is nothing without data, it provides an easy to use importer that will automagically download and import the necessary (free) data to your local database (Geonames and OpenStreetMap : 42 million entries). You can also add your own data with the Web interface or the importer connectors provided. Gisgraphy is production ready, and has been designed to be scalable(load balanced), performant and used in other languages than just java : results can be output in XML, JSON, PHP, Python, Ruby, YAML, GeoRSS, and Atom. One of the most popular GPS tracking System (OpenGTS) also includes a Gisgraphy client…read more

Free webservices:

  • Geocoding
  • Street Search
  • Fulltext Search
  • Reverse geocoding / street search
  • Find nearby
  • Address parser

Services that you could use with smart phone apps or in creating topic map based collections of data that involve geographic spaces.
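If you wonder what a "find nearby" service does underneath, here is a minimal sketch using the haversine great-circle distance over a small in-memory list. It is not Gisgraphy's implementation or API; the function names are mine and the coordinates are approximate.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def find_nearby(places, lat, lon, radius_km):
    """Return names of places within radius_km, nearest first."""
    hits = [(haversine_km(lat, lon, p_lat, p_lon), name)
            for name, p_lat, p_lon in places]
    return [name for dist, name in sorted(hits) if dist <= radius_km]


# Approximate coordinates, for illustration only.
PLACES = [
    ("Lyon", 45.7640, 4.8357),
    ("Versailles", 48.8049, 2.1204),
    ("Paris", 48.8566, 2.3522),
]
nearby = find_nearby(PLACES, 48.8566, 2.3522, radius_km=30)
print(nearby)
```

A linear scan like this is fine for a handful of points; with 42 million entries you want the spatial index that a database-backed service provides.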

Integrating Lucene with HBase

Wednesday, March 7th, 2012

Integrating Lucene with HBase by Boris Lublinsky and Mike Segel.

You have to get to the conclusion for the punch line:

The simple implementation, described in this paper fully supports all of the Lucene functionality as validated by many unit tests from both Lucene core and contrib modules. It can be used as a foundation of building a very scalable search implementation leveraging inherent scalability of HBase and its fully symmetric design, allowing for adding any number of processes serving HBase data. It also avoids the necessity to close an open Lucene Index reader to incorporate newly indexed data, which will be automatically available to user with possible delay controlled by the cache time to live parameter. In the next article we will show how to extend this implementation to incorporate geospatial search support.

Put why your article is important in the introduction as well.

The second article does better:

Implementing Lucene Spatial Support

In our previous article [1], we discussed how to integrate Lucene with HBase for improved scalability and availability. In this article I will show how to extend this implementation with the spatial support.

Lucene spatial contribution package [2, 3, 4, 5] provides powerful support for spatial search, but is limited to finding the closest point. In reality spatial search often has significantly more requirements, for example, which points belong to a given shape (circle, bounding box, polygon), which shapes intersect with a given shape and so on. The solution presented in this article allows solving all of the above problems.
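To see what "which points belong to a given shape" means in practice, here is a minimal sketch of the three containment tests named in the quote. It assumes planar coordinates and has nothing to do with the Lucene or HBase code in the articles; the function names are mine.

```python
def in_bounding_box(point, box):
    """box is (min_x, min_y, max_x, max_y); edges count as inside."""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def in_circle(point, center, radius):
    """Compare squared distances to avoid a square root."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    return dx * dx + dy * dy <= radius * radius


def in_polygon(point, polygon):
    """Ray casting: count edge crossings of a ray running to the right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


triangle = [(0, 0), (4, 0), (0, 4)]
print(in_polygon((1, 1), triangle))
print(in_polygon((3, 3), triangle))
```

The hard part the article addresses is not these tests but avoiding running them against every indexed point, which is where the index design comes in.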

GeoMapApp

Saturday, February 11th, 2012

GeoMapApp

From the webpage:

GeoMapApp is an earth science exploration and visualization application that is continually being expanded as part of the Marine Geoscience Data System (MGDS) at the Lamont-Doherty Earth Observatory of Columbia University. The application provides direct access to the Global Multi-Resolution Topography (GMRT) compilation that hosts high resolution (~100 m node spacing) bathymetry from multibeam data for ocean areas and ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) and NED (National Elevation Dataset) topography datasets for the global land masses.

See YouTube: GeoMapApp (21 video tutorials)

More data for your merging pleasure. Not to mention a resource on how others prefer to understand/view their data.

Designing Google Maps

Wednesday, January 11th, 2012

Designing Google Maps by Nathan Yau.

From the post:

Google Maps is one of Google’s best applications, but the time, energy, and thought put into designing it often goes unnoticed because of how easy it is to use, for a variety of purposes. Willem Van Lancker, a user experience and visual designer for Google Maps, describes the process of building a map application — color scheme, icons, typography, and “Googley-ness” — that practically everyone can use, worldwide.

I don’t normally disagree with anything Nathan says, particularly about design but I have to depart from him on why we don’t notice the excellence of Google Maps.

I think we have become accustomed to its excellence and since we don’t look elsewhere (most of us), then we don’t notice that it isn’t commonplace.

In fact for most of us it is a universe with one inhabitant, Google Maps.

That takes a lot of very hard work and skill.

The question is do you have the chops to make your topic map of one or more infoverses the “only” inhabitant, by user choice?

All the software a geoscientist needs. For free!

Sunday, December 4th, 2011

All the software a geoscientist needs. For free! by John A. Stevenson.

It is quite an impressive list and what’s more, John has provided a script to install it on a Linux machine.

If you have any mapping or geoscience type needs, you would do well to consider some of the software listed here.

A handy set of tools if you are working with geoscience types on topic map applications as well.

GeoIQ API Overview

Friday, November 25th, 2011

GeoIQ API Overview

From the webpage:

GeoIQ is the engine that powers the GeoCommons Community. GeoIQ includes a full Application Programming Interface (API) that allows developers to build unique and powerful domain specific applications. The API provides capability for uploading and downloading data, searching for data and maps, building, embedding, and theming maps or charts, as well as general user, group, and permissions management.

The GeoIQ API consists of a REST API and a JavaScript API. REST means that it uses simple URLs and HTTP methods to perform all of the actions. For example, a dataset is a specific endpoint that a user can create, read, update or delete (CRUD).
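The CRUD-over-HTTP idea in that last paragraph can be sketched in a few lines. The base URL below is a hypothetical placeholder and the path layout is my assumption, not the documented GeoIQ endpoints; the requests are only built, never sent.

```python
from urllib.request import Request

BASE_URL = "http://example.org/datasets"  # hypothetical endpoint, not GeoIQ's

# The conventional mapping from CRUD actions to HTTP methods.
CRUD_METHODS = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(action, dataset_id=None, body=None):
    """Build (but do not send) the HTTP request for a CRUD action."""
    url = BASE_URL if dataset_id is None else f"{BASE_URL}/{dataset_id}"
    return Request(url, data=body, method=CRUD_METHODS[action])


req = build_request("update", dataset_id=42, body=b'{"title": "Rivers"}')
print(req.get_method(), req.full_url)
```

One URL per resource and one verb per action is the whole trick, which is why REST APIs are easy to script against.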

Another resource for topic mappers who want to link information to “real” locations. ;-)

Leaflet & GeoCommons JSON

Thursday, November 24th, 2011

Leaflet & GeoCommons JSON by Tim Waters.

From the post:

Hi, in this quick tutorial we will have a look at a new JavaScript mapping library, Leaflet, using it to help load JSON features from a GeoCommons dataset. We will add our Acetate tile layer to the map, and use the cool API feature filtering functionalities to get just the features we want from the server, show them on a Leaflet map, add popups to the features, style the features according to what the feature is, and add some further interactivity. This blog follows up from two posts on my personal blog, showing GeoCommons features with OpenLayers and with Polymaps.

We have all read about tweets being used to plot reports or locations from and about the various “occupy” movements. I suspect that effective civil unrest is going to require greater planning for the distribution of support and resources in particular locales. Conveniently, current authorities have created or allowed to be created, maps and other resources that can be used for such purposes. This is one of those resources.

I don’t know of any research on such algorithms but occupiers might want to search for clusters of dense and confusing paths in urban areas. Those proved effective at times in struggles in Medieval times for control of walled cities. Once the walls were breached, would-be occupiers were confronted with warrens of narrow and confusing paths. As opposed to broad, open pathways that would enable a concentration of forces.

Is there an algorithm for longest, densest path?

However discovered, annotating a cluster of dense and confusing paths with tactical information and location of resources would be a natural use of topic maps. Or what to anticipate in such areas, if one is on the “other” side.

ASTER Global Digital Elevation Model (ASTER GDEM)

Thursday, November 24th, 2011

ASTER Global Digital Elevation Model (ASTER GDEM)

From the webpage:

ASTER GDEM is an easy-to-use, highly accurate DEM covering all the land on earth, and available to all users regardless of size or location of their target areas.

Anyone can easily use the ASTER GDEM to display a bird’s-eye-view map or run a flight simulation, and this should realize visually sophisticated maps. By utilizing the ASTER GDEM as a platform, institutions specialized in disaster monitoring, hydrology, energy, environmental monitoring etc. can perform more advanced analysis.

In addition to the data, there is a GDEM viewer (freeware) at this site.

All that is missing is your topic map and you.

piecemeal geodata

Sunday, November 6th, 2011

piecemeal geodata

Michal Migurski on the difficulties of using OpenStreetMap data:

Two weeks ago, I attended the 5th annual OpenStreetMap conference in Denver, State of the Map. My second talk was called Piecemeal Geodata, and I hoped to communicate some of the pain (and opportunity) in dealing with OpenStreetMap data as a consumer of the information, downstream from the mappers but hoping to make maps or work with the dataset. Harry Wood took notes that suggested I didn’t entirely miss the mark, but after I was done Tom MacWright congratulated me on my “excellent stealth rage talk”. It wasn’t really supposed to be ragey as such, so here are some of my slides and notes along with some followup to the problems I talked about.

Topic maps are in use in a number of commercial and governmental venues but aren’t the sort of thing you hear about like Twitter or Blackberries (mostly about outages).

Anticipating more civil disturbances over the next several years, do topic maps have something to offer when coupled with a technology like Google Maps or OSM?

It is one thing to indicate your location using an app, but can you report movement of forces in a way that updates the maps of some colleagues? In a secure manner?

What features would a topic map need for such an environment?

high road, for better OSM cartography

Sunday, November 6th, 2011

high road, for better OSM cartography

From the post:

High Road is a framework for normalizing the rendering of highways from OSM data, a critical piece of every OSM-based road map we’ve ever designed at Stamen. Deciding exactly which kinds of roads appear at each zoom level can really be done just once, and ideally shouldn’t be part of a lengthy database query in your stylesheet. In Cascadenik and regular Mapnik’s XML-based layer definitions, long queries balloon the size of a style until it’s impossible to scan quickly. In Carto’s JSON-based layer definitions the multiline-formatting of a complex query is completely out of the question. Further, each system has its own preferred way of helping you handle road casings.
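The "decide once which roads appear at each zoom level" idea amounts to a lookup table kept outside the stylesheet. Here is a minimal sketch; the zoom thresholds are made up by me and are not High Road's values, and High Road itself does this with database queries rather than Python.

```python
# Hypothetical minimum zoom per OSM highway tag (not High Road's values).
MIN_ZOOM = {
    "motorway": 5,
    "trunk": 6,
    "primary": 8,
    "secondary": 10,
    "tertiary": 12,
    "residential": 14,
}


def visible_kinds(zoom):
    """Road kinds drawn at this zoom level, most important first."""
    return [kind for kind, min_zoom in MIN_ZOOM.items() if zoom >= min_zoom]


def should_render(highway_tag, zoom):
    """Unknown tags are never rendered; known ones need enough zoom."""
    min_zoom = MIN_ZOOM.get(highway_tag)
    return min_zoom is not None and zoom >= min_zoom


print(visible_kinds(8))
print(should_render("residential", 12))
```

With the table in one place, every stylesheet can ask the same question instead of repeating a long query.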

Good rendering of geographic maps (and the data you attach to them) is likely to be useful in a number of topic map contexts.

PS: OSM = OpenStreetMap.

Factual Resolve

Friday, October 28th, 2011

Factual Resolve

Factual has a new API – Resolve:

From the post:

The Internet is awash with data. Where ten years ago developers had difficulty finding data to power applications, today’s difficulty lies in making sense of its abundance, identifying signal amidst the noise, and understanding its contextual relevance. To address these problems Factual is today launching Resolve — an entity resolution API that makes partial records complete, matches one entity against another, and assists in de-duping and normalizing datasets.

The idea behind Resolve is very straightforward: you tell us what you know about an entity, and we, in turn, tell you everything we know about it. Because data is so commonly fractured and heterogeneous, we accept fragments of an entity and return the matching entity in its entirety. Resolve allows you to do a number of things that will make your data engineering tasks easier:

  • enrich records by populating missing attributes, including category, lat/long, and address
  • de-dupe your own place database
  • convert multiple daily deal and coupon feeds into a single normalized, georeferenced feed
  • identify entities unequivocally by their attributes

For example: you may be integrating data from an app that provides only the name of a place and an imprecise location. Pass what you know to Factual Resolve via a GET request, with the attributes included as JSON-encoded key/value pairs:
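The request itself is not reproduced in this excerpt. As a rough sketch of what "JSON-encoded key/value pairs in a GET request" could look like, the Python below builds such a URL. The endpoint, the `values` parameter name, and the sample attributes are all assumptions made for illustration; they are not taken from Factual's documentation, and authentication is left out.

```python
import json
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint, used only for illustration.
RESOLVE_ENDPOINT = "https://api.example.com/places/resolve"


def build_resolve_url(known_attributes, endpoint=RESOLVE_ENDPOINT):
    """Return a GET URL carrying the known attributes as one JSON-encoded parameter."""
    encoded = json.dumps(known_attributes, sort_keys=True)
    # "values" is an assumed parameter name, not a documented one.
    return endpoint + "?" + urlencode({"values": encoded})


# A partial record: only a name and an imprecise location are known.
fragment = {"name": "Example Cafe", "latitude": 34.06, "longitude": -118.40}
url = build_resolve_url(fragment)
print(url)

# Decoding the query string recovers the original fragment.
decoded = json.loads(parse_qs(urlsplit(url).query)["values"][0])
```

The point of the sketch is only the shape of the call: the caller sends whatever fragment it has, and the service is expected to answer with the full matching entity.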

I particularly like the line:

identify entities unequivocally by their attributes

I don’t know about the “unequivocally” part but the rest of it rings true. At least in my experience.

Towards georeferencing archival collections

Friday, October 21st, 2011

Towards georeferencing archival collections

From the post:

One of the most effective ways to associate objects in archival collections with related objects is with controlled access terms: personal, corporate, and family names; places; subjects. These associations are meaningless if chosen arbitrarily. With respect to machine processing, Thomas Jefferson and Jefferson, Thomas are not seen as the same individual when judging by the textual string alone. While EADitor has incorporated authorized headings from LCSH and local vocabulary (scraped from terms found in EAD files currently in the eXist database) almost since its inception, it has not until recently interacted with other controlled vocabulary services. Interacting with EAC-CPF and geographical services is high on the development priority list.

geonames.org

Over the last week, I have been working on incorporating geonames.org queries into the XForms application. Geonames provides stable URIs for more than 7.5 million place names internationally. XML representations of each place are accessible through various REST APIs. These XML datastreams also include the latitude and longitude, which will make it possible to georeference archival collections as a whole or individual items within collections (an item-level indexing strategy will be offered in EADitor as an alternative to traditional, collection-based indexing soon).
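To make the georeferencing idea concrete, here is a minimal Python sketch that pulls a name and coordinates out of an XML place record. The sample document is hand-written for illustration: its element names, the place, the identifier, and the URI prefix are assumptions, not output copied from a geonames.org service, and this is not how EADitor itself (an XForms application) does it.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; element names and values are illustrative assumptions.
SAMPLE_XML = """<geonames>
  <geoname>
    <name>Exampleville</name>
    <lat>38.02931</lat>
    <lng>-78.47668</lng>
    <geonameId>1234567</geonameId>
  </geoname>
</geonames>"""

# Placeholder prefix standing in for a stable place URI.
URI_PREFIX = "http://example.org/place/"


def extract_places(xml_text):
    """Return one dict per place with its name, coordinates, and a URI."""
    root = ET.fromstring(xml_text)
    places = []
    for node in root.iter("geoname"):
        places.append({
            "name": node.findtext("name"),
            "lat": float(node.findtext("lat")),
            "lng": float(node.findtext("lng")),
            "uri": URI_PREFIX + node.findtext("geonameId"),
        })
    return places


places = extract_places(SAMPLE_XML)
print(places)
```

Once a controlled place term carries a latitude and longitude like this, a collection or an individual item indexed under that term can be plotted on a map.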

This looks very interesting.

Details:

EADitor project site (Google Code): http://code.google.com/p/eaditor/
Installation instructions (specific for Ubuntu but broadly applies to all Unix-based systems): http://code.google.com/p/eaditor/wiki/UbuntuInstallation
Google Group: http://groups.google.com/group/eaditor

First experiences with GeoCouch

Wednesday, October 19th, 2011

First experiences with GeoCouch by tbuchwaldt.

From the post:

To learn some new stuff about cool databases and geo-aware services we started fiddling with GeoCouch, a CouchDB extension. To have a real scenario we could work on, we designed a small project: A CouchDB database contains documents with descriptions of fastfood restaurants. We agreed on 3 types of restaurants: KFC, Mc Donalds & Burgerking. We gave them some additional information, namely opening and closing times and a boolean called “supersize”.

It sounds to me like this sort of service, coupled with a topic map of campus locations/services, could prove to be very amusing during “rush” week when directions and locations are not well known.

Geological Survey Austria launches thesaurus project

Tuesday, October 18th, 2011

Geological Survey Austria launches thesaurus project by Helmut Nagy.

From the post:

Throughout the last year the Semantic Web Company team has supported the Geological Survey of Austria (GBA) in setting up their thesaurus project. It started with a workshop in summer 2010 where we discussed use cases for using semantic web technologies as means to fulfill the INSPIRE directive. Now in fall 2011 GBA published their first thesauri as Linked Data using PoolParty’s new Linked Data front-end.

The Thesaurus Project of the GBA aims to create controlled vocabularies for the semantic harmonization of map-based geodata. The content-related realization of this project is governed by the Thesaurus Editorial Team, which consists of domain experts from the Geological Survey of Austria. With the development of semantically and technically interoperable geo-data the Geological Survey of Austria implements its legal obligation defined by the EU-Directive 2007/2/EC INSPIRE and the national “Geodateninfrastrukturgesetz” (GeoDIG), respectively.

I wonder if their “controlled vocabularies” are going to map to the terminology used over the history of Europe, in maps, art, accounts, histories, and other recorded materials?

If not, I wonder if there would be any support to tie that history into current efforts or do they plan on simply cutting off the historical record and starting with their new thesaurus?

Indexed Nearest Neighbour Search in PostGIS

Thursday, September 29th, 2011

Indexed Nearest Neighbour Search in PostGIS

From the post:

An always popular question on the PostGIS users mailing list has been “how do I find the N nearest things to this point?”.

To date, the answer has generally been quite convoluted, since PostGIS supports bounding box index searches, and in order to get the N nearest things you need a box large enough to capture at least N things. Which means you need to know how big to make your search box, which is not possible in general.

PostgreSQL has the ability to return ordered information where an index exists, but the ability has been restricted to B-Tree indexes until recently. Thanks to one of our clients, we were able to directly fund PostgreSQL developers Oleg Bartunov and Teodor Sigaev in adding the ability to return sorted results from a GiST index. And since PostGIS indexes use GiST, that means that now we can also return sorted results from our indexes.

This feature (the PostGIS side of it) was funded by Vizzuality, and hopefully it comes in useful in their CartoDB work.

You will need PostgreSQL 9.1 and the PostGIS source code from the repository, but this is what a nearest neighbour search looks like:
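The query from the post is not reproduced in this excerpt. As a minimal sketch of how such a query might be composed, the Python below builds the SQL as a string, assuming the `<->` distance operator in the `ORDER BY` clause (the form that index-assisted ordering takes in PostGIS 2.0). The table `places`, the column `geom`, the SRID 4326, and the sample coordinates are hypothetical, and nothing here connects to a database.

```python
def nearest_neighbour_sql(table, geom_column, limit):
    """Compose a nearest neighbour query with placeholders for the search point.

    The identifiers are interpolated directly, so they are restricted to plain
    identifiers; the point itself is left as %s placeholders (the style used by
    common Python PostgreSQL drivers) to be supplied as parameters.
    """
    if not (table.isidentifier() and geom_column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        f"SELECT * FROM {table} "
        f"ORDER BY {geom_column} <-> ST_SetSRID(ST_MakePoint(%s, %s), 4326) "
        f"LIMIT {int(limit)}"
    )


sql = nearest_neighbour_sql("places", "geom", 10)
params = (-90.0, 40.0)  # longitude, latitude of the search point
print(sql)
```

Note what is missing compared with the older approach described above: there is no search box to size. The ordering comes from the index, and `LIMIT` alone decides how many neighbours come back.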

PostgreSQL? Isn’t that SQL? :-)

Indexed nearest neighbour search is a question of results, not ideology.

Better targeting through technology.

GRASS: Geographic Resources Analysis Support System

Saturday, September 17th, 2011

GRASS: Geographic Resources Analysis Support System

The post about satellite imagery analysis for Syria made me curious about tools for automated analysis of satellite images.

From the webpage:

Commonly referred to as GRASS, this is free Geographic Information System (GIS) software used for geospatial data management and analysis, image processing, graphics/maps production, spatial modeling, and visualization. GRASS is currently used in academic and commercial settings around the world, as well as by many governmental agencies and environmental consulting companies. GRASS is an official project of the Open Source Geospatial Foundation.

You may also want to visit the Open Dragon project.

From the Open Dragon site:

Availability of good software for teaching Remote Sensing and GIS has always been a problem. Commercial software, no matter how good a discount is offered, remains expensive for a developing country, cannot be distributed to students, and may not be appropriate for education. Home-grown and university-sourced software lacks long-term support and the needed usability and robustness engineering.

The OpenDragon Project was established in the Department of Computer Engineering of KMUTT in December of 2004. The primary objective of this project is to develop, enhance, and maintain a high-quality, commercial-grade software package for remote sensing and GIS analysis that can be distributed free to educational organizations within Thailand. This package, OpenDragon, is based on the Version 5 of the commercial Dragon/ips® software developed and marketed by Goldin-Rudahl Systems, Inc.

As of 2010, Goldin-Rudahl Systems has agreed that the Open Dragon software, based on Dragon version 5, will be open source for non-commercial use. The software source code should be available on this server by early 2011.

And there is always the commercial side, if you have funding: ArcGIS. The makers of ArcGIS, Esri, support several open source GIS projects.

The results of using these or other software packages can be tied to other information using topic maps.

Spatial Search Plugin (SSP) for Solr

Thursday, September 15th, 2011

Spatial Search Plugin (SSP) for Solr

From the webpage:

With the continuous efforts of adjusting search results to focused target audiences, there’s an increasing demand for incorporating geographical location information into the standard search functionality. Spatial Search Plugin (SSP) for Apache Solr is a free, standalone plug-in which enables Geo / Location Based Search, and is built on top of the open source projects Apache Solr and Apache Lucene. Its main goals and characteristics are:

  • Provide a complete, consistent, robust and fast implementation of advanced geospatial algorithms
  • Act as a standalone pluggable extension to Solr
  • Written in 100% Java
  • Compatible with Apache Solr and Apache Lucene
  • Open source under the Apache2 license
  • Well documented and comes with support

Location plus information about the location is a topic mappish sort of thing.

LinkedGeoData Release 2

Monday, September 12th, 2011

LinkedGeoData Release 2

From the webpage:

The aim of the LinkedGeoData (LGD) project is to make the OpenStreetMap (OSM) datasets easily available as RDF. As such the main target audience is the Semantic Web community, however it may turn out to be useful to a much larger audience. Additionally, we are providing interlinking with DBpedia and GeoNames and integration of class labels from translatewiki and icons from the Brian Quinion Icon Collection.

The result is a rich, open, and integrated dataset which we hope to be useful for research and application development. The datasets can be publicly accessed via downloads, Linked Data, and SPARQL-endpoints. We have also launched an experimental “Live-SPARQL-endpoint” that is synchronized with the minutely updates from OSM whereas the changes to our store are republished as RDF.

More geographic data.

How Hard is the Local Search Problem?

Monday, September 5th, 2011

How Hard is the Local Search Problem? by Matthew Hurst.

The “local search” problem that Matthew is addressing is illustrated with Google’s mapping of local restaurants in Matthew’s neighborhood.

The post starts:

The local search problem has two key components: data curation (creating and maintaining a set of high quality statements about what the world looks like) and relevance (returning those statements in a manner that satisfies a user need). The first part of the problem is a key enabler to success, but how hard is it?

There are many problems which involve bringing together various data sources (which might be automatically or manually created) and synthesizing an improved set of statements intended to denote something about the real world. The way in which we judge the results of such a process is to take the final database, sample it, and test it against what the world looks like.

In the local search space, this might mean testing to see if the phone number in a local listing is indeed that associated with a business of the given name and at the given location.

But do we quantify this challenge? We might perform the above evaluation and find out that 98% of the phone numbers are correctly associated. Is that good? Expected? Poor?
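One modest way to start on "Is that good?" is to ask how far a sampled accuracy figure can be trusted at all. The Python sketch below computes a Wilson score interval for a sampled proportion. It is an illustration only, not Matthew's method, and the sample sizes are invented.

```python
import math


def wilson_interval(successes, sample_size, z=1.96):
    """Return (low, high) bounds of the Wilson score interval for a proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if sample_size <= 0 or not 0 <= successes <= sample_size:
        raise ValueError("need 0 <= successes <= sample_size and sample_size > 0")
    p = successes / sample_size
    denom = 1 + z ** 2 / sample_size
    centre = (p + z ** 2 / (2 * sample_size)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / sample_size + z ** 2 / (4 * sample_size ** 2))
        / denom
    )
    return centre - half_width, centre + half_width


# The same 98% observed accuracy, from a small and a large (invented) sample.
small = wilson_interval(98, 100)
large = wilson_interval(9800, 10000)
print(small, large)
```

The same headline 98% supports a much tighter claim when it comes from the larger sample. That speaks only to measurement, though; whether 98% is good still depends on what users need from the listing.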

After following Matthew through his discussion of the various factors in “local search,” what are your thoughts on Google’s success with “local search?”

Could you do better?

How? Be specific; a worked example would be even more convincing.

GeoCommons Enterprise Features – Free!

Wednesday, July 13th, 2011

GeoCommons Enterprise Features – Free!

From the email announcement:

  • Analytics: Easy-to-use, advanced spatial analytics that users and groups can utilize to answer mission-critical questions. Select among numerous analyses such as filtering, buffers, spatial aggregation and predictive analysis.
  • Private Data Support: Keep proprietary data private and unsearchable by others. Now you can upload proprietary data, analyze it with other data and create compelling maps, charts and graphs all within a secure interface.
  • Groups and Permissions: Allow others in your group or organization to access and collaborate with you. Enable permissions at various levels to limit or expand data sharing. See a step-by-step guide of how to create groups and make your data private here from @seangorman.

For groups and private data, see: Private Data and Groups for GeoCommons!!

GeoCommons has 70,000 datasets.

If you look around you might find something you like.

Topic mappers should ask themselves: Why does this work? (more on that anon)

Scaling Scala at Twitter by Marius Eriksen

Tuesday, July 12th, 2011

Scaling Scala at Twitter by Marius Eriksen

From the description:

Rockdove is the backend service that powers the geospatial features on Twitter.com and the Twitter API (“Twitter Places”). It provides a datastore for places and a geospatial search engine to find them. To throw out some buzzwords, it is:

  • a distributed system
  • realtime (immediately indexes updates and changes)
  • horizontally scalable
  • fault tolerant

Rockdove is written entirely in Scala and was developed by 2 engineers with no prior Scala experience (nor with Java or the JVM). We think the geospatial search engine provides an interesting case study as it presents a mix of algorithm problems and “classic” scaling and optimization issues. We will report on our experience using Scala, focusing especially on:

  • “functional” systems design
  • concurrency and parallelism
  • using a “research language” in practice
  • when, where and why we turned the “functional dial”
  • avoiding mutable state

Not to mention being a well-done presentation!