Archive for the ‘Geographic Data’ Category

Server-side clustering of geo-points…

Sunday, August 4th, 2013

Server-side clustering of geo-points on a map using Elasticsearch by Gianluca Ortelli.

From the post:

Plotting markers on a map is easy using the tooling that is readily available. However, what if you want to add a large number of markers to a map when building a search interface? The problem is that things start to clutter and it’s hard to view the results. The solution is to group results together into one marker. You can do that on the client using client-side scripting, but as the number of results grows, this might not be the best option from a performance perspective.

This blog post describes how to do server-side clustering of those markers, combining them into one marker (preferably with a counter indicating the number of grouped results). It provides a solution to the “too many markers” problem with an Elasticsearch facet.

The Problem

The image below renders quite well the problem we were facing in a project:


The mass of markers is so dense that it replicates the shape of the Netherlands! These items represent monuments and other things of general interest in the Netherlands; for an application we developed for a customer we need to manage about 200,000 of them and they are especially concentrated in the cities, as you can see in this case in Amsterdam: The “draw everything” strategy doesn’t help much here.

Server-side clustering of geo-points will be useful for representing dense geo-points.

Such as an Interactive Surveillance Map.

Or if you were building a map of police and security force sightings over multiple days to build up a pattern database.
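
To make the idea concrete, here is a minimal Python sketch of grid-based clustering done before anything is sent to the browser. It is not the Elasticsearch facet from Gianluca's post. The function name `cluster_points`, the `cell_deg` parameter and the sample coordinates are all assumptions made for illustration. Points are bucketed into fixed-size latitude/longitude cells, and each non-empty cell becomes one marker with a counter.

```python
import math
from collections import defaultdict


def cluster_points(points, cell_deg):
    """Group (lat, lon) points into square grid cells of `cell_deg` degrees.

    Returns one dict per non-empty cell with the centroid of its members
    and a count, which is what a map client would draw as a single marker.
    """
    cells = defaultdict(list)
    for lat, lon in points:
        # The cell key is the integer grid coordinate of the point.
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append((lat, lon))

    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append({
            "lat": sum(p[0] for p in members) / n,
            "lon": sum(p[1] for p in members) / n,
            "count": n,
        })
    # Largest clusters first, so the client can prioritise them.
    return sorted(clusters, key=lambda c: c["count"], reverse=True)


# Hypothetical sample: two points near Amsterdam, one near Rotterdam
# (approximate coordinates, for illustration only).
sample = [(52.37, 4.89), (52.38, 4.90), (51.92, 4.48)]
clusters = cluster_points(sample, cell_deg=0.5)
```

A larger `cell_deg` at low zoom levels and a smaller one at high zoom levels gives the familiar effect of clusters splitting apart as the user zooms in.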

Visualizing Web Scale Geographic Data…

Wednesday, July 10th, 2013

Visualizing Web Scale Geographic Data in the Browser in Real Time: A Meta Tutorial by Sean Murphy.

From the post:

Visualizing geographic data is a task many of us face in our jobs as data scientists. Often, we must visualize vast amounts of data (tens of thousands to millions of data points) and we need to do so in the browser in real time to ensure the widest-possible audience for our efforts and we often want to do this leveraging free and/or open software.

Luckily for us, Google offered a series of fascinating talks at this year’s (2013) IO that show one particular way of solving this problem. Even better, Google discusses all aspects of this problem: from cleaning the data at scale using legacy C++ code to providing low latency yet web-scale data storage and, finally, to rendering efficiently in the browser. Not surprisingly, Google’s approach highly leverages a lot of Google’s technology stack but we won’t hold that against them.


Sean sets the background for two presentations:

All the Ships in the World: Visualizing Data with Google Cloud and Maps (36 minutes)


Google Maps + HTML5 + Spatial Data Visualization: A Love Story (60 minutes) (source code:

Both are well worth your time.


JQVMap

Saturday, June 8th, 2013


From the webpage:

JQVMap is a jQuery plugin that renders Vector Maps. It uses resizable Scalable Vector Graphics (SVG) for modern browsers like Firefox, Safari, Chrome, Opera and Internet Explorer 9. Legacy support for older versions of Internet Explorer 6-8 is provided via VML.

Whatever your source of data, cellphone location data, user observation, etc., rendering it to a geographic display may be useful.

Are You Near Me?

Saturday, June 8th, 2013

Lucene 4.X is a great tool for analyzing cellphone location data (Did you really think only the NSA has it?).

Chilamakuru Vishnu gets us started with a code-heavy post with the promise of:

My Next Blog Post will talk about how to implement advanced spatial queries like

geoIntersecting – where one polygon intersects with another polygon/line.

geoWithIn – where one polygon lies completely within another polygon.

Or you could obtain geolocation data by other means.
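
The simplest form of the "are you near me?" question is a distance check between two coordinates. The sketch below is plain Python using the haversine formula, not Lucene's spatial API. The function names `haversine_km` and `near_me`, the sample names and the sample coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the formula assumes a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def near_me(me, others, radius_km):
    """Return the names in `others` that lie within `radius_km` of `me`."""
    return [name for name, (lat, lon) in others.items()
            if haversine_km(me[0], me[1], lat, lon) <= radius_km]


# Hypothetical phones with approximate coordinates.
phones = {"phone_a": (52.38, 4.90), "phone_b": (51.92, 4.48)}
nearby = near_me((52.37, 4.89), phones, radius_km=5.0)
```

A linear scan like this is fine for a handful of points. A spatial index is what makes the same question answerable over millions of them.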

I first saw this at DZone.

The Map Myth of Sandy Island

Saturday, May 11th, 2013

The Map Myth of Sandy Island by Rebecca Maxwell.

From the post:

Sandy Island has long appeared on maps dating back to the early twentieth century. This island was supposedly located in the Pacific Ocean northwest of Australia in the Coral Sea. It first appeared on an edition of a British admiralty map back in 1908 proving that Sandy Island had been discovered by the French in 1876. Even modern maps, like the General Bathymetric Chart of the Oceans (the British Oceanographic Data Centre issued an erratum about Sandy Island) and Google Earth, show the presence of an island at its coordinates. Sandy Island is roughly the size of Manhattan; it is about three miles wide and fifteen miles long. However, there is only one problem. The island does not actually exist.

Back in October 2012, an Australian research ship undiscovered the island. The ship, called the Southern Surveyor, was led by Maria Seton, a scientist from the University of Sydney. The purpose of the twenty-five-day expedition was to gather information about tectonic activity, map the sea floor, and gather rock samples from the bottom. The scientific data that they had, including the General Bathymetric Chart of the Oceans, indicated the presence of Sandy Island halfway between Australia and the island of New Caledonia, a French possession. The crew began to get suspicious, however, when the chart from the ship’s master only showed open water. Plus, Google Earth only showed a dark blob where it should have been.

When the ship arrived at Sandy Island’s supposed coordinates, they found nothing but ocean a mile deep. One of the ship’s crewmembers, Steven Micklethwaite, said that they all had a good laugh at Google’s expense as they sailed through the island. The crew was quick to make their findings known. The story originally appeared in the Sydney Morning Herald and prompted a large amount of controversy. Cartographers were the most puzzled of all. Many wondered whether the island had ever existed or if it had been eroded away by the ocean waves over the years. Others wondered if the island mysteriously disappeared into the ocean like the legendary city of Atlantis. An “obituary” for Sandy Island, reporting the findings, was published in Eos, Transactions of the Geophysical Union in April of 2013.

Rebecca details the discovered/undiscovered history of Sandy Island in rich detail.

It’s a great story and you should treat yourself by reading it.

My only disagreement with Rebecca comes when she writes:

Maps are continually changing and modern maps still contain a human element that is vulnerable to mistakes.

On the contrary, maps, even modern ones, are wholly human constructs.

Not just the mistakes but the degree of accuracy, the implicit territorial or political claims, what is interesting enough to record, etc., are all human choices in production.

To say nothing of humans on the side of reading/interpretation as well.

If there were no sentient creatures to read it, would a map have any meaning?

Largest Coffee Table Book

Wednesday, May 8th, 2013

Largest Atlas in the World Created using ArcGIS by Caitlin Dempsey.

From the post:

Earth Platinum, the largest atlas ever printed, was released in February 2012 by Millennium House, Australia. Only 31 copies of the 330 pound, leather-bound book exist and each is priced at $100,000. The book measures 6ft by 9ft and has been recognized by Chris Sheedy of the Guinness Book of World Records as the largest atlas in existence. The book contains 128 pages and requires at least two hands, or in some cases multiple people, to turn the pages.

Earth Platinum has surpassed the previous holder of the world record for largest atlas, the famous Klencke Atlas (which measures about 5′ 9″ by 6′ 3″ when opened). The Klencke Atlas is housed in the Antiquarian Mapping Division of the British Library in London and held the title for largest atlas worldwide from 1660 until the publication of Earth Platinum. Published as a one-off over 350 years ago, the Klencke Atlas is reported to contain all geographical knowledge of that time, just as Earth Platinum does today.

Amazon doesn’t have it listed so I can’t say if you get a discount and/or free shipping or both. 😉

Interesting but only as a publishing oddity.

I would rather have a digital version that is a geographic interface into a general knowledge topic map.

The OpenStreetMap Package Opens Up

Sunday, April 21st, 2013

The OpenStreetMap Package Opens Up

From the post:

A new version of the OpenStreetMap package is now up on CRAN, and should propagate to all the mirrors in the next few days. The primary purpose of the package is to provide high resolution map/satellite imagery for use in your R plots. The package supports base graphics and ggplot2, as well as transformations between spatial coordinate systems.

The biggest change in the new version is the addition of dozens of tile servers, giving the user the option of many different map looks, including those from Bing, MapQuest and Apple.

Very impressive display of the new capabilities in OpenStreetMap and this note about OpenStreetMap and ggmap:

Probably the main alternative to OpenStreetMap is the ggmap package. ggmap is an excellent package, and it is somewhat unfortunate that there is a significant duplication of effort between it and OpenStreetMap. That said, there are some differences that may help you decide which to use:

Reasons to favor OpenStreetMap:

  • More maps: OpenStreetMap supports more map types.
  • Better image resolution: ggmap only fetches one png from the server, and thus is limited to the resolution of that png, whereas OpenStreetMap can download many map tiles and stitch them together to get an arbitrarily high image resolution.
  • Transformations: OpenStreetMap can be used with any map coordinate system, whereas ggmap is limited to long-lat.
  • Base graphics: Both packages support ggplot2, but OpenStreetMap also supports base graphics.

Reasons to favor ggmap:

  • No Java dependency: ggmap does not require Java to be installed.
  • Geocoding: ggmap has functions to do reverse geo coding.
  • Google maps: While OpenStreetMap has more map types, it currently does not support google maps.
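
The tile stitching mentioned under "Better image resolution" depends on knowing which tiles cover an area. As a rough illustration (in Python rather than R, and not taken from the OpenStreetMap package itself), the widely used "slippy map" scheme maps a latitude/longitude pair to tile indices at a given zoom level. The function names `tile_for` and `tiles_covering` are hypothetical.

```python
import math


def tile_for(lat_deg, lon_deg, zoom):
    """Return the (x, y) slippy-map tile containing a point at `zoom`."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_covering(south, west, north, east, zoom):
    """List every tile needed to cover a bounding box at `zoom`."""
    x0, y1 = tile_for(south, west, zoom)  # tile y grows southwards
    x1, y0 = tile_for(north, east, zoom)
    return [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]


# A box around the equator/prime meridian needs all four zoom-1 tiles.
tiles = tiles_covering(-10.0, -10.0, 10.0, 10.0, zoom=1)
```

Each extra zoom level doubles the tile count along both axes, which is why stitching tiles can reach resolutions that a single fetched image cannot.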

Fair enough?

GeoLocation Friends Visualizer

Sunday, April 7th, 2013

GeoLocation Friends Visualizer by Marcel Caraciolo.

Slides from a presentation at the XXVI Pernambuco’s Python User Group meeting.

Code at:

Just to get you interested:

[Image: social network]

If you had the phone records (cell and land) from elected and appointed government officials, you could begin to build a visualization of the government network.

In terms of an “effective” data leak, it is hard to imagine a better one.


gvSIG

Saturday, March 30th, 2013


I encountered the gvSIG site while tracking down the latest release of i3Geo.

From its mission statement:

The gvSIG project was born in 2004 within a project that consisted in a full migration of the information technology systems of the Regional Ministry of Infrastructure and Transport of Valencia (Spain), henceforth CIT, to free software. Initially, it was born with some objectives according to CIT needs. These objectives were expanded rapidly because of two reasons principally: on the one hand, the nature of free software, which greatly enables the expansion of technology, knowledge, and lays down the bases on which to establish a community, and, on the other hand, a project vision embodied in some guidelines and a plan appropriate to implement it.

Some of the software projects you will find at gvSIG are:

gvSIG Desktop

gvSIG is a Geographic Information System (GIS), that is, a desktop application designed for capturing, storing, handling, analyzing and deploying any kind of referenced geographic information in order to solve complex management and planning problems. gvSIG is known for having a user-friendly interface, being able to access the most common formats, both vector and raster ones. It features a wide range of tools for working with geographic-like information (query tools, layout creation, geoprocessing, networks, etc.), which turns gvSIG into the ideal tool for users working in the land realm.

gvSIG Mobile

gvSIG Mobile is a Geographic Information System (GIS) aimed at mobile devices, ideal for projects that capture and update data in the field. It’s known for having a user-friendly interface, being able to access the most common formats and a wide range of GIS and GPS tools which are ideal for working with geographic information.

gvSIG Mobile aims at broadening gvSIG Desktop execution platforms to a range of mobile devices, in order to give an answer to the needs of a growing number of mobile solutions users, who wish to use a GIS on different types of devices.

So far, gvSIG Mobile is a Geographic Information System, as well as a Spatial Data Infrastructures client for mobile devices. Such a client is also the first one licensed under open source.

i3Geo


i3Geo is an application for the development of interactive web maps. It integrates several open source applications into a single development platform, mainly Mapserver and OpenLayers. Developed in PHP and Javascript, it has functionalities that allow the user to have better control over the map output, allowing the user to modify the legend of layers, to apply filters, to perform analysis, etc.

i3Geo is completely customizable and can be tailored to the different users using the interactive map. Furthermore, the spatial data is organized in a catalogue that offers online access services such as WMS, WFS, KML or the download of files.

i3Geo was developed by the Ministry of the Environment of Brazil and it is actually part of the Brazilian Public Software Portal.

gvSIG Educa

What is gvSIG Educa?

“If I can’t picture it, I can’t understand it (A. Einstein)”

gvSIG Educa is a customization of the gvSIG Desktop Open Source GIS, adapted as a tool for the education of issues that have a geographic component.

The aim of gvSIG Educa is to provide educators with a tool that helps students to analyse and understand space, and which can be adapted to different levels or education systems.

gvSIG Educa is not only useful for the teaching of geographic material, but can also be used for learning any subject that contains a spatial component such as history, economics, natural science, sociology…

gvSIG Educa facilitates learning by letting students interact with the information, by adding a spatial component to the study of the material, and by facilitating the assimilation of concepts through visual tools such as thematic maps.

gvSIG Educa provides analysis tools that help to understand spatial relationships.

Definitely a site to visit if you are interested in open source GIS software and/or projects.


i3Geo

Saturday, March 30th, 2013


From the homepage:

i3Geo is an application for the development of interactive web maps. It integrates several open source applications into a single development platform, mainly Mapserver and OpenLayers. Developed in PHP and Javascript, it has functionalities that allow the user to have better control over the map output, allowing the user to modify the legend of layers, to apply filters, to perform analysis, etc.

i3Geo is completely customizable and can be tailored to the different users using the interactive map. Furthermore, the spatial data is organized in a catalogue that offers online access services such as WMS, WFS, KML or the download of files.

i3Geo was developed by the Ministry of the Environment of Brazil and it is actually part of the Brazilian Public Software Portal.

I followed an announcement about i3Geo 4.7 being available when the line “…an application for the development of interactive web maps,” caught my eye.

Features include:

  • Basic display: fix zoom, zoom by rectangle, panning, etc.
  • Advanced display: locator by attribute, zoom to point, zoom by geographical area, zoom by selection, zoom to layer
  • Integrated display: Wikipedia, GoogleMaps, Panoramio and Confluence
  • Integration with the OpenLayers, GoogleMaps and GoogleEarth APIs
  • Loading of WMS, KML, GeoRSS, shapefile, GPX and CSV layers
  • Management of independent databases
  • Layer catalog management system
  • Management of layers in maps: Change of the layers order, opacity change, title change, filters, thematic classification, legend and symbology changing
  • Analysis tools: buffers, regular grids, points distribution analysis, layer intersection, centroid calculation, etc.
  • Digitalization: vector editing that allows creating new geometries or editing existing data.
  • Superposition of existing data on the data of the Google Maps and GoogleEarth catalogs.

Unless you want to re-invent mapping software, this could be quite useful for location relevant topic map data.

I first saw this at New final version of i3Geo available: i3Geo 4.7.

Pan-European open data…

Wednesday, March 13th, 2013

Pan-European open data available online from EuroGeographics

From the post:

Data compiled from national mapping supplied by 45 European countries and territories can now be downloaded for free at

From today (8 March 2013), the 1:1 million scale topographic dataset, EuroGlobalMap will be available free of charge for any use under a new open data licence. It is produced using authoritative geo-information provided by members of EuroGeographics, the Association for European Mapping, Cadastre and Land Registry Authorities.


“World leaders acknowledge the need for further mainstream sustainable development at all levels, integrating economic, social and environmental aspects and recognising their inter-linkages,” she said. [EuroGeographics’ President, Ingrid Vanden Berghe]

“Geo-information is key. It provides a vital link among otherwise unconnected information and enables the use of location as the basis for searching, cross-referencing, analysing and understanding Europe-wide data.”

Geographic location is a common binding point for information.

Interesting to think about geographic steganography. Right latitude but wrong longitude, or other variations.

eSpatial launches free edition of mapping software

Wednesday, March 13th, 2013

eSpatial launches free edition of mapping software

From the post:

eSpatial, a leading provider of powerful mapping software, today announced the launch of a free edition of their flagship mapping software, also called eSpatial.

eSpatial mapping software lets users convert spreadsheet data into map form, with just a few clicks. This visualization provides immediate insights into market trends and challenges.

The new free edition of eSpatial is available to anyone who signs up for an account at Once logged on, users can create maps from their existing data and then post them on websites as interactive maps.

Since it launched last year, eSpatial has made strong inroads into the sales mapping and territory mapping software market, especially in the United States.

Paid editions (including Basic, Pro and Team) of the application with greater functionality – including the ability to handle increased amounts of data, reporting and sharing options – start at $399 for an annual subscription.

Just started playing with this but it could be radically cool!

For example, what if you mapped a particular congressional district and then mapped by zip codes the donations to their campaign?

I need to read the manual and find some data to import.

BTW, high marks for one of the easiest registrations I have ever encountered.

User evaluation of automatically generated keywords and toponyms… [of semantic gaps]

Tuesday, January 22nd, 2013

User evaluation of automatically generated keywords and toponyms for geo-referenced images by Frank O. Ostermann, Martin Tomko, Ross Purves. (Ostermann, F. O., Tomko, M. and Purves, R. (2013), User evaluation of automatically generated keywords and toponyms for geo-referenced images. J. Am. Soc. Inf. Sci.. doi: 10.1002/asi.22738)


This article presents the results of a user evaluation of automatically generated concept keywords and place names (toponyms) for geo-referenced images. Automatically annotating images is becoming indispensable for effective information retrieval, since the number of geo-referenced images available online is growing, yet many images are insufficiently tagged or captioned to be efficiently searchable by standard information retrieval procedures. The Tripod project developed original methods for automatically annotating geo-referenced images by generating representations of the likely visible footprint of a geo-referenced image, and using this footprint to query spatial databases and web resources. These queries return raw lists of potential keywords and toponyms, which are subsequently filtered and ranked. This article reports on user experiments designed to evaluate the quality of the generated annotations. The experiments combined quantitative and qualitative approaches: To retrieve a large number of responses, participants rated the annotations in standardized online questionnaires that showed an image and its corresponding keywords. In addition, several focus groups provided rich qualitative information in open discussions. The results of the evaluation show that currently the annotation method performs better on rural images than on urban ones. Further, for each image at least one suitable keyword could be generated. The integration of heterogeneous data sources resulted in some images having a high level of noise in the form of obviously wrong or spurious keywords. The article discusses the evaluation itself and methods to improve the automatic generation of annotations.

An echo of Steve Newcomb’s semantic impedance appears at:

Despite many advances since Smeulders et al.’s (2002) classic paper that set out challenges in content-based image retrieval, the quality of both nonspecialist text-based and content-based image retrieval still appears to lag behind the quality of specialist text retrieval, and the semantic gap, identified by Smeulders et al. as a fundamental issue in content-based image retrieval, remains to be bridged. Smeulders defined the semantic gap as

the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation. (p. 1353)

In fact, text-based systems that attempt to index images based on text thought to be relevant to an image, for example, by using image captions, tags, or text found near an image in a document, suffer from an identical problem. Since text is being used as a proxy by an individual in annotating image content, those querying a system may or may not have similar worldviews or conceptualizations as the annotator. (emphasis added)

That last sentence could have come out of a topic map book.

Curious what you make of the authors’ claim that spatial locations provide an “external context” that bridges the “semantic gap”?

If we all use the same map of spatial locations, are you surprised by the lack of a “semantic gap”?

Promoting Topic Maps With Disasters?

Sunday, January 20th, 2013

When I saw the post headline:

Unrestricted access to the details of deadly eruptions

I immediately thought about the recent (ongoing?) rash of disaster movies. What they lack in variety they make up for in special effects.

The only unrealistic part is that governments largely respond effectively, or at least attempt to, rather than making the rounds on the few Sunday morning interview programs. Well, it is fiction after all.

But the data set sounds like one that could be used to market topic maps as a “disaster” app.

Imagine a location based app that shows your proximity to the “kill” zone of a historic volcano.

Along with mapping to other vital data, such as the nearest movie star. 😉

Something to think about.

Volcanic eruptions have the potential to cause loss of life, disrupt air traffic, impact climate, and significantly alter the surrounding landscape. Knowledge of the past behaviours of volcanoes is key to producing risk assessments of the hazards of modern explosive events.

The open access database of Large Magnitude Explosive Eruptions (LaMEVE) will provide this crucial information to researchers, civil authorities and the general public alike.

Compiled by an international team headed by Dr Sian Crosweller from Bristol’s School of Earth Sciences with support from the British Geological Survey, the LaMEVE database provides – for the first time – rapid, searchable access to the breadth of information available for large volcanic events of magnitude 4 or greater with a quantitative data quality score.

Dr Crosweller said: “Magnitude 4 or greater eruptions – such as Vesuvius in 79AD, Krakatoa in 1883 and Mount St Helens in 1980 – are typically responsible for the most loss of life in the historical period. The database’s restriction to eruptions of this size puts the emphasis on events whose low frequency and large hazard footprint mean preparation and response are often poor.”

Currently, data fields include: magnitude, Volcanic Explosivity Index (VEI), deposit volumes, eruption dates, and rock type; such parameters constituting the mainstay for description of eruptive activity.

Planned expansion of LaMEVE will include the principal volcanic hazards (such as pyroclastic flows, tephra fall, lahars, debris avalanches, ballistics), and vulnerability (for example, population figures, building type) – details of value to those involved in research and decisions relating to risk.

LaMEVE is the first component of the Volcanic Global Risk Identification and Analysis Project (VOGRIPA) database for volcanic hazards developed as part of the Global Volcano Model (GVM).

Principal Investigator and co-author, Professor Stephen Sparks of Bristol’s School of Earth Sciences said: “The long-term goal of this project is to have a global source of freely available information on volcanic hazards that can be used to develop protocols in the event of volcanic eruptions.

“Importantly, the scientific community are invited to actively participate with the database by sending new data and modifications to the database manager and, after being given clearance as a GVM user, entering data thereby maintaining the resource’s dynamism and relevance.”

LaMEVE is freely available online at
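
The release does not say how "magnitude" is computed. A definition commonly used in volcanology is M = log10(erupted mass in kg) − 7, and the Python sketch below assumes that definition applies here. The function names and the sample masses are hypothetical and are not LaMEVE records.

```python
import math


def eruption_magnitude(erupted_mass_kg):
    """Magnitude from erupted mass, assuming M = log10(mass in kg) - 7."""
    if erupted_mass_kg <= 0:
        raise ValueError("erupted mass must be positive")
    return math.log10(erupted_mass_kg) - 7.0


def meets_threshold(erupted_mass_kg, threshold=4.0):
    """True if the eruption reaches the magnitude threshold (default 4)."""
    return eruption_magnitude(erupted_mass_kg) >= threshold


# Hypothetical erupted masses in kilograms, for illustration only.
small, large = 1e9, 1e12
```

Under this assumption, a magnitude 4 cutoff corresponds to roughly 10^11 kg of erupted material.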

Maps in R: Plotting data points on a map

Tuesday, January 15th, 2013

Maps in R: Plotting data points on a map by Max Marchi.

From the post:

In the introductory post of this series I showed how to plot empty maps in R.

Today I’ll begin to show how to add data to R maps. The topic of this post is the visualization of data points on a map.

Max continues this series with datasets from airports in Europe and demonstrates how to map the airports to geographic locations. He also represents the airports with icons that correspond to their traffic statistics.

Useful principles for any data set with events that can be plotted against geographic locations.

Parades, patrols, convoys, that sort of thing.
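
One detail worth getting right when icons encode a statistic is how the value maps to icon size. The Python sketch below (not Max's R code) scales the marker radius with the square root of the value so that the marker's area, rather than its radius, is proportional to the data. The function name `marker_radius` and the traffic figures are assumptions for illustration.

```python
import math


def marker_radius(value, max_value, max_radius=20.0):
    """Radius for a marker whose AREA is proportional to `value`."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    # Clamp the value into [0, max_value] before scaling.
    value = max(0.0, min(float(value), float(max_value)))
    return max_radius * math.sqrt(value / max_value)


# Hypothetical yearly passenger counts (millions) for made-up airports.
traffic = {"AAA": 100.0, "BBB": 25.0, "CCC": 4.0}
busiest = max(traffic.values())
radii = {code: marker_radius(v, busiest, max_radius=10.0)
         for code, v in traffic.items()}
```

With linear radius scaling, an airport with four times the traffic would get a marker with sixteen times the area, which overstates the difference.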

App-lifying USGS Earth Science Data

Thursday, January 10th, 2013

App-lifying USGS Earth Science Data

Challenge Dates:

Submissions: January 9, 2013 at 9:00am EST – Ends April 1, 2013 at 11:00pm EDT.

Public Voting: April 5, 2013 at 5:00pm EDT – Ends April 25, 2013 at 11:00pm EDT.

Judging: April 5, 2013 at 5:00pm EDT – Ends April 25, 2013 at 11:00pm EDT.

Winners Announced: April 26, 2013 at 5:00pm EDT.

From the webpage:

USGS scientists are looking for your help in addressing some of today’s most perplexing scientific challenges, such as climate change and biodiversity loss. To do so requires a partnership between the best and the brightest in Government and the public to guide research and identify solutions.

The USGS is seeking help via this platform from many of the Nation’s premier application developers and data visualization specialists in developing new visualizations and applications for datasets.

USGS datasets for the contest consist of a range of earth science data types, including:

  • several million biological occurrence records (terrestrial and marine);
  • thousands of metadata records related to research studies, ecosystems, and species;
  • vegetation and land cover data for the United States, including detailed vegetation maps for the National Parks; and
  • authoritative taxonomic nomenclature for plants and animals of North America and the world.

Collectively, these datasets are key to a better understanding of many scientific challenges we face globally. Identifying new, innovative ways to represent, apply, and make these data available is a high priority.

Submissions will be judged on their relevance to today’s scientific challenges, innovative use of the datasets, and overall ease of use of the application. Prizes will be awarded to the best overall app, the best student app, and the people’s choice.

Of particular interest for the topic maps crowd:

Data used – The app must utilize a minimum of 1 DOI USGS Core Science and Analytics (CSAS) data source, though they need not include all data fields available in a particular resource. A list of CSAS databases and resources is available at: The use of data from other sources in conjunction with CSAS data is encouraged.

CSAS has a number of very interesting data sources. Classifications, thesauri, data integration, metadata and more.

Winning the contest earns you recognition and bragging rights, not to mention visibility for your approach.

CartoDB makes D3 maps a breeze

Sunday, January 6th, 2013

CartoDB makes D3 maps a breeze

From the post:

Anybody who loves maps and data can’t help but notice all the beautiful visualizations people are making with D3 right now. Huge thanks to Mike Bostock for such a cool technology.

We have done a lot of client-side rendering experiments over the past year or so and have to say, D3 is totally awesome. This is why we felt it might be helpful to show you how easy it is to use D3 with CartoDB. In the near future, we’ll be adding a few tutorials for D3 to our developer pages, but for now, let’s have a look.

Very impressive.

But populating a map with data isn’t the same as creating a useful map with data.

Take a look at the earthquake example.

What data would you add to it to make the information actionable?


TopoJSON

Sunday, December 30th, 2012


From the webpage:

TopoJSON is an extension of GeoJSON that encodes topology. Rather than representing geometries discretely, geometries in TopoJSON files are stitched together from shared line segments called arcs. TopoJSON eliminates redundancy, offering much more compact representations of geometry than with GeoJSON; typical TopoJSON files are 80% smaller than their GeoJSON equivalents. In addition, TopoJSON facilitates applications that use topology, such as topology-preserving shape simplification, automatic map coloring, and cartograms.

I stumbled on this by viewing TopoJSON Points.

The example displays airports, but it could be any geographic feature.

See the wiki for more details.
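
The "stitched together from shared arcs" idea quoted above can be sketched in a few lines of Python. This is a simplified illustration, not the TopoJSON library: it uses absolute coordinates and skips the quantization and delta-encoding that real files usually apply. The helper names are hypothetical. It does follow the TopoJSON convention that a negative arc index `~i` means "arc `i`, reversed".

```python
def arc_points(arcs, index):
    """Points of one arc; a negative index ~i means arc i in reverse."""
    if index < 0:
        return list(reversed(arcs[~index]))
    return list(arcs[index])


def stitch_ring(arcs, arc_indexes):
    """Join arcs into one ring, dropping the duplicated joint points."""
    ring = []
    for index in arc_indexes:
        points = arc_points(arcs, index)
        # Each later arc starts where the previous one ended.
        ring.extend(points if not ring else points[1:])
    return ring


# Two unit squares side by side; arc 0 is the edge they share.
arcs = [
    [(1, 0), (1, 1)],                  # arc 0: shared edge
    [(1, 1), (0, 1), (0, 0), (1, 0)],  # arc 1: rest of the left square
    [(1, 0), (2, 0), (2, 1), (1, 1)],  # arc 2: rest of the right square
]
left = stitch_ring(arcs, [0, 1])
right = stitch_ring(arcs, [2, ~0])     # reuses arc 0, reversed
```

The shared edge is stored once and referenced twice, which is where the size savings over GeoJSON come from.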


Sitegeist

Friday, December 14th, 2012

Sitegeist: A mobile app that tells you about your data surroundings by Nathan Yau.

Nathan writes:

From businesses to demographics, there’s data for just about anywhere you are. Sitegeist, a mobile application by the Sunlight Foundation, puts the sources into perspective.

The app is free, and the Sunlight site lists the following data for a geographic location:

  • Age Distribution
  • Political Contributions
  • Average Rent
  • Popular Local Spots
  • Recommended Restaurants
  • How People Commute
  • Record Temperatures
  • Housing Units Over Time

If you have an iPhone or Android phone, can you report if other data is available?

I was thinking along the lines of:

  • # of drug arrests
  • # type of drug arrests
  • # of arrests for soliciting (graphed by day/time)
  • # location of bail bond agencies

More tourist type information. 😉

How would you enhance this data flow with a topic map?

Towards a Scalable Dynamic Spatial Database System [Watching Watchers]

Tuesday, November 20th, 2012

Towards a Scalable Dynamic Spatial Database System by Joaquín Keller, Raluca Diaconu, Mathieu Valero.


With the rise of GPS-enabled smartphones and other similar mobile devices, massive amounts of location data are available. However, no scalable solutions for soft real-time spatial queries on large sets of moving objects have yet emerged. In this paper we explore and measure the limits of actual algorithms and implementations regarding different application scenarios. And finally we propose a novel distributed architecture to solve the scalability issues.

At least in this version, you will find two copies of the same paper, the second copy sans the footnotes. So read the first twenty (20) pages and ignore the second eighteen (18) pages.

I thought the limitation of location to two dimensions understandable, for the use cases given, but am less convinced that treating a third dimension as an extra attribute is always going to be suitable.

Still, the results here are impressive as compared to current solutions so an additional dimension can be a future improvement.

The use case that I see missing is an ad hoc network of users feeding geo-based information back to a collection point.

While the watchers are certainly watching us, technology may be on the cusp of answering the question: “Who watches the watchers?” (The answer may be us.)

I first saw this in a tweet by Stefano Bertolo.

Georeferencer: Crowdsourced Georeferencing for Map Library Collections

Monday, November 19th, 2012

Georeferencer: Crowdsourced Georeferencing for Map Library Collections by Christopher Fleet, Kimberly C. Kowal and Petr Přidal.


Georeferencing of historical maps offers a number of important advantages for libraries: improved retrieval and user interfaces, better understanding of maps, and comparison/overlay with other maps and spatial data. Until recently, georeferencing has involved various relatively time-consuming and costly processes using conventional geographic information system software, and has been infrequently employed by map libraries. The Georeferencer application is a collaborative online project allowing crowdsourced georeferencing of map images. It builds upon a number of related technologies that use existing zoomable images from library web servers. Following a brief review of other approaches and georeferencing software, we describe Georeferencer through its five separate implementations to date: the Moravian Library (Brno), the Nationaal Archief (The Hague), the National Library of Scotland (Edinburgh), the British Library (London), and the Institut Cartografic de Catalunya (Barcelona). The key success factors behind crowdsourcing georeferencing are presented. We then describe future developments and improvements to the Georeferencer technology.

If your institution has a map collection or if you are interested in maps at all, you need to read this article.

There is an introduction video if you prefer:

Either way, you will be deeply impressed by this project.

And wondering: Can the same lessons be applied to crowd source the creation of topic maps?


Saturday, October 27th, 2012

zip-code-data-hacking by Neil Kodner.

From the readme file:

sourcing publicly available files, generate useful zip code-county data.

My goal is to be able to map zip codes to county FIPS codes, without paying. So far, I’m able to produce county fips codes for 41456 counties out of a list of 42523 zip codes.

I was able to find a zip code database from, each zip code had a county name but not a county FIPS code. I was able to find County FIPS codes on the site through some google hacking.

The data files are in the data directory – I’ll eventually add code to make sure the latest data files are retrieved at runtime. I didn’t do this yet because I didn’t want to hammer the sites while I was quickly iterating – a local copy did just fine.

In case you are wondering why this mapping between zip codes to county FIPS codes is important:

Federal information processing standards codes (FIPS codes) are a standardized set of numeric or alphabetic codes issued by the National Institute of Standards and Technology (NIST) to ensure uniform identification of geographic entities through all federal government agencies. The entities covered include: states and statistically equivalent entities, counties and statistically equivalent entities, named populated and related location entities (such as, places and county subdivisions), and American Indian and Alaska Native areas. (From: Federal Information Processing Standard (FIPS))

Using zip code based data against federal agency data (FIPS) requires this mapping.
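
The join itself is simple enough to sketch. What follows is not Neil's code. It is a minimal Python illustration that assumes two inputs shaped like the ones the readme describes: a zip code list carrying county names, and a county list carrying FIPS codes. The column names, the sample rows and the name normalization are all assumptions made for the example.

```python
import csv
import io

# Illustrative rows only; the real project reads files from its data directory.
ZIP_ROWS = """zip,state,county
36067,AL,Autauga
90012,CA,Los Angeles
60601,IL,Cook
99999,ZZ,Nowhere
"""

FIPS_ROWS = """state,county_fips,county
AL,01001,Autauga County
CA,06037,Los Angeles County
IL,17031,Cook County
"""


def normalize(name):
    """Lower-case a county name and drop a trailing 'County'-style suffix."""
    name = name.strip().lower()
    for suffix in (" county", " parish", " borough"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def build_fips_index(fips_csv):
    """Map (state, normalized county name) to the five-digit county FIPS code."""
    index = {}
    for row in csv.DictReader(io.StringIO(fips_csv)):
        index[(row["state"], normalize(row["county"]))] = row["county_fips"]
    return index


def map_zips(zip_csv, fips_index):
    """Return ({zip: fips}, [zips with no matching county])."""
    matched, unmatched = {}, []
    for row in csv.DictReader(io.StringIO(zip_csv)):
        key = (row["state"], normalize(row["county"]))
        if key in fips_index:
            matched[row["zip"]] = fips_index[key]
        else:
            unmatched.append(row["zip"])
    return matched, unmatched


matched, unmatched = map_zips(ZIP_ROWS, build_fips_index(FIPS_ROWS))
print(matched)
print("unmatched:", unmatched)
```

The unmatched list is where the work is. County names spelled differently in the two sources are a likely reason a join like this falls short of covering every zip code.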

I suspect Neil would appreciate your assistance.

I first saw this at Pete Warden’s Five Short Links.

Sneak Peek into Skybox Imaging’s Cloudera-powered Satellite System [InaaS?]

Saturday, October 20th, 2012

Sneak Peek into Skybox Imaging’s Cloudera-powered Satellite System by Justin Kestelyn (@kestelyn)

This is a guest post by Oliver Guinan, VP Ground Software, at Skybox Imaging. Oliver is a 15-year veteran of the internet industry and is responsible for all ground system design, architecture and implementation at Skybox.

One of the great promises of the big data movement is using networks of ubiquitous sensors to deliver insights about the world around us. Skybox Imaging is attempting to do just that for millions of locations across our planet.

Skybox is developing a low cost imaging satellite system and web-accessible big data processing platform that will capture video or images of any location on Earth within a couple of days. The low cost nature of the satellite opens the possibility of deploying tens of satellites which, when integrated together, have the potential to image any spot on Earth within an hour.

Skybox satellites are designed to capture light in the harsh environment of outer space. Each satellite captures multiple images of a given spot on Earth. Once the images are transferred from the satellite to the ground, the data needs to be processed and combined to form a single image, similar to those seen within online mapping portals.

With any sensor network, capturing raw data is only the beginning of the story. We at Skybox are building a system to ingest and process the raw data, allowing data scientists and end users to ask arbitrary questions of the data, then publish the answers in an accessible way and at a scale that grows with the number of satellites in orbit. We selected Cloudera to support this deployment.

Now is the time to start planning topic map based products that can incorporate this type of data.

There are lots of folks who are “curious” about what is happening next door, in the next block, a few “klicks” away, across the border, etc.

Not all of them have the funds for private “keyhole” satellites and vacuum data feeds. But they may have money to pay you for efficient and effective collation of intelligence data.

Topic maps empowering “Intelligence as a Service (InaaS)”?

…[A] Common Operational Picture with Google Earth (webcast)

Thursday, October 11th, 2012

Joint Task Force – Homeland Defense Builds a Common Operational Picture with Google Earth

October 25, 2012 at 02:00 PM Eastern Daylight Time

The security for the Asia-Pacific Economic Cooperation summit in 2011 in Honolulu, Hawaii involved many federal, state & local agencies. The complex task of coordinating information sharing among agencies was the responsibility of Joint Task Force – Homeland Defense (JTF-HD). JTF-HD turned to Google Earth technology to build a visualization capability that enabled all agencies to share information easily & ensure a safe and secure summit.

What you will learn:

  • Best practices for sharing geospatial information among federal, state & local agencies
  • How to incorporate data from many sources into your own Google Earth globe
  • How to get accurate maps with limited bandwidth or no connection at all.

Speaker: Marie Kennedy, Joint Task Force – Homeland Defense

Sponsored by Google.

In addition to the techniques demonstrated, I suspect the main lesson will be leveraging information/services that already exist.

Or information integration if you prefer a simpler description.

Information can be integrated by conversion or mapping.

Which one you choose depends upon your requirements and the information.

Reusable information integration (RI2), where you leverage your own investment, well, that’s another topic altogether. 😉

Ask: Are you spending money to be effective or spending money to maintain your budget relative to other departments?

If the former, consider topic maps. If the latter, carry on.

Planetary Data System – Geosciences Node

Monday, October 1st, 2012

Sounds like SciFi, yes? SciFi? No!

After seeing Google add some sea bed material to Google Maps, I started to wonder about radar based maps of other places. Like the Moon.

I remember the excitement Ranger 7 images generated. And that in grainy newspaper reproductions.

With just a little searching, I came across PDS (Planetary Data System) Geosciences Node (Washington University in St. Louis).

From the web page:

The Geosciences Node of NASA’s Planetary Data System (PDS) archives and distributes digital data related to the study of the surfaces and interiors of terrestrial planetary bodies. We work directly with NASA missions to help them generate well-documented, permanent data archives. We provide data to NASA-sponsored researchers along with expert assistance in using the data. All our archives are online and available to the public to download free of charge.

Which includes:

  • Mars
  • Venus
  • Mercury
  • Moon
  • Earth (test data for other planetary surfaces)
  • Asteroids
  • Gravity Models

Even after checking the FAQ, I can’t explain the ordering of these entries. Order from the Sun doesn’t work. Neither does order or distance from Earth. Nor alphabetical sort order. Suggestions?

In any event, enjoy the data set!

Introducing BOSS Geo – the next chapter for BOSS

Friday, September 28th, 2012

Introducing BOSS Geo – the next chapter for BOSS

From the post:

Today, the Yahoo! BOSS team is thrilled to announce BOSS Geo, new additions to our Search API that’s designed to help foster innovation in the search industry. BOSS Geo, comprised of two popular services – PlaceFinder and PlaceSpotter – now offers powerful, new geo services to BOSS developers.

Geo is increasingly important in today’s always-on, mobile world and adding features like these have been among the most requested we’ve received from our developers. With mobile devices becoming more pervasive, users everywhere want to be able to quickly pull up relevant geo information like maps or addresses. By adding PlaceFinder and PlaceSpotter to BOSS, we’re arming developers with rich new tools for driving more valuable and personalized interactions with their users.

PlaceFinder – Geocoding made simple

PlaceFinder is a geocoder (and reverse geocoder) service. The service helps developers convert an address into a latitude/longitude and alternatively, if you provide a latitude/longitude it can resolve it to an address. Whether you are building a check-in service or want to show an address on a map, we’ve got you covered. PlaceFinder already powers several popular applications like foursquare, which uses it to power check-ins on their mobile application. BOSS PlaceFinder offers tiered pricing and one simple monthly bill.

(graphics omitted)

PlaceSpotter – Adding location awareness to your content

The PlaceSpotter API (formerly known as PlaceMaker) allows developers to take a piece of content, pull rich information about the locations mentioned and provide meaning to those locations. A news article is no longer just text but has rich, meaningful geographical information associated with it. For instance, the next time your users are reading a review of a cool new coffee shop in the Mission neighborhood in San Francisco, they can discover another article about a hip new bakery in the same neighborhood. Learn more on the new PlaceSpotter service.

What information would you merge using address data as a link point?

Amsterdam (Netherlands) is included. Perhaps sexual preferences in multiple languages, keyed to your cell phone’s location? (Or does that exist already?)


We intend to shut down the current free versions of PlaceFinder and PlaceMaker on November 17, 2012.

Development using YQL tables will still be available.
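
If geocoding and reverse geocoding are new to you, the toy Python sketch below shows the two directions of lookup. It does not call PlaceFinder or any Yahoo! service. The three-entry gazetteer, its approximate coordinates and the function names are all hypothetical, and "nearest known place by great-circle distance" is a stand-in for what a real reverse geocoder does.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "San Francisco, CA": (37.7749, -122.4194),
    "Amsterdam, Netherlands": (52.3676, 4.9041),
    "Honolulu, HI": (21.3069, -157.8583),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geocode(place):
    """Place name -> (lat, lon), or None if the place is unknown."""
    return GAZETTEER.get(place)


def reverse_geocode(lat, lon):
    """(lat, lon) -> name of the nearest place in the gazetteer."""
    return min(GAZETTEER, key=lambda name: haversine_km(lat, lon, *GAZETTEER[name]))


print(geocode("Amsterdam, Netherlands"))
print(reverse_geocode(37.76, -122.42))  # a point near the Mission neighborhood
```

A production service resolves street addresses rather than city names, but the link point is the same: a name on one side and a coordinate pair on the other.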

Foundation grants $575,000 for new OpenStreetMap tools

Monday, September 24th, 2012

Foundation grants $575,000 for new OpenStreetMap tools

From the post:

The Knight Foundation has awarded a $575,000 grant to Washington-DC-based data visualisation and mapping firm Development Seed to work on new tools for OpenStreetMap (OSM). The Knight Foundation is a non-profit organisation dedicated to supporting quality journalism, media innovation and engaging communities. The award is one of six made by the Knight Foundation as part of Knight News Challenge: Data.

The funding will be used by developers from MapBox, part of Development Seed that designs maps using OSM data, to create three new open source tools for the OSM project to “lower the threshold for first time contributors”, while also making data “easier to consume by providing a bandwidth optimised data delivery system”.

Topic maps with geographic data are a subset of topic maps overall, but it’s an important use case. And it is easy for people to relate to a “map” that looks like a “map.” Takes less mental effort. (One of those “slow” thinking things.) 😉

Looking forward to more good things to come from OpenStreetMap!

“What Makes Paris Look Like Paris?”

Saturday, September 1st, 2012

“What Makes Paris Look Like Paris?” by Erwin Gianchandani.

From the post:

We all identify cities by certain attributes, such as building architecture, street signage, even the lamp posts and parking meters dotting the sidewalks. Now there’s a neat study by computer graphics researchers at Carnegie Mellon University — presented at SIGGRAPH 2012 earlier this month — that develops novel computational techniques to analyze imagery in Google Street View and identify what gives a city its character….

From the abstract:

Given a large repository of geotagged imagery, we seek to automatically find visual elements, e.g. windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose to use a discriminative clustering approach able to take into account the weak geographic supervision. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner. We demonstrate that these elements are visually interpretable and perceptually geo-informative. The discovered visual elements can also support a variety of computational geography tasks, such as mapping architectural correspondences and influences within and across cities, finding representative elements at different geo-spatial scales, and geographically-informed image retrieval.

The video and other resources are worth the time to review/read.

What features do you rely on to “recognize” a city?

The potential to explore features within a city or between cities looks particularly promising.

Closing In On A Million Open Government Data Sets

Sunday, June 24th, 2012

Closing In On A Million Open Government Data Sets by Jennifer Zaino.

From the post:

A million data sets. That’s the number of government data sets out there on the web that we have closed in on.

“The question is, when you have that many, how do you search for them, find them, coordinate activity between governments, bring in NGOs,” says James A. Hendler, Tetherless World Senior Constellation Professor, Department of Computer Science and Cognitive Science Department at Rensselaer Polytechnic Institute, and a principal investigator of its Linking Open Government Data project lives, as well as Internet web expert for, He also is connected with many other governments’ open data projects. “Semantic web tools organize and link the metadata about these things, making them searchable, explorable and extensible.”

To be more specific, Hendler at SemTech a couple of weeks ago said there are 851,000 open government data sets across 153 catalogues from 30-something countries, with the three biggest representatives, in terms of numbers, at the moment being the U.S., the U.K., and France. Last week, the one million threshold was crossed.

About 410,000 of these data sets are from the U.S. (federal, state, city, county, tribal included), including quite a large number of geo-data sets. The U.S. government’s goal is to put “lots and lots and lots of stuff out there” and let people figure out what they want to do with it, he notes.

My question about data that “..[is] searchable, explorable and extensible,” is whether anyone wants to search, explore or extend it?

Simply piling up data to say you have a large pile of data doesn’t sound very useful.

I would rather have a smaller pile of data that included contract/testing transparency on anti-terrorism IT projects, for example. If the systems aren’t working, then disclosing them isn’t going to make them work any less well.

Not that anyone need fear transparency or failure to perform. The TSA has failed to perform for more than a decade now, failed to catch a single terrorist and it remains funded. Even when it starts groping children, passengers are so frightened that even that outrage passes without serious opposition.

Still, it would be easier to get people excited about mining government data if the data weren’t so random or marginal.

Research Data Australia down to Earth

Friday, June 22nd, 2012

Research Data Australia down to Earth

From the post:

Context: free cloud servers, a workshop and an idea

In this post I look at some work we’ve been doing at the University of Western Sydney eResearch group on visualizing metadata about research data, in a geographical context. The goal is to build a data discovery service; one interface we’re exploring is the ability to ‘fly’ around Google Earth looking for data, from Research Data Australia (RDA). For example, a researcher could follow a major river and see what data collections there are along its course that might be of (re-)use. True, you can search the RDA site by dragging a marker on a map but this experiment is a more immersive approach to exploring the same data.

The post is a quick update on a work in progress, with some not very original reflections on the use of cloud servers. I am putting it here on my own blog first, will do a human-readable summary over at UWS soon, any suggestions or questions welcome.

You can try this out if you have Google Earth by downloading a KML file. This is a demo service only – let us know how you go.

This work was inspired by a workshop on cloud computing: this week Andrew (Alf) Leahy and I attended a NeCTAR and Australian National Data Service (ANDS) one day event, along with several UWS staff. The unstoppable David Flanders from ANDS asked us to run a ‘dojo’, giving technically proficient researchers and eResearch collaborators a hands-on experience with the NeCTAR research cloud, where all Australian University researchers with access to the Australian Access Federation login system are entitled to run free cloud-hosted virtual servers. Free servers! Not to mention post-workshop beer[i]. So senseis Alf and PT worked with a small group of ‘black belts’ in a workshop loosely focused on geo-spatial data. Our idea was “Visualizing the distribution of data collections in Research Data Australia using Google Earth”[ii]. We’d been working on a demo of how this might be done for a few days, which we more-or-less got running on the train from the Blue Mountains in to Sydney Uni in the morning.
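
For a feel of what such a KML file contains, here is a minimal Python sketch that turns collection metadata into placemarks Google Earth can open. It is not the UWS code. The record fields (title, lat, lon) and the sample record are assumptions for illustration. The KML 2.2 namespace and the longitude-first coordinate order follow the KML standard.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def tag(name):
    """Qualify an element name with the KML namespace."""
    return f"{{{KML_NS}}}{name}"


def collections_to_kml(collections):
    """Build a KML document with one Placemark per data collection."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(tag("kml"))
    document = ET.SubElement(kml, tag("Document"))
    for item in collections:
        placemark = ET.SubElement(document, tag("Placemark"))
        ET.SubElement(placemark, tag("name")).text = item["title"]
        point = ET.SubElement(placemark, tag("Point"))
        # KML coordinates are written longitude first, then latitude.
        ET.SubElement(point, tag("coordinates")).text = f'{item["lon"]},{item["lat"]}'
    return ET.tostring(kml, encoding="unicode")


# Hypothetical record standing in for one collection's metadata.
sample = [{"title": "Hypothetical river survey", "lat": -33.75, "lon": 150.69}]
print(collections_to_kml(sample))
```

Saving the output to a file with a .kml extension should be enough for Google Earth to display the placemark.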

When you read about “exploring” the data, bear in mind the question of how to record that “exploration?” Explorers used to keep journals, ships’ logs, etc. to record their explorations.

How do you record (if you do) your explorations of data? How do you share them, if you do?

Given the ease of recording our explorations, no more longhand with a quill pen, is it odd that we don’t record our intellectual explorations?

Or do we want others to see a result that makes us look more clever than we are?