Archive for the ‘Spatial Index’ Category

Free Text and Spatial Search…

Friday, October 11th, 2013

Free Text and Spatial Search with Spatial4J and Lucene Spatial by Steven Citron-Pousty.

From the post:

Hey there, Shifters. One of my talks at FOSS4G 2013 covered Lucene Spatial. Today's post is going to follow up on my post about creating Lucene Indices by adding spatial capabilities to the index. In the end you will have a full example on how to create a fast and full-featured full text spatial search on any documents you want to use.

How to add spatial to your Lucene index

In the last post I covered how to create a Lucene index so in this post I will just cover how to add spatial. The first thing you need to understand are the two pieces of how spatial is handled by Lucene. A lot of this work is done by Dave Smiley. He gave a great presentation on all this technology at Lucene/Solr Revolution 2013. If you really want to dig in deep, I suggest you watch his 1:15 h:m long video – my blog post is more the Too Long Didn’t Listen (TL;DL) version.

  • Spatial4J: This Java library provides geospatial shapes, distance calculations, and importing and exporting shapes. It is Apache Licensed so it can be used with other ASF projects. Lucene Spatial uses Spatial4J to create the spatial objects that get indexed along with the documents. It will also be used when calculating distances in a query or when we want to convert between distance units. Spatial4J is able to handle real-world on a sphere coordinates (what comes out of a GPS unit) and projected coordinates (any 2D map) for both shapes and distances.

Short aside: The oldest Java based spatial library is JTS and is used in many other Open Source Java geospatial projects. Spatial4J uses JTS under the hood if you want to work with Polygon shapes. Unfortunately, until recently it was LGPL and so could not be included in Lucene. JTS has announced its intention to go to a BSD type license which should allow Spatial4J and JTS to start working together for more Java Spatial goodness for all. One of the beauties of FOSS is the ability to see development discussions happen in the open.

  • Lucene Spatial: After many different and custom iterations – there is now lucene spatial built right into Lucene as a standard library. It is new with the 4.x releases of Lucene. What Lucene spatial does is provide the indexing and search strategies for spatial4j shapes stored in a Lucene index. It has SpatialStrategy as the base class to define the signature that any spatial strategy must fulfill. You then use the same strategy for the index writing and reading.

Today I will show the code to use spatial4j with Lucene Spatial to add a spatially indexed field to your lucene index.
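The "distance calculations" and "convert between distance units" pieces are easy to picture with a few lines of arithmetic. What follows is a minimal sketch in plain Python, not Spatial4J's API: the function names and the Earth radius constant are my own assumptions, and it treats the Earth as a perfect sphere.

```python
import math

# Assumed constant for this sketch: mean Earth radius in kilometers.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def km_to_degrees(km):
    """Convert a distance along the sphere's surface to degrees of arc."""
    return math.degrees(km / EARTH_RADIUS_KM)


def degrees_to_km(degrees):
    """Convert degrees of arc back to kilometers on the same sphere."""
    return math.radians(degrees) * EARTH_RADIUS_KM
```

A query radius given in kilometers would pass through something like km_to_degrees before being compared with coordinates stored in degrees, which is the kind of unit conversion the excerpt mentions.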

Pay special attention to the changes that made it possible for Spatial4J and JTS to work together.

Cooperation between projects makes the resulting whole stronger.

Some office projects need to have that realization.

Towards a Scalable Dynamic Spatial Database System [Watching Watchers]

Tuesday, November 20th, 2012

Towards a Scalable Dynamic Spatial Database System by Joaquín Keller, Raluca Diaconu, Mathieu Valero.


With the rise of GPS-enabled smartphones and other similar mobile devices, massive amounts of location data are available. However, no scalable solutions for soft real-time spatial queries on large sets of moving objects have yet emerged. In this paper we explore and measure the limits of actual algorithms and implementations regarding different application scenarios. And finally we propose a novel distributed architecture to solve the scalability issues.

At least in this version, you will find two copies of the same paper, the second copy sans the footnotes. So read the first twenty (20) pages and ignore the second eighteen (18) pages.

I thought the limitation of location to two dimensions understandable, for the use cases given, but am less convinced that treating a third dimension as an extra attribute is always going to be suitable.

Still, the results here are impressive as compared to current solutions so an additional dimension can be a future improvement.

The use case that I see missing is an ad hoc network of users feeding geo-based information back to a collection point.

While the watchers are certainly watching us, technology may be on the cusp of answering the question: “Who watches the watchers?” (The answer may be us.)

I first saw this in a tweet by Stefano Bertolo.

Integrating Lucene with HBase

Wednesday, March 7th, 2012

Integrating Lucene with HBase by Boris Lublinsky and Mike Segel.

You have to get to the conclusion for the punch line:

The simple implementation, described in this paper fully supports all of the Lucene functionality as validated by many unit tests from both Lucene core and contrib modules. It can be used as a foundation of building a very scalable search implementation leveraging inherent scalability of HBase and its fully symmetric design, allowing for adding any number of processes serving HBase data. It also avoids the necessity to close an open Lucene Index reader to incorporate newly indexed data, which will be automatically available to user with possible delay controlled by the cache time to live parameter. In the next article we will show how to extend this implementation to incorporate geospatial search support.

Put why your article is important in the introduction as well.

The second article does better:

Implementing Lucene Spatial Support

In our previous article [1], we discussed how to integrate Lucene with HBase for improved scalability and availability. In this article I will show how to extend this implementation with spatial support.

Lucene spatial contribution package [2, 3, 4, 5] provides powerful support for spatial search, but is limited to finding the closest point. In reality spatial search often has significantly more requirements, for example, which points belong to a given shape (circle, bounding box, polygon), which shapes intersect with a given shape and so on. The solution presented in this article allows solving all of the above problems.
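To make the list of requirements concrete, here is a minimal sketch of the tests involved, in plain Python on flat (projected) coordinates. It is not the authors' HBase implementation; every function name is hypothetical, and the polygon test is the standard ray-casting method.

```python
import math


def in_bbox(point, min_x, min_y, max_x, max_y):
    """True if point lies inside or on the edge of the bounding box."""
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


def in_circle(point, center, radius):
    """True if point lies within radius of center (planar distance)."""
    return math.hypot(point[0] - center[0], point[1] - center[1]) <= radius


def in_polygon(point, vertices):
    """Ray casting: an odd number of edge crossings means the point is inside."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bboxes_intersect(a, b):
    """Boxes are (min_x, min_y, max_x, max_y); touching edges count as a hit."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```

In practice a cheap bounding box test usually runs first and the exact circle or polygon test only on the survivors.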

Neo4j Spatial: Why Should You Care?

Saturday, October 1st, 2011

Neo4j Spatial: Why Should You Care? by Peter Neubauer at SamGIS 2011.

A very nice slide deck from Peter Neubauer on Neo4j Spatial! Great images!

Spatio Temporal data Integration and Retrieval

Thursday, September 1st, 2011

STIR 2012 : ICDE 2012 Workshop on Spatio Temporal data Integration and Retrieval


When: Apr 1, 2012 – Apr 1, 2012
Where: Washington DC, USA
Submission Deadline: Oct 21, 2011

From the notice:

International Workshop on Spatio Temporal data Integration and Retrieval (STIR2012) in conjunction with ICDE 2012

April 1, 2012, Washington DC, USA

As the world’s population increases and it puts increasing demands on the planet’s limited resources due to shifting life-styles, we not only need to monitor how we consume resources but also optimize resource usage. Some examples of the planet’s limited resources are water, energy, land, food and air. Today, significant challenges exist for reducing usage of these resources, while maintaining quality of life. The challenges range from understanding regionally varied impacts of global environmental change, through tracking diffusion of avian flu and responding to natural disasters, to adapting business practice to dynamically changing resources, markets and geopolitical situations. For these and many other challenges reference to location – and time – is the glue that connects disparate data sources. Furthermore, most of the systems and solutions that will be built to solve the above challenges are going to depend heavily on structured data (generated by sensors and sensor based applications) which will be streaming in real-time, come in large volumes and will have spatial and temporal aspects to them.

This workshop is focused on making the research in information integration and retrieval more relevant to the challenges in systems with significant spatial and temporal components.

Sounds like they are playing our song!

A Fun Application of Compact Data Structures to Indexing Geographic Data

Monday, November 22nd, 2010

A Fun Application of Compact Data Structures to Indexing Geographic Data
Author(s): Nieves R. Brisaboa, Miguel R. Luaces, Gonzalo Navarro, Diego Seco
Keywords: geographic data, MBR, range query, wavelet tree


The way memory hierarchy has evolved in recent decades has opened new challenges in the development of indexing structures in general and spatial access methods in particular. In this paper we propose an original approach to represent geographic data based on compact data structures used in other fields such as text or image compression. A wavelet tree-based structure allows us to represent minimum bounding rectangles solving geographic range queries in logarithmic time. A comparison with classical spatial indexes, such as the R-tree, shows that our structure can be considered as a fun, yet seriously competitive, alternative to these classical approaches.
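For readers who have not met a wavelet tree, here is a minimal sketch of how one answers a two-dimensional range query. It is a simplification of my own, not the authors' structure: it indexes points rather than MBRs, requires integer y values, and uses plain Python lists where a real compact structure would use bit vectors with rank support. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class WaveletTree:
    """Wavelet tree over a sequence of integers in the range [lo, hi]."""

    def __init__(self, values, lo=None, hi=None):
        values = list(values)
        self.lo = min(values) if lo is None else lo
        self.hi = max(values) if hi is None else hi
        self.left = self.right = None
        # prefix[i] = how many of the first i values go to the left child.
        self.prefix = [0]
        if self.lo == self.hi or not values:
            return  # leaf, or an empty subtree
        mid = (self.lo + self.hi) // 2
        for v in values:
            self.prefix.append(self.prefix[-1] + (v <= mid))
        self.left = WaveletTree([v for v in values if v <= mid], self.lo, mid)
        self.right = WaveletTree([v for v in values if v > mid], mid + 1, self.hi)

    def range_count(self, i, j, a, b):
        """Count positions p in [i, j) whose value lies in [a, b]."""
        if i >= j or b < self.lo or a > self.hi:
            return 0
        if a <= self.lo and self.hi <= b:
            return j - i
        li, lj = self.prefix[i], self.prefix[j]
        return (self.left.range_count(li, lj, a, b)
                + self.right.range_count(i - li, j - lj, a, b))


def build_index(points):
    """Sort (x, y) points by x and store the y values in a wavelet tree."""
    pts = sorted(points)
    return [x for x, _ in pts], WaveletTree([y for _, y in pts])


def count_in_box(index, x1, x2, y1, y2):
    """Count points with x1 <= x <= x2 and y1 <= y <= y2."""
    xs, tree = index
    return tree.range_count(bisect_left(xs, x1), bisect_right(xs, x2), y1, y2)
```

The x range becomes a range of positions by binary search, and the tree then descends one level per halving of the y range, which is where the logarithmic query time in the abstract comes from.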

I must confess that after reading this article more than once, I still puzzle over: “Our experiments, featuring GIS-like scenarios, show that our index is a relevant and funnier alternative to classical spatial indexes, such as the R-tree ….”

I admit to being drawn to esoteric and even odd solutions but I would not describe most of them as being “funnier” than an R-tree.

For all that, the article will be useful to anyone developing topic maps for use with spatial indexes and is a good introduction to wavelet trees.


  1. Create an annotated bibliography of spatial indexes. (date limit, last five (5) years)
  2. Create an annotated bibliography of spatial data resources. (date limit, last five (5) years)
  3. How would you use MBRs (Minimum Bounding Rectangles) for merging purposes in a topic map? (3-5 pages, no citations)

From Documents To Targets: Geographic References

Saturday, November 20th, 2010

Exploiting geographic references of documents in a geographical information retrieval system using an ontology-based index
Author(s): Nieves R. Brisaboa, Miguel R. Luaces, Ángeles S. Places and Diego Seco
Keywords: Geographic information retrieval, Spatial index, Textual index, Ontology, System architecture


Both Geographic Information Systems and Information Retrieval have been very active research fields in the last decades. Lately, a new research field called Geographic Information Retrieval has appeared from the intersection of these two fields. The main goal of this field is to define index structures and techniques to efficiently store and retrieve documents using both the text and the geographic references contained within the text. We present in this paper two contributions to this research field. First, we propose a new index structure that combines an inverted index and a spatial index based on an ontology of geographic space. This structure improves the query capabilities of other proposals. Then, we describe the architecture of a system for geographic information retrieval that defines a workflow for the extraction of the geographic references in documents. The architecture also uses the index structure that we propose to solve pure spatial and textual queries as well as hybrid queries that combine both a textual and a spatial component. Furthermore, query expansion can be performed on geographic references because the index structure is based in an ontology.
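The pure textual, pure spatial and hybrid queries the abstract describes can be pictured with a toy index. The sketch below is only an illustration under my own assumptions: it pairs an inverted index with a uniform grid, where the paper's spatial side is built on an ontology of geographic space, and the class and method names are hypothetical.

```python
from collections import defaultdict


class HybridIndex:
    """Toy index: inverted index for terms plus a uniform grid for locations."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.terms = defaultdict(set)   # term -> ids of documents containing it
        self.cells = defaultdict(set)   # grid cell -> ids of documents in it
        self.locations = {}             # document id -> (x, y)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def add(self, doc_id, text, x, y):
        for term in text.lower().split():
            self.terms[term].add(doc_id)
        self.cells[self._cell(x, y)].add(doc_id)
        self.locations[doc_id] = (x, y)

    def search_text(self, term):
        """Pure textual query: documents containing the term."""
        return set(self.terms.get(term.lower(), ()))

    def search_box(self, min_x, min_y, max_x, max_y):
        """Pure spatial query: documents located inside the box."""
        cx1, cy1 = self._cell(min_x, min_y)
        cx2, cy2 = self._cell(max_x, max_y)
        hits = set()
        for cx in range(cx1, cx2 + 1):
            for cy in range(cy1, cy2 + 1):
                for doc_id in self.cells.get((cx, cy), ()):
                    x, y = self.locations[doc_id]
                    # Cells only narrow the candidates; check exact position.
                    if min_x <= x <= max_x and min_y <= y <= max_y:
                        hits.add(doc_id)
        return hits

    def search_hybrid(self, term, box):
        """Hybrid query: the term must match and the location must be in box."""
        return self.search_text(term) & self.search_box(*box)
```

A grid knows nothing about containment between named places, which is exactly what the ontology supplies in the paper and what makes query expansion on geographic references possible there.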

Obviously relevant to the Afghan War Diary materials.

The authors observe:

…concepts such as the hierarchical nature of geographic space and the topological relationships between the geographic objects must be considered….

Interesting, but topic maps would help with “What defensive or offensive assets do I have in a geographic area?”

The TV-tree — an index structure for high-dimensional data (1994)

Saturday, September 18th, 2010

The TV-tree — an index structure for high-dimensional data (1994)
Authors: King-ip Lin, H. V. Jagadish, Christos Faloutsos
Keywords: Spatial Index, Similarity Retrieval, Query by Context, R*-Tree, High-Dimensionality Feature Spaces.


We propose a file structure to index high-dimensionality data, typically, points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such ‘varying length’ feature vectors. Finally we report simulation results, comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, with up to 80% savings in disk accesses.
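The "use only a few of the features" idea can be shown in miniature. The sketch below is not the TV-tree, only a toy of my own under one loud assumption: features are ordered most important first, so the question becomes how many leading features a group of vectors needs before its members can be told apart. The function name is hypothetical.

```python
def active_dimensions(vectors, start=1):
    """Return the smallest number of leading features that separates all vectors.

    Assumes (for this sketch only) that features are ordered most important
    first. Falls back to the full width if the vectors contain duplicates.
    """
    if not vectors:
        return 0
    width = len(vectors[0])
    for k in range(start, width + 1):
        # Extend the compared prefix only while some vectors still collide.
        prefixes = {tuple(v[:k]) for v in vectors}
        if len(prefixes) == len(vectors):
            return k
    return width
```

Two vectors that already differ in the first feature cost one comparison; only groups that collide on their leading features pay for more.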

The notion of “…utilizing additional features whenever the additional discriminatory power is absolutely necessary…” is an important one.

Compare to fixed simplistic discrimination and/or fixed complex, high-overhead, discrimination between subject representatives.

Either one represents a failure of imagination.