CLAVIN [Geotagging – Some Proofing Required]


From the webpage:

CLAVIN (*Cartographic Location And Vicinity INdexer*) is an open source software package for document geotagging and geoparsing that employs context-based geographic entity resolution. It combines a variety of open source tools with natural language processing techniques to extract location names from unstructured text documents and resolve them against gazetteer records. Importantly, CLAVIN does not simply “look up” location names; rather, it uses intelligent heuristics in an attempt to identify precisely which “Springfield” (for example) was intended by the author, based on the context of the document. CLAVIN also employs fuzzy search to handle incorrectly-spelled location names, and it recognizes alternative names (e.g., “Ivory Coast” and “Côte d’Ivoire”) as referring to the same geographic entity. By enriching text documents with structured geo data, CLAVIN enables hierarchical geospatial search and advanced geospatial analytics on unstructured data.
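To make "context-based" resolution and fuzzy search concrete, here is a toy sketch in Python. It is not CLAVIN's actual algorithm or API. The gazetteer below is a hand-made, hypothetical stand-in with only country codes, and the voting rule (prefer the candidate country that the unambiguous mentions in the same document point to) is just one simple heuristic of the kind the description hints at. The names `GAZETTEER`, `lookup`, and `resolve` are invented for illustration.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical toy gazetteer: place name -> candidate country codes.
# A real gazetteer holds full records; this one is illustrative only.
GAZETTEER = {
    "Qusair": ["SA", "SY"],   # ambiguous: more than one candidate
    "Aleppo": ["SY"],
    "Syria": ["SY"],
    "Beirut": ["LB"],
    "Lebanon": ["LB"],
}

def lookup(name):
    """Return candidate country codes, tolerating small misspellings."""
    if name in GAZETTEER:
        return GAZETTEER[name]
    # Fuzzy fallback: take the closest known name, if it is similar enough.
    close = get_close_matches(name, list(GAZETTEER), n=1, cutoff=0.8)
    return GAZETTEER[close[0]] if close else []

def resolve(mentions):
    """Pick one country per mention, using the other mentions as context."""
    candidates = {m: lookup(m) for m in mentions}
    # Only unambiguous mentions get to vote on the document's context.
    votes = Counter(c[0] for c in candidates.values() if len(c) == 1)
    resolved = {}
    for mention, cands in candidates.items():
        if not cands:
            resolved[mention] = None  # unknown place name
        else:
            # On a tie, max() keeps the first candidate in gazetteer order.
            resolved[mention] = max(cands, key=lambda cc: votes[cc])
    return resolved

if __name__ == "__main__":
    print(resolve(["Qusair", "Aleppo", "Syria", "Beirut"]))
```

In this sketch, "Qusair" lands in Syria only because "Aleppo" and "Syria" appear alongside it; with no other mentions it would fall back to the first candidate, which is the kind of mistake that proofing has to catch.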

See the CLAVIN webpage for an online demo, videos, and other materials.

Your mileage may vary.

I used a quote from today’s New York Times (Rockets Hit Hezbollah Stronghold in Lebanon):

An ongoing battle in the Syrian town of Qusair on the Lebanese border has laid bare Hezbollah’s growing role in the Syrian conflict. The Iranian-backed militia and Syrian troops launched an offensive against the town last weekend. After dozens of Hezbollah fighters were killed in Qusair over the past week and buried in large funerals in Lebanon, Hezbollah could no longer play down its involvement.

Col. Abdul-Jabbar al-Aqidi, commander of the Syrian rebels’ Military Council in Aleppo, appeared in a video this week while apparently en route to Qusair, in which he threatened to strike in Beirut’s southern suburbs in retaliation for Hezbollah’s involvement in Syria.

“We used to say before, ‘We are coming Bashar.’ Now we say, ‘We are coming Bashar and we are coming Hassan Nasrallah,’” he said, in reference to Hezbollah’s leader.

“We will strike at your strongholds in Dahiyeh, God willing,” he said, using the Lebanese name for Hezbollah’s power center in southern Beirut. The video was still online on Youtube on Sunday.

Hezbollah lawmaker Ali Ammar said the incident targeted coexistence between the Lebanese and claimed the U.S. and Israel want to return Lebanon to the years of civil war. “They want to throw Lebanon backward into the traps of civil wars that we left behind,” he told reporters. “We will not go backward.”

The results from CLAVIN:

Locations Extracted and Resolved From Text

ID       Name      Lat, Lon            Country Code  Count
272103   Lebanon   33.83333, 35.83333  LB            3
6951366  Lebanese  44.49123, 26.0877   RO            3      *
276781   Beirut    33.88894, 35.49442  LB            2
162037   Dahiyeh   38.19023, 57.00984  TM            1      *
6252001  U.S.      39.76, -98.5        US            1
103089   Qusair    25.91667, 40.45     SA            1      *
163843   Syria     35, 38              SY            1
163843   Syrian    35, 38              SY            1
294640   Israel    31.5, 34.75         IL            1
170062   Aleppo    36.25, 37.5         SY            1

(Asterisks added to mark incorrect resolutions.)


RO = Romania

SA = Saudi Arabia

TM = Turkmenistan

Plus, “Qusair” appears three times in the quoted text, but CLAVIN counted it only once.

Of the ten locations listed, seven were resolved correctly: a seventy percent (70%) accuracy rate.
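The tally can be checked with a few lines of Python. The resolved country codes come straight from the CLAVIN results above; the "expected" codes are my own reading of the quoted article (Qusair is a Syrian town, Dahiyeh is in southern Beirut, "Lebanese" refers to Lebanon), so treat them as assumptions rather than a gold standard.

```python
# (name, country code CLAVIN returned, country code expected from the article)
results = [
    ("Lebanon",  "LB", "LB"),
    ("Lebanese", "RO", "LB"),
    ("Beirut",   "LB", "LB"),
    ("Dahiyeh",  "TM", "LB"),
    ("U.S.",     "US", "US"),
    ("Qusair",   "SA", "SY"),
    ("Syria",    "SY", "SY"),
    ("Syrian",   "SY", "SY"),
    ("Israel",   "IL", "IL"),
    ("Aleppo",   "SY", "SY"),
]

# Names whose resolved country differs from the expected one.
wrong = [name for name, got, want in results if got != want]
accuracy = (len(results) - len(wrong)) / len(results)

print(wrong)               # ['Lebanese', 'Dahiyeh', 'Qusair']
print(f"{accuracy:.0%}")   # 70%
```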

Better than the average American, but proofing is still an essential step in any editorial workflow.

I first saw this in Pete Warden’s Five short links.
