A massive database now translates news in 65 languages in real time, by Derrick Harris.
From the post:
I have written quite a bit about GDELT (the Global Database of Events, Languages and Tone) over the past year, because I think it’s a great example of the type of ambitious project only made possible by the advent of cloud computing and big data systems. In a nutshell, it’s a database of more than 250 million socioeconomic and geopolitical events and their metadata dating back to 1979, all stored (now) in Google’s cloud and available to analyze for free via Google BigQuery or custom-built applications.
On Thursday, version 2.0 of GDELT was unveiled, complete with a slew of new features — faster updates, sentiment analysis, images, a more-expansive knowledge graph and, most importantly, real-time translation across 65 different languages. That’s 98.4 percent of the non-English content GDELT monitors. Because you can’t really have a global database, or expect to get a full picture of what’s happening around the world, if you’re limited to English language sources or exceedingly long turnaround times for translated content.
…
The GDELT homepage reports:
We’ll be releasing a new “Getting Started With GDELT” user guide in the next few days to walk you through the incredibly vast array of new capabilities in GDELT 2.0,…
Awesome, simply awesome!
Bear in mind that the data presented here isn’t “cooked.” That is, it hasn’t been trimmed and merged with your client’s internal knowledge of “…socioeconomic and geopolitical events…” and how those events impact their interests.
For example, a labor strike at a shipping port on one continent may delay on-time shipments from a manufacturer on another continent bound for customers on a third. The information that ties all those items together is held by your client, not by any public source.
There is a vast sea of client data, relationships and interests waiting to be mapped to a resource like GDELT, and version 2.0 simply raises the possible rewards.
Just in case you are curious:
Terms of Use
What can I do with GDELT and how can I use it in my projects?
Using GDELT
The GDELT Project is an open platform for research and analysis of global society and thus all datasets released by the GDELT Project are available for unlimited and unrestricted use for any academic, commercial, or governmental use of any kind without fee.
Redistributing GDELT
You may redistribute, rehost, republish, and mirror any of the GDELT datasets in any form. However, any use or redistribution of the data must include a citation to the GDELT Project and a link to this website (http://gdeltproject.org/).
It is hard to imagine a data resource getting any better than this!
PS: By late Spring 2015, the backfiles to 1979 will be available in GDELT 2.0 format. Maybe it can get better. 😉
PPS: See the GDELT Blog for posts on using GDELT.