GDELT: The Global Database of Events, Language, and Tone
From the about page:
The Global Database of Events, Language, and Tone (GDELT) is an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world over the last two centuries down to the city level globally, to make all of this data freely available for open research, and to provide daily updates to create the first “realtime social sciences earth observatory.” Nearly a quarter-billion georeferenced events capture global behavior in more than 300 categories covering 1979 to present with daily updates.
GDELT is designed to help support new theories and descriptive understandings of the behaviors and driving forces of global-scale social systems from the micro-level of the individual through the macro-level of the entire planet by offering realtime synthesis of global societal-scale behavior into a rich quantitative database allowing realtime monitoring and analytical exploration of those trends.
GDELT’s evolving ability to capture ethnic, religious, and other social and cultural group relationships will offer profoundly new insights into the interplay of those groups over time, offering a rich new platform for understanding patterns of social evolution, while the data’s realtime nature will expand current understanding of social systems beyond static snapshots towards theories that incorporate the nonlinear behavior and feedback effects that define human interaction and greatly enrich fragility indexes, early warning systems, and forecasting efforts.
GDELT’s goal is to help uncover previously-obscured spatial, temporal, and perceptual evolutionary trends through new forms of analysis of the vast textual repositories that capture global societal activity, from news and social media archives to knowledge repositories.
Key Features
- Covers all countries globally
- Covers a quarter-century: 1979 to present
- Updated daily, 365 days a year
- Based on cross-section of all major international, national, regional, local, and hyper-local news sources, both print and broadcast, from nearly every corner of the globe, in both English and vernacular
- 58 fields capture all available detail about each event and its actors
- Ten fields capture significant detail about each actor, including role and type
- All records georeferenced to the city or landmark as recorded in the article
- Sophisticated geographic pipeline disambiguates and affiliates geography with actors
- Separate geographic information for location of event and for both actors, including GNS and GNIS identifiers
- All records include ethnic and religious affiliation of both actors as provided in the text
- Even captures ambiguous events in conflict zones (“unidentified gunmen stormed the mosque and killed 20 civilians”)
- Specialized filtering and linguistic rewriting filters considerably enhance TABARI’s accuracy
- Wide array of media and emotion-based “importance” indicators for each event
- Nearly a quarter-billion event records
- 100% open, unclassified, and available for unlimited use and redistribution
The download page lists various data sets, including the GDELT Global Knowledge Graph and daily downloads of intake data.
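If you want a quick first look at one of those downloads, something along these lines may be a reasonable starting point. It is only a sketch under assumptions that are not stated on the about page: that the event file is tab-delimited with no header row, and that each record carries the 58 fields mentioned in the feature list. The function names and the column index you pass in are hypothetical, so check the codebook for the file you actually download before relying on any position.

```python
import csv
import io
from collections import Counter

# Assumption: 58 fields per record, taken from the feature list above.
# Verify against the codebook for the specific file you download.
EXPECTED_FIELDS = 58


def read_events(handle, expected_fields=EXPECTED_FIELDS):
    """Yield each well-formed record as a list of strings.

    Assumes tab-delimited text with no header row and no quoting.
    Rows with an unexpected number of fields are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == expected_fields:
            yield row


def count_by_column(handle, column_index, expected_fields=EXPECTED_FIELDS):
    """Count records by the value found at column_index (0-based)."""
    counts = Counter()
    for row in read_events(handle, expected_fields):
        counts[row[column_index]] += 1
    return counts


# Synthetic in-memory sample (not real GDELT data): four records whose
# second field alternates between "B" and "A", padded to 58 fields.
sample = "\n".join(
    "\t".join([str(i), "A" if i % 2 else "B"] + [""] * 56) for i in range(4)
)
sample_counts = count_by_column(io.StringIO(sample), 1)
```

To run it against a real download, open the unzipped file with `open(path, newline="", encoding="utf-8")` and pass the handle to `count_by_column` along with whichever column index the codebook gives for the field you care about.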
If you are looking for data to challenge your graph, topic map or data mining skills, GDELT is the right spot.