Twitter Streaming with EventMachine and DynamoDB
From the post:
This week Amazon Web Services launched their latest database offering ‘DynamoDB’ – a highly-scalable NoSQL database service.
We’ve been using a couple of NoSQL database engines at work for a while now: Redis and MongoDB. Mongo allowed us to simplify many of our data models and model more faithfully the underlying entities in our applications, and Redis is used for those projects where we need to make sure that a person only classifies an object once.
Whether you’re using MongoDB or MySQL, scaling the performance and size of a database is non-trivial and is a skillset in itself. DynamoDB is a fully managed database service aiming to offer high-performance data storage and retrieval at any scale, regardless of request traffic or storage requirements. Unusually for Amazon Web Services, they’ve made a lot of noise about some of the underlying technologies behind DynamoDB; in particular, they’ve used solid-state drives (SSDs) for storage. I guess telling us this is designed to give us a hint at the performance characteristics we might expect from the service.
A worked example
As with all AWS products there are a number of articles outlining how to get started with DynamoDB. This article is designed to provide an example use case where DynamoDB really shines – parsing a continual stream of data from the Twitter API. We’re going to use the Twitter streaming API to capture tweets and index them by user_id and creation time.
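The excerpt stops before any code, so the following is only a minimal sketch of the indexing step it describes, not the original author's implementation. It assumes a hypothetical table whose hash key is user_id and whose range key is created_at (stored as epoch seconds), and it assumes each streamed line is a JSON tweet with the classic `created_at` format. The helper name `tweet_to_item` and the sample payload are made up for illustration, and the actual DynamoDB write and the EventMachine stream handling are left out.

```ruby
require 'json'
require 'time'

# Assumed timestamp layout, e.g. "Wed Jan 18 20:15:00 +0000 2012".
TWITTER_TIME_FORMAT = '%a %b %d %H:%M:%S %z %Y'

# Hypothetical helper: turn one line of streamed JSON into a hash shaped
# for a table keyed on user_id (hash key) and created_at (range key).
def tweet_to_item(raw_json)
  tweet = JSON.parse(raw_json)

  # The stream can also carry non-tweet messages (delete notices, for
  # example); anything without a user and a timestamp is skipped.
  return nil unless tweet['user'] && tweet['created_at']

  {
    'user_id'    => tweet['user']['id'],
    'created_at' => Time.strptime(tweet['created_at'], TWITTER_TIME_FORMAT).to_i,
    'tweet_id'   => tweet['id_str'],
    'text'       => tweet['text']
  }
end

# Made-up sample payload, trimmed to the fields used above.
sample = '{"id_str":"1","text":"hello","created_at":"Wed Jan 18 20:15:00 +0000 2012","user":{"id":42}}'
p tweet_to_item(sample)
```

Storing the creation time as an integer keeps a user's tweets sortable by time within that user's hash key, which is what makes a range query such as "this user's tweets since yesterday" cheap.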
I wanted to include something a little different after all the graph database and modeling questions. 😉
I need to work on something like this to use Twitter more effectively as an information stream. It could pass all mentions of graphs and related terms along for further processing, perhaps using a map between Twitter user IDs and known authors. Could be interesting.
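As a rough sketch of that idea, and nothing more, the filter could look something like this. The term list, the author map, and the `annotate` helper are all hypothetical placeholders; a real version would need a much better vocabulary and a real source for the user ID to author mapping.

```ruby
# Hypothetical vocabulary of "graphs and related terms".
GRAPH_TERMS = /\b(graphs?|neo4j|topic maps?)\b/i

# Hypothetical map from Twitter user IDs to known authors.
KNOWN_AUTHORS = { 42 => 'Jane Example' }

# Returns the tweet with an 'author' field added when its text mentions
# one of the terms, or nil when the tweet is not of interest.
def annotate(tweet)
  return nil unless tweet['text'].to_s.match?(GRAPH_TERMS)

  # 'author' stays nil for users who are not in the map.
  tweet.merge('author' => KNOWN_AUTHORS[tweet['user_id']])
end

tweets = [
  { 'user_id' => 42, 'text' => 'New graph database release' },
  { 'user_id' => 7,  'text' => 'Lunch was great' }
]

# Only the matching tweets survive for further processing.
p tweets.map { |t| annotate(t) }.compact
```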