Archive for the ‘HStreaming’ Category

Dempsy – a New Real-time Framework for Processing BigData

Friday, May 4th, 2012

Dempsy – a New Real-time Framework for Processing BigData by Boris Lublinsky.

From the post:

Real time processing of BigData seems to be one of the hottest topics today. Nokia has just released a new open-source project – Dempsy. Dempsy is comparable to Storm, Esper, Streambase, HStreaming and Apache S4. The code is released under the Apache 2 license

Dempsy is meant to solve the problem of processing large amounts of "near real time" stream data with the lowest lag possible; problems where latency is more important that "guaranteed delivery." This class of problems includes use cases such as:

  • Real time monitoring of large distributed systems
  • Processing complete rich streams of social networking data
  • Real time analytics on log information generated from widely distributed systems
  • Statistical analytics on real-time vehicle traffic information on a global basis

The important properties of Dempsy are:

  • It is Distributed. That is to say a Dempsy application can run on multiple JVMs on multiple physical machines.
  • It is Elastic. That is, it is relatively simple to scale an application to more (or fewer) nodes. This does not require code or configuration changes but done by dynamic insertion or removal of processing nodes.
  • It implements Message Processing. Dempsy is based on message passing. It moves messages between Message processors, which act on the messages to perform simple atomic operations such as enrichment, transformation, etc. In general, an application is intended to be broken down into more smaller simpler processors rather than fewer large complex processors.
  • It is a Framework. It is not an application container like a J2EE container, nor a simple library. Instead, like the Spring Framework, it is a collection of patterns, the libraries to enable those patterns, and the interfaces one must implement to use those libraries to implement the patterns.

Dempsy’ programming model is based on message processors communicating via messages and resembles a distributed actor framework . While not strictly speaking an actor framework in the sense of Erlang or Akka actors, where actors explicitely direct messages to other actors, Dempsy’s Message Processors are "actor like POJOs" similar to Processor Elements in S4 and to some extent Bolts in Storm. Message processors are similar to actors in that they operate on a single message at a time, and need not deal with concurrency directly. Unlike actors, Message Processors also are relieved of the the need to know the destination(s) for their output messages, as this is handled inside by Dempsy based on the message properties.

In short Dempsy is a framework to enable the decomposing of a large class of message processing problems into flows of messages between relatively simple processing units implemented as POJOs. 

The Dempsy Tutorial contains more information.

See the post for an interview with Dempsy’s creator, NAVTEQ Fellow Jim Carroll.

Will the “age of data” mean that applications and their code will also be viewed and processed as data? The capabilities you have are those you request for a particular data set? Would like to see topic maps on the leading (and not dragging) edge of that change.