Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 3, 2015

KillrWeather

Filed under: Akka,Cassandra,Spark,Time Series — Patrick Durusau @ 2:31 pm

KillrWeather

From the post:

KillrWeather is a reference application (which we are constantly improving) showing how to easily leverage and integrate Apache Spark, Apache Cassandra, and Apache Kafka for fast, streaming computations in asynchronous Akka event-driven environments. This application focuses on the use case of time series data.

The site doesn’t give enough emphasis to the importance of time series data. Yes, weather is an easy example of time series data, but consider another incomplete listing of the uses of time series data:

A time series is a sequence of data points, typically consisting of successive measurements made over a time interval. Examples of time series are ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average. Time series are very frequently plotted via line charts. Time series are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, communications engineering, and largely in any domain of applied science and engineering which involves temporal measurements.

(Time Series)

Mastering KillrWeather will put you on the road to many other uses of time series data.

Enjoy!

I first saw this in a tweet by Chandra Gundlapalli.

January 22, 2015

Streaming Big Data with Spark, Spark Streaming, Kafka, Cassandra and Akka

Filed under: Akka,Cassandra,Kafka,Spark,Streams — Patrick Durusau @ 3:47 pm

Webinar: Streaming Big Data with Spark, Spark Streaming, Kafka, Cassandra and Akka by Helena Edelson.

From the post:

On Tuesday, January 13 I gave a webinar on Apache Spark, Spark Streaming and Cassandra. Over 1700 registrants from around the world signed up. This is a follow-up post to that webinar, answering everyone’s questions. In the talk I introduced Spark, Spark Streaming and Cassandra with Kafka and Akka and discussed wh​​​​y these particular technologies are a great fit for lambda architecture due to some key features and strategies they all have in common, and their elegant integration together. We walked through an introduction to implementing each, then showed how to integrate them into one clean streaming data platform for real-time delivery of meaning at high velocity. All this in a highly distributed, asynchronous, parallel, fault-tolerant system.

Video | Slides | Code | Diagram

About The Presenter: Helena Edelson is a committer on several open source projects including the Spark Cassandra Connector, Akka and previously Spring Integration and Spring AMQP. She is a Senior Software Engineer on the Analytics team at DataStax, a Scala and Big Data conference speaker, and has presented at various Scala, Spark and Machine Learning Meetups.

I have long contended that it is possible to have a webinar that has little if any marketing fluff and maximum technical content. Helena’s presentation is an example of that type of webinar.

Very much worth the time to watch.

BTW, being so content full, questions were answered as part of this blog post. Technical webinars just don’t get any better organized than this one.

Perhaps technical webinars should be marked with TW and others with CW (for c-suite webinars). To prevent disorientation in the first case and disappointment in the second one.

December 24, 2014

Akka – Streams and HTTP

Filed under: Actor-Based,Akka,Streams — Patrick Durusau @ 7:29 pm

New documentation for Akka Streams and HTTP.

From Akka Streams:

The way we consume services from the internet today includes many instances of streaming data, both downloading from a service as well as uploading to it or peer-to-peer data transfers. Regarding data as a stream of elements instead of in its entirety is very useful because it matches the way computers send and receive them (for example via TCP), but it is often also a necessity because data sets frequently become too large to be handled as a whole. We spread computations or analyses over large clusters and call it “big data”, where the whole principle of processing them is by feeding those data sequentially—as a stream—through some CPUs.

Actors can be seen as dealing with streams as well: they send and receive series of messages in order to transfer knowledge (or data) from one place to another. We have found it tedious and error-prone to implement all the proper measures in order to achieve stable streaming between actors, since in addition to sending and receiving we also need to take care to not overflow any buffers or mailboxes in the process. Another pitfall is that Actor messages can be lost and must be retransmitted in that case lest the stream have holes on the receiving side. When dealing with streams of elements of a fixed given type, Actors also do not currently offer good static guarantees that no wiring errors are made: type-safety could be improved in this case.

For these reasons we decided to bundle up a solution to these problems as an Akka Streams API. The purpose is to offer an intuitive and safe way to formulate stream processing setups such that we can then execute them efficiently and with bounded resource usage—no more OutOfMemoryErrors. In order to achieve this our streams need to be able to limit the buffering that they employ, they need to be able to slow down producers if the consumers cannot keep up. This feature is called back-pressure and is at the core of the Reactive Streams initiative of which Akka is a founding member. For you this means that the hard problem of propagating and reacting to back-pressure has been incorporated in the design of Akka Streams already, so you have one less thing to worry about; it also means that Akka Streams interoperate seamlessly with all other Reactive Streams implementations (where Reactive Streams interfaces define the interoperability SPI while implementations like Akka Streams offer a nice user API).

From HTTP:

The purpose of the Akka HTTP layer is to expose Actors to the web via HTTP and to enable them to consume HTTP services as a client. It is not an HTTP framework, it is an Actor-based toolkit for interacting with web services and clients….

Just in case you tire of playing with your NVIDIA GRID K2 board and want to learn more about Akka streams and HTTP. 😉

December 14, 2014

GearPump

Filed under: Actor-Based,Akka,Hadoop YARN,Samza,Spark,Storm,Tez — Patrick Durusau @ 7:30 pm

GearPump (GitHub)

From the wiki homepage:

GearPump is a lightweight, real-time, big data streaming engine. It is inspired by recent advances in the Akka framework and a desire to improve on existing streaming frameworks. GearPump draws from a number of existing frameworks including MillWheel, Apache Storm, Spark Streaming, Apache Samza, Apache Tez, and Hadoop YARN while leveraging Akka actors throughout its architecture.

What originally caught my attention was this passage on the GitHub page:

Per initial benchmarks we are able to process 11 million messages/second (100 bytes per message) with a 17ms latency on a 4-node cluster.

Think about that for a second.

Per initial benchmarks we are able to process 11 million messages/second (100 bytes per message) with a 17ms latency on a 4-node cluster.

The GitHub page features a word count example and pointers to the wiki with more examples.

What if every topic “knew” the index value of every topic that should merge with it on display to a user?

When added to a topic map it broadcasts its merging property values and any topic with those values responds by transmitting its index value.

When you retrieve a topic, it has all the IDs necessary to create a merged view of the topic on the fly and on the client side.

There would be redundancy in the map but de-duplication for storage space went out with preferences for 7-bit character values to save memory space. So long as every topic returns the same result, who cares?

Well, it might make a difference when the CIA want to give every contractor full access to its datastores 24×7 via their cellphones. But, until that is an actual requirement, I would not worry about the storage space overmuch.

I first saw this in a tweet from Suneel Marthi.

November 4, 2013

Akka at Conspire

Filed under: Akka,Scala — Patrick Durusau @ 10:25 pm

Akka at Conspire

From the post:

Ryan Tanner has posted a really good series of blogs on how and why they are using Akka, and especially how to design your application to make good use of clustering and routers. Akka provides solid tools but you still need to think where to point that shiny hammer, and Ryan has a solid story to tell:

  1. How We Built Our Backend on Akka and Scala
  2. Why We Like Actors
  3. Making Your Akka Life Easier
  4. Don’t Fall Into Our Anti-Pattern Traps
  5. The Importance of Pulling

PS: And no, we don’t mind anyone using our code, not even if it was contributed by Derek Wyatt (honorary team member) 🙂

Unlike the Peyton Place IT tragedies in Washington, this is a software tale that ends well.

Enjoy!

November 2, 2013

Principles of Reactive Programming [4th of November]

Filed under: Akka,Functional Programming,Scala — Patrick Durusau @ 4:08 pm

Principles of Reactive Programming [4th of November]

Just in case U.S. government intercepts either prevented you from getting the news or erased data from your calendar, just a reminder that Principles of Reactive Programming starts next Monday and runs for seven (7) weeks.

Even though I am signed up for another course, I am tempted to add this one. Unlikely as two courses is a bit much at one time.

But will be watching the lectures later to prepare for the next time.

June 24, 2013

Building Distributed Systems With The Right Tools:…

Filed under: Akka,Scala,Topic Map Software — Patrick Durusau @ 2:57 pm

Building Distributed Systems With The Right Tools: Akka Looks Promising

From the post:

Modern day developers are building complex applications that span multiple machines. As a result, availability, scalability, and fault tolerance are important considerations that must be addressed if we are to successfully meet the needs of the business.

As developers building distributed systems, then, being aware of concepts and tools that help in dealing with these considerations is not just important – but allows us to make a significant difference to the success of the projects we work on.

One emerging tool is Akka and it’s clustering facilities. Shortly I’ll show a few concepts to get your mind thinking about where you could apply tools like Akka, but I’ll also show a few code samples to emphasise that these benefits are very accessible.

Code sample for this post is on github.

Why Should I Care About Akka?

Let’s start with a problem… We’re building a holidays aggregration and disitribution platform. What our system does is fetch data from 200 different package providers, and distribute it to over 50 clients via ftp. This is a continuous process.

Competition in this market is fierce and clients want holidays and upto date availability in their systems as fast as possible – there’s a lot of money to be made on last-minute deals, and a lot of money to be lost in selling holiday’s that have already been sold elsewhere.

One key feature then is that the system needs to always be running – it needs high availability. Another important feature is performance – if this is to be maintained as the system grows with new providers and clients it needs to be scalable.

Just think to yourself now, how would you achieve this with the technologies you currently work with? I can’t think of too many things in the .NET world that would guide me towards highly-available, scalable applications, out of the box. There would be a lot of home-rolled infrastructure, and careful designing for scalability I suspect.

Akka Wants to Help You Solve These Problems ‘Easily’

Using Akka you don’t call methods – you send messages. This is because the programming model makes the assumption that you are building distributed, asynchronous applications. It’s just a bonus if a message gets sent and handled on the same machine.

This arises from the fact that the framework is engineered, fundamentally, to guide you into creating highly-available, scalable, fault-tolerant distributed applications…. There is no home-rolled infrastructure (you can add small bits and pieces if you need to).

Instead, with Akka you mostly focus on business logic as message flows. Check out the docs or pick up a book if you want to learn about the enabling concepts like supervision.

If you are contemplating a distributed topic map application, Akka should be of interest.

Work flow could result in different locations reflecting different topic map content.

January 19, 2013

Where Akka Came From

Filed under: Actor-Based,Akka — Patrick Durusau @ 7:09 pm

Where Akka Came From

From the post:

Sparked by the recent work on an Akka article on wikipedia, Jonas, Viktor and Yours Truly sat down to think back to the early days and how it all came about (I was merely an intrigued listener for the most part). While work on the article is ongoing, we thought it would be instructive to share a list of references to papers, talks and concepts which influenced the design—made Akka what it is today and what still is to come.

As you already know, I have a real weakness for source documentation, both ancient as well as more recent.

Enjoy!

January 7, 2013

Akka Documentation Release 2.1.0

Filed under: Actor-Based,Akka,Programming — Patrick Durusau @ 10:05 am

Akka Documentation Release 2.1.0 from Typesafe Inc. (PDF file)

The documentation answers the question, “What is Akka?” as follows:

Scalable real-time transaction processing

We believe that writing correct concurrent, fault-tolerant and scalable applications is too hard. Most of the time it’s because we are using the wrong tools and the wrong level of abstraction. Akka is here to change that. Using the Actor Model we raise the abstraction level and provide a better platform to build correct, concurrent, and scalable applications. For fault-tolerance we adopt the “Let it crash” model which the telecom industry has used with great success to build applications that self-heal and systems that never stop. Actors also provide the abstraction for transparent distribution and the basis for truly scalable and fault-tolerant applications.

Chris Cundill says “it’s virtually a book!,” which at 424 pages I think is a fair statement. 😉

Just skimming this looks quite readable!

I first saw this at This week in #Scala (04/01/2013) by Chris Cundill.

Akka 2.1.0 Released

Filed under: Actor-Based,Akka — Patrick Durusau @ 9:40 am

Akka 2.1.0 Released

From the post:

We—the Akka committers—are pleased to be able to announce the availability of Akka 2.1.0 ‘Mingus’. We are proud to include the work of 17 external committers, plus the work done by our great community in reporting and helping to diagnose bugs along the way.

This release refines and builds upon version 2.0, which was published a bit over nine months ago. The most prominent new features are

  • cluster support (experimental, including cluster membership logic & death watch and cluster-aware routers, see more below)
  • integration with Scala standard library (SIP-14 Futures, dataflow as add-on module, akka-actor.jar will be part of the Scala distribution)
  • Akka Camel support (Raymond Roestenburg & Piotr Gabryanczyk)
  • Encrypted Akka Remoting using SSL/TLS (Peter Badenhorst)
  • OSGi meta-information for most bundles (excluding samples and tests, Gert Vanthienen)
  • an ActorDSL for more concise actor declarations, e.g. in the REPL
  • a module for multi-node testing (to support you in developing clustered applications, experimental in the same sense as cluster support)
  • a Java API for the TestKit

In addition there have been a great number of small fixes and improvements, documentation updates (including a whole new section on message delivery guarantees), an area for contributions—akka-contrib—where community developments can mature and prove themselves and many more. A series of blog posts high-lighting the new features has been published over the past weeks on this blog, see this tag.

Looking forward to exploring the new features in this release!

Akka website

Akka downloads

I first saw this at This week in #Scala (04/01/2013) by Chris Cundill.

September 2, 2012

Discovering message flows in actor systems with the Spider Pattern

Filed under: Actor-Based,Akka,Messaging — Patrick Durusau @ 7:29 pm

Discovering message flows in actor systems with the Spider Pattern by Raymond Rostenberg.

From the post:

In this post I’m going to show a pattern that can be used to discover facts about an actor system while it is running. It can be used to understand how messages flow through the actors in the system. The main reason why I built this pattern is to understand what is going on in a running actor system that is distributed across many machines. If I can’t picture it, I can’t understand it (and I’m in good company with that quote 🙂

Building actor systems is fun but debugging them can be difficult, you mostly end up browsing through many log files on several machines to find out what’s going on. I’m sure you have browsed through logs and thought, “Hey, where did that message go?”, “Why did this message cause that effect” or “Why did this actor never get a message?”

This is where the Spider pattern comes in.

I would think the better quote would be: “If I can’t see it, I can’t understand it.” But each to their own.

Message passing systems remind me of Newcomb’s requirement for having audit trails for merging behavior.

Not necessary for every use case but when it is necessary, it is nice to know robust auditing is possible.

Or perhaps the better way to put it is that auditing is adjustable.

We can go from tracking every operation at one extreme to a middle ground of some tracking but protecting political appointees or career servants or even to a wide open system (sort of like Twitter or Facebook).

August 31, 2012

Distributed (in-memory) graph processing with Akka

Filed under: Akka,Graphs — Patrick Durusau @ 2:21 pm

Distributed (in-memory) graph processing with Akka by Adelbert Chang.

From the post:

Graphs have always been an interesting structure to study in both mathematics and computer science (among other fields), and have become even more interesting in the context of online social networks such as Facebook and Twitter, whose underlying network structures are nicely represented by graphs.

These graphs are typically “big”, even when sub-graphed by things such as location or school. With “big” graphs comes the desire to extract meaningful information from these graphs. In the age of multi-core CPU’s and distributed computing, concurrent processing of graphs proves to be an important topic.

Luckily, many graph analysis algorithms are trivially parallelizable. One example that comes to mind is all-pairs shortest path. In the case of an undirected, unweighted graph, we can consider each vertex individually, and do a full BFS from each vertex.

In this post I detail a general framework for distributed graph processing. I do not use any particular graph library, so my graph class will simply be called Graph. Popular graph libraries for Scala can be found in Twitter’s Cassovary project or the Graph for Scala project.

I will also make use of Derek Wyatt’s submission to the Akka Summer of Blog—”Balancing Workloads Across Nodes with Akka 2“—which provides a nice and simple implementation of a BalancingDispatcher in the context of distributed processing.

If you like Akka, graphs, or both, you will enjoy this post.

May 4, 2012

Dempsy – a New Real-time Framework for Processing BigData

Filed under: Akka,Apache S4,Dempsy,Erlang,Esper,HStreaming,Storm,Streambase — Patrick Durusau @ 3:43 pm

Dempsy – a New Real-time Framework for Processing BigData by Boris Lublinsky.

From the post:

Real time processing of BigData seems to be one of the hottest topics today. Nokia has just released a new open-source project – Dempsy. Dempsy is comparable to Storm, Esper, Streambase, HStreaming and Apache S4. The code is released under the Apache 2 license

Dempsy is meant to solve the problem of processing large amounts of "near real time" stream data with the lowest lag possible; problems where latency is more important that "guaranteed delivery." This class of problems includes use cases such as:

  • Real time monitoring of large distributed systems
  • Processing complete rich streams of social networking data
  • Real time analytics on log information generated from widely distributed systems
  • Statistical analytics on real-time vehicle traffic information on a global basis

The important properties of Dempsy are:

  • It is Distributed. That is to say a Dempsy application can run on multiple JVMs on multiple physical machines.
  • It is Elastic. That is, it is relatively simple to scale an application to more (or fewer) nodes. This does not require code or configuration changes but done by dynamic insertion or removal of processing nodes.
  • It implements Message Processing. Dempsy is based on message passing. It moves messages between Message processors, which act on the messages to perform simple atomic operations such as enrichment, transformation, etc. In general, an application is intended to be broken down into more smaller simpler processors rather than fewer large complex processors.
  • It is a Framework. It is not an application container like a J2EE container, nor a simple library. Instead, like the Spring Framework, it is a collection of patterns, the libraries to enable those patterns, and the interfaces one must implement to use those libraries to implement the patterns.

Dempsy’ programming model is based on message processors communicating via messages and resembles a distributed actor framework . While not strictly speaking an actor framework in the sense of Erlang or Akka actors, where actors explicitely direct messages to other actors, Dempsy’s Message Processors are "actor like POJOs" similar to Processor Elements in S4 and to some extent Bolts in Storm. Message processors are similar to actors in that they operate on a single message at a time, and need not deal with concurrency directly. Unlike actors, Message Processors also are relieved of the the need to know the destination(s) for their output messages, as this is handled inside by Dempsy based on the message properties.

In short Dempsy is a framework to enable the decomposing of a large class of message processing problems into flows of messages between relatively simple processing units implemented as POJOs. 

The Dempsy Tutorial contains more information.

See the post for an interview with Dempsy’s creator, NAVTEQ Fellow Jim Carroll.

Will the “age of data” mean that applications and their code will also be viewed and processed as data? The capabilities you have are those you request for a particular data set? Would like to see topic maps on the leading (and not dragging) edge of that change.

April 8, 2012

50 million messages per second – on a single machine

Filed under: Akka,Marketing — Patrick Durusau @ 4:19 pm

50 million messages per second – on a single machine

From the post:

50 million messages per second on a single machine is mind blowing!

We have measured this for a micro benchmark of Akka 2.0.

As promised in Scalability of Fork Join Pool I will here describe one of the tuning settings that can be used to achieve even higher throughput than the amazing numbers presented previously. Using the same benchmark as in Scalability of Fork Join Pool and only changing the configuration we go from 20 to 50 million messages per second.

The micro benchmark use pairs of actors sending messages to each other, classical ping-pong. All sharing the same fork join dispatcher.

Fairly sure the web scale folks will just sniff and move on. It’s not like every Facebook user sending individual messages to all of their friends and their friend’s friends, all at the same time.

On the other hand, 50 million messages per second per machine, on enough machines, and you are talking about a real pile of message. 😉

Are we approaching the point of data being responsible for processing itself and reporting the results? Or at least reporting itself to the nearest processor with the appropriate inputs? Perhaps by broadcasting a message itself?

Closer to home, could a topic map infrastructure be built using message passing that reports a TMDM based data model? For use by query or constraint languages? That is it presents a TMDM API as it were, although behind the scenes the reported API is the result of message passing and processing.

That would make the data model or API if you prefer, a matter of what message passing had been implemented.

More malleable and flexible than a relational database scheme or Cyc based ontology. An enlightened data structure, for a new age.

December 27, 2011

Typesafe Stack

Filed under: Akka,Scala — Patrick Durusau @ 7:14 pm

Typesafe Stack

From the website:

Scala. Akka. Simple.

A 100% open source, integrated distribution offering Scala, Akka, sbt, and the Scala IDE for Eclipse.

The Typesafe Stack makes it easy for developers to get started building scalable software systems with Scala and Akka. The Typesafe Stack is based on the most recent stable versions of Scala and Akka, and provides all of the major components needed to develop and deploy Scala and Akka applications.

Go ahead! You need something new to put on your new, shiny 5TB disk drive. 😉

December 25, 2011

Let It Crash

Filed under: Akka — Patrick Durusau @ 6:08 pm

Let It Crash

Who else but the Akka team would choose a blog title like: Let It Crash. 😉

An early post? Read on:

Location Transparency: Remoting in Akka 2.0

The remoting capabilities of Akka 2.0 are really powerful. Something that not has been as powerful is the documentation of the Akka remoting. We are constantly striving on improving it and this blog post will, hopefully, shed some light on the topic.

The remoting contains functionality not only to lookup a remote actor and send messages to it but also to deploy actors on remote nodes. These two types of interaction are referred to as:

  • Lookup
  • Creation

In the section below the two different approaches will be explained.
(It may be worth pointing out that a combination of the two ways is, of course, also feasible)

(see the post for the rest of it)

Encouraging because the team realizes that its documentation leaves something to be desired and just as importantly, it wants to do something about it.

Looking forward to more posts like this one.

Drop by and leave an encouraging word.

October 21, 2011

Scala Videos (and ebook)

Filed under: Akka,Scala — Patrick Durusau @ 7:27 pm

Scala Videos (and ebook)

While looking for something else (isn’t that always the case?) I ran across this collection of Scala videos and a free ebook, Scala for the Impatient, at Typesafe.

Something to enjoy over the weekend!

September 8, 2011

Akka

Filed under: Actor-Based,Akka — Patrick Durusau @ 5:53 pm

Akka

From the webpage:

Akka is the platform for the next generation event-driven, scalable and fault-tolerant architectures on the JVM

We believe that writing correct concurrent, fault-tolerant and scalable applications is too hard. Most of the time it’s because we are using the wrong tools and the wrong level of abstraction.

Akka is here to change that.

Using the Actor Model together with Software Transactional Memory we raise the abstraction level and provide a better platform to build correct concurrent and scalable applications.

For fault-tolerance we adopt the “Let it crash” / “Embrace failure” model which have been used with great success in the telecom industry to build applications that self-heal, systems that never stop.

Actors also provides the abstraction for transparent distribution and the basis for truly scalable and fault-tolerant applications.

Akka is Open Source and available under the Apache 2 License.

I am increasingly convinced that “we are using the wrong tools and the wrong level of abstraction.”

Everyone seems to agree on that part. Where they differ is on the right tool and right level of abstraction. 😉

I suspect the disagreement isn’t going away. But I mention Akka in case it seems like the right tool and right level of abstraction to you.

I would be mindful that the marketplace for non-concurrent, not so scalable semantic applications is quite large. Think of it this way, someone has the be the “Office” of semantic applications. May as well be you. Leave the high-end, difficult stuff to others.

Powered by WordPress