Archive for the ‘Erlang’ Category

Riak 1.4 – Install Notes on Ubuntu 12.04 (precise)

Thursday, July 11th, 2013

While installing Riak 1.4 I encountered some issues and thought writing down the answers might help someone else.

Following the instructions for Installing From Apt-Get, when I reached:

sudo apt-get install riak

I got this message:

Failed to fetch Size mismatch
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

This is not a problem with the Riak 1.4 distribution but an error on the Ubuntu side.

Correct as follows:

sudo aptitude clean
sudo aptitude update

then close the shell and restart Linux.

This cleans the apt cache; after the restart, the install succeeded.

Post Installation Notes:

Basho suggests starting Riak with:

riak start

My results:

Unable to access /var/run/riak, permission denied, run script as root


The fix is to run the script as root:

sudo riak start

I then read:

sudo riak start
!!!! WARNING: ulimit -n is 1024; 4096 is the recommended minimum.

The ulimit warning is not unexpected and solutions are documented at: Open Files Limit.

As soon as I finish this session, I am going to create the file /etc/default/riak and its contents will be:

ulimit -n 65536

The file needs to be created as root.
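A minimal sketch of that step (the target path and ulimit value are from the Open Files Limit doc; the scratch path below is my own, so the commands are safe to run anywhere):

```shell
# The real thing, as root: sudo sh -c 'echo "ulimit -n 65536" > /etc/default/riak'
# Demonstrated against a scratch path here for safety:
echo "ulimit -n 65536" > /tmp/riak-default-sketch
cat /tmp/riak-default-sketch
```

If the init script sources /etc/default/riak (as the Debian packaging convention suggests), the raised limit applies the next time the node starts.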

It's worth also following the instructions for “Enable PAM Based Limits for Debian & Ubuntu” in the same Open Files document. That change requires a reboot.

The rest of the tests of the node went well until I got to:

riak-admin diag

The documentation notes:

Make the recommended changes from the command output to ensure optimal node operation.

I was running in an Emacs shell so capturing the output was easy:

riak-admin diag
[critical] vm.swappiness is 60, should be no more than 0
[critical] net.core.wmem_default is 229376, should be at least 8388608
[critical] net.core.rmem_default is 229376, should be at least 8388608
[critical] net.core.wmem_max is 131071, should be at least 8388608
[critical] net.core.rmem_max is 131071, should be at least 8388608
[critical] net.core.netdev_max_backlog is 1000, should be at least 10000
[critical] net.core.somaxconn is 128, should be at least 4000
[critical] net.ipv4.tcp_max_syn_backlog is 2048, should be at least 40000
[critical] net.ipv4.tcp_fin_timeout is 60, should be no more than 15
[critical] net.ipv4.tcp_tw_reuse is 0, should be 1
[warning] The following preflists do not satisfy the n_val:
[approx. 376 lines omitted]
[notice] Data directory /var/lib/riak/bitcask is not mounted with ‘noatime’. Please remount its disk with the ‘noatime’ flag to improve performance.

The first block of messages:

[critical] vm.swappiness is 60, should be no more than 0
[critical] net.core.wmem_default is 229376, should be at least 8388608
[critical] net.core.rmem_default is 229376, should be at least 8388608
[critical] net.core.wmem_max is 131071, should be at least 8388608
[critical] net.core.rmem_max is 131071, should be at least 8388608
[critical] net.core.netdev_max_backlog is 1000, should be at least 10000
[critical] net.core.somaxconn is 128, should be at least 4000
[critical] net.ipv4.tcp_max_syn_backlog is 2048, should be at least 40000
[critical] net.ipv4.tcp_fin_timeout is 60, should be no more than 15
[critical] net.ipv4.tcp_tw_reuse is 0, should be 1

are network tuning issues.

Basho answers the “how to correct?” question at Linux Performance Tuning but there is no link from the Post Installation Notes.
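Collecting those recommendations in one place, a sketch of the /etc/sysctl.conf additions (the values are simply the minimums/maximums riak-admin diag printed above; apply with `sudo sysctl -p`):

```
vm.swappiness = 0
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.wmem_max = 8388608
net.core.rmem_max = 8388608
net.core.netdev_max_backlog = 10000
net.core.somaxconn = 4000
net.ipv4.tcp_max_syn_backlog = 40000
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
```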

The next block of messages:

[warning] The following preflists do not satisfy the n_val:
[approx. 376 lines omitted]

is a known issue: N Value – Preflist Message is Vague.

From the issue, the message means: “these preflists have more than one replica on the same node.”

Not surprising since I am running on one physical node and not in production.

The Riak Fast Track has you create four Riak nodes on one physical machine as a development environment. So I’m going to ignore the “preflists” warning in this context.

The last message:

[notice] Data directory /var/lib/riak/bitcask is not mounted with ‘noatime’. Please remount its disk with the ‘noatime’ flag to improve performance.

is resolved under “Mounts and Scheduler” in the Linux Performance Tuning document.
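For the noatime notice, a hedged example of the relevant /etc/fstab line (the device and filesystem type are made up; the point is adding noatime to the mount options for the disk holding /var/lib/riak):

```
/dev/sdb1  /var/lib/riak  ext4  defaults,noatime  0  2
```

After editing fstab, `sudo mount -o remount /var/lib/riak` should apply the change without a reboot.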

I am going to make all the system changes, reboot, and start on The Riak Fast Track tomorrow.

PS: In case you are wondering what this has to do with topic maps, ask yourself: what characteristics would you want in a distributed topic map system?

Riak 1.4 Hits the Street!

Wednesday, July 10th, 2013

Well, they actually said: Basho Announces Availability of Riak 1.4.

From the post:

We are excited to announce the launch of Riak 1.4. With this release, we have added in more functionality and addressed some common requests that we hear from customers. In addition, there are a few features available in technical preview that you can begin testing and will be fully rolled out in the 2.0 launch later this year.

The new features and updates in Riak 1.4 include:

  • Secondary Indexing Improvements: Query results are now sorted and paginated, offering developers much richer semantics
  • Introducing Counters in Riak: Counters, Riak’s first distributed data type, provide automatic conflict resolution after a network partition
  • Simplified Cluster Management With Riak Control: New capabilities in Riak’s GUI-based administration tool improve the cluster management page for preparing and applying changes to the cluster
  • Reduced Object Storage Overhead: Values and associated metadata are stored and transmitted using a more compact format, reducing disk and network overhead
  • Hinted Handoff Progress Reporting: Makes operating the cluster, identifying and troubleshooting issues, and monitoring the cluster simpler
  • Improved Backpressure: Riak responds with an overload message if a vnode has too many messages in queue

Plus performance and management enhancements for the enterprise crowd.

Download Riak 1.4:

Code at:

Live webcast: “What’s New in Riak 1.4” on July 12th.

That’s this coming Friday.

Improving your Erlang programming skills doing katas

Tuesday, June 11th, 2013

Improving your Erlang programming skills doing katas by Paolo D’incau.

From the post:

There is one sure thing about programming: you should try to improve your set of skills in a regular way. There are several different methods to achieve this kind of result: reading books and blogs, working on your own pet project and doing pair programming are all very good examples of this, but today I want to introduce you to code kata. What is a kata? Well, since you ask, you won’t mind if I digress for a while first!

What is a kata?

In Japanese, the word kata is used to describe choreographed patterns of movements that are practised solo or possibly with a partner. Kata are especially applied in martial arts because they represent a systematic way of teaching and practising, rather than leaving individuals to flail about in a clumsy manner. If the concept of kata is still not clear (shame on me!) you just need to watch the movie Karate Kid again. For the whole movie Mr. Miyagi teaches Daniel LaRusso the importance of kata, and we know that Miyagi San is always right!

The basic concept behind kata is fairly simple: if we keep on practising in a repetitive manner, we can acquire the ability to execute movements without hesitation and to adapt them to a set of different situations without any fear. Pretty cool, huh?

Coming back to the good old world of software developers (and especially Erlang ones) we may ask ourselves: “how can we apply the concept of kata to our daily routine?”. David Thomas (one of the authors of “The Pragmatic Programmer”) introduced the concept of Code Kata, a programming exercise useful for improving our knowledge and skills through practice and repetition. The interesting point of code kata is that the exercises proposed are usually easy and can be implemented in a step-by-step fashion.

A chance to learn/improve your Erlang skills and to learn a good new habit! (Bad habits are easy to acquire.)
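As a concrete (and entirely my own, hypothetical) illustration, a first pass at a tiny “count the vowels” kata in Erlang might be:

```erlang
%% Step one of a hypothetical kata: count the vowels in a string.
%% Later repetitions might add uppercase handling, Unicode, etc.
vowels(String) ->
    length([C || C <- String, lists:member(C, "aeiou")]).

%% vowels("erlang") returns 2.
```

The point of the kata is not this function but the repetition: redo it from scratch, refine it, and vary the constraints each time.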


Erlang Camp 2013 is coming!

Monday, May 6th, 2013

Erlang Camp 2013 is coming! by Paolo D’Incau.

From the post:

Amsterdam: beautiful city of bicycles, canals and….. Erlang!

Nothing to do on Aug 30-31, 2013? What about travelling to the lovely city of Amsterdam and attending Erlang Camp 2013?

If you have been following my blog for a while you should already know what Erlang Camp is: an intensive two-day learning experience focused on getting you up to speed on creating large-scale, fault-tolerant distributed applications in Erlang.

In particular, during Erlang Camp 2013, which is exceptionally sponsored by the amazing company SpilGames, you will get in touch with several Erlang topics such as:

  • Erlang basic stuff
  • Erlang OTP
  • How to ship your Erlang code using applications and releases
  • Erlang Distribution

More information on the Erlang Camp schedule may be found on this web page.

Erlang Camp is a pretty good way to learn the Erlang language and to get in touch with some of the best Erlang teachers and developers out there. Knowing that only 100 seats are available and that they will go quickly, I suggest you hurry and register for the event!

Speaking of summer camps! 😉

Not quite like Vacation Bible School but your tastes have changed since then.


Distributed resilience with functional programming

Saturday, February 9th, 2013

Distributed resilience with functional programming by Simon St. Laurent.

From the post:

Functional programming has a long and distinguished heritage of great work — that was only used by a small group of programmers. In a world dominated by individual computers running single processors, the extra cost of thinking functionally limited its appeal. Lately, as more projects require distributed systems that must always be available, functional programming approaches suddenly look a lot more appealing.

Steve Vinoski, an architect at Basho Technologies, has been working with distributed systems and complex projects for a long time, first as a tentative explorer and then leaping across to Erlang when it seemed right. Seventeen years as a columnist on C, C++, and functional languages have given him a unique viewpoint on how developers and companies are deciding whether and how to take the plunge.

Simon gives highlights from his interview of Steve Vinoski but I would start at the beginning, go to the end, then stop.

You do know that Simon has written an Erlang book? Introducing Erlang.

Haven’t seen it (yet) but knowing Simon you won’t be disappointed.

Learn You Some Erlang for Great Good!

Monday, December 17th, 2012

Learn You Some Erlang for Great Good! is now a real book! by Paolo D’Incau.

From the post:

In my humble opinion if you want to learn or improve your Erlang, writing a lot of code is a good idea but is really not enough: you have to learn from other people’s work, you have to read more from blogs and books.

That’s the reason why in one of my oldest posts I recommended you take a look at 7 Erlang-related websites, among which you will find the good old Learn You Some Erlang. I firmly believe that most Erlangers out there learnt a lot from Fred Hébert‘s work; the amount of information he provides is just impressive, and his way of teaching Erlang by small (well, not that small) examples is the best one I have seen so far online.

BTW, if you read Paolo’s post, you will find a 30% discount code for: Learn You Some Erlang for Great Good!.

Thanks to Paolo, I am now also waiting for my copy to arrive! (Misery loves company.)

BigData using Erlang, C and Lisp to Fight the Tsunami of Mobile Data

Monday, November 26th, 2012

BigData using Erlang, C and Lisp to Fight the Tsunami of Mobile Data by Jon Vlachogiannis.

From the post:

BugSense is an error-reporting and quality metrics service that tracks thousands of apps every day. When mobile apps crash, BugSense helps developers pinpoint and fix the problem. The startup delivers first-class service to its customers, which include VMWare, Samsung, Skype and thousands of independent app developers. Tracking more than 200M devices requires fast, fault-tolerant and cheap infrastructure.

Over the last six months, we decided to use our BigData infrastructure to provide users with metrics about their apps’ performance and stability, and to let them know how the errors affect their user base and revenues.

We knew that our solution should be scalable from day one, because more than 4% of the smartphones out there will start DDoSing us with data.

A number of lessons to consider if you want a system that scales.

RICON 2012 [videos, slides, resources]

Friday, November 2nd, 2012

RICON 2012 [videos, slides, resources]

From the webpage:

Basho Technologies, along with our sponsors, proudly presented RICON 2012, a two day conference dedicated to Riak, developers, and the future of distributed systems in production. This page is dedicated to post-conference consumption. Here you will find slidedecks, resources, and much more.

Videos for the weekend (for those of you without NetFlix accounts):

  • Joseph Blomstedt, Bringing Consistency to Riak
  • Sean Cribbs, Data Structures in Riak
  • Selena Deckelmann, Rapid Data Prototyping With Postgres
  • Dietrich Featherston, Modern Radiology for Distributed Systems
  • Gary Flake, Building a Social Application on Riak
  • Theo Schlossnagle, Next Generation Monitoring of Large Scale Riak Applications
  • Ines Sombra and Michael Brodhead, Riak in the Cloud
  • Andrew Thompson, Cloning the Cloud – Riak and Multi Data Center Replication

It is hard to decide what to watch first.

What do you think?

DRAKON-Erlang: Visual Functional Programming

Saturday, October 20th, 2012

DRAKON-Erlang: Visual Functional Programming

DRAKON is a visual programming language developed for the Buran Space Project.

I won’t repeat the surplus of adjectives used to describe DRAKON. Its long term use in the Russian space program is enough to recommend review of its visual techniques.

The DRAKON-Erlang project is an effort to combine DRAKON as a flow language/representation with Erlang.

A graphical notation for topic maps never caught on and with the rise of big data, visual representation of merging algorithms could be quite useful.

I am not suggesting DRAKON-Erlang as a solution to those issues but as a data point to take into account.


Count unique items in a text file using Erlang

Wednesday, October 17th, 2012

Count unique items in a text file using Erlang by Paolo D’Incau.

From the post:

Many times during our daily programming routine, we have to deal with log files. Most of the log files I have seen so far are just text files where the useful information is stored line by line.

Let’s say you are implementing a super cool game backend in Erlang; you would probably end up with a bunch of servers implementing several actions (e.g. authentication, chat, storing character progress, etc.). Well, I am pretty sure you would not store the characters’ info in a text file, but maybe (and I said maybe) you could find it useful to store in a text file some of the information that comes from the authentication server.

Unique in the sense you are thinking.

But that happens, even in topic maps.

Disco [Erlang/Python – MapReduce]

Monday, October 1st, 2012


From the webpage:

Disco is a distributed computing framework based on the MapReduce paradigm. Disco is open source, developed by Nokia Research Center to solve real problems in handling massive amounts of data.

Disco is powerful and easy to use, thanks to Python. Disco distributes and replicates your data, and schedules your jobs efficiently. Disco even includes the tools you need to index billions of data points and query them in real-time.

Install Disco on your laptop, cluster or cloud of choice and become a part of the Disco community!

I rather like the MapReduce graphic you will see at About.

I first saw this in Guido Kollerie’s post on the recent Python users meeting in the Netherlands. Guido details his 5 minute presentation on Disco.

Process group in erlang: some thoughts about the pg module

Wednesday, September 19th, 2012

Process group in erlang: some thoughts about the pg module by Paolo D’Incau.

From the post:

One of the most common ways to achieve fault tolerance in distributed systems consists of organizing several identical processes into a group that can be accessed by a common name. The key concept here is that whenever a message is sent to the group, all members of the group receive it. This is a really nice feature, since if one process in the group fails, some other process can take over for it and handle the message, doing all the operations required.

Process groups also allow abstraction: when we send a message to a group, we don’t need to know who the members are or where they are. In fact, process groups are anything but static. Any process can join an existing group or leave one at runtime; moreover, a process can be part of multiple groups at the same time.
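The idea reads naturally in code. A hedged sketch using the modern pg API (OTP 23+; the 2012-era pg module Paolo discusses was an experimental predecessor with a different interface):

```erlang
%% Start the default pg scope, join a named group, and fan a message
%% out to every member (the group and message names are illustrative).
{ok, _} = pg:start_link(),
ok = pg:join(workers, self()),
[Pid ! {job, 42} || Pid <- pg:get_members(workers)].
```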

Fault tolerance is going to be an issue if you are using topic maps and/or social media in an operational context.

Having really “cool” semantic capabilities isn’t worth much if the system fails at a critical point.

Elli (Erlang Web Server) [Lessons in Semantic Interoperability – Part 1]

Saturday, September 1st, 2012


From the post:

My name is Knut, and I want to show you something really cool that I built to solve some problems we are facing here at Wooga.

Having several very successful social games means we have a large number of users. In a single game, they can generate around ten thousand HTTP requests per second to our backend systems. Building and operating the software required to service these games is a big challenge that sometimes requires creative solutions.

As developers at Wooga, we are responsible for the user experience. We want to make our games not only fun and enjoyable but accessible at all times. To do this we need to understand and control the software and hardware we rely on. When we see an area where we can improve the user experience, we go for it. Sometimes this means taking on ambitious projects. An example of this is Elli, a webserver which has become one of the key building blocks of our successful backends.

Having used many of the big Erlang webservers in production with great success, we still found ourselves thinking of how we could improve. We want a simple and robust core with no errors or edge cases causing problems. We need to measure the performance to help us optimize our network and user code. Most importantly, we need high performance and low CPU usage so our servers can spend their resources running our games.

I started this post about Elli to point out the advantages of having a custom web server application, if your needs aren’t met by one of the standard ones.

Something clicked and I realized that web servers, robust and fast as well as lame and slow, churn out semantically interoperable content every day.

For hundreds of millions of users.

Rather than starting from the perspective of the “semantic interoperability” we want, why not examine the “semantic interoperability” we have already, for clues on what may or may not work to increase it?

When I say “semantic interoperability” on the web, I am speaking of the interpretation of HTML markup, the <a>, <p>, <ol>, <ul>, <div>, <h1-6>, elements that make up most pages.

What characteristics do those markup elements share that might be useful in creating more semantic interoperability?

The first characteristic is simplicity.

You don’t need a lot of semantic overhead machinery or understanding to use any of them.

A plain text editor and knowledge that some text has a general presentation is enough.

It takes a few minutes for a user to learn enough HTML to produce meaningful (to them and others) results.

At least in the case of HTML, that simplicity has led to a form of semantic interoperability.

HTML was defined with interoperable semantics but unadopted interoperable semantics are like no interoperable semantics at all.

If HTML has simplicity of semantics, what else does it have that led to widespread adoption?

Erlang Cheat Sheet [And Cheat Sheets in General]

Monday, August 20th, 2012

Erlang Cheat Sheet

Fairly short (read: limited) cheat sheet on Erlang. Found at:

The site has a number of cheat sheets and is in the process of creating a cheat sheet template.

Questions that come to mind:

  • Using a topic map to support a cheat sheet, what more would you expect to see? Links to fuller examples? Links to manuals? Links to sub-cheat sheets?
  • Have you seen any ontology cheat sheets? For coding consistency, that sounds like something that could be quite handy.
  • For existing ontologies, any research on frequency of use to support the creation of cheat sheets? (Would not waste space on “thing” for example. Too unlikely to bear mentioning.)

Riak 1.2 Webinar – 21st August 2012

Wednesday, August 8th, 2012

Riak 1.2 Webinar – 21st August 2012

  • 11:00 Pacific Daylight Time (San Francisco, GMT-07:00)
  • 14:00 Eastern Daylight Time (New York, GMT-04:00)
  • 20:00 Europe Summer Time (Berlin, GMT+02:00)

From the registration page:

Join Basho Technologies’ Engineer, Joseph Blomstedt, for an in-depth overview of Riak 1.2, the latest version of Basho’s flagship open source database. In this live webinar, you will see changes in Riak 1.2 open source and Enterprise versions, including:

  • New approach to cluster administration
  • Built-in capability negotiation
  • Repair Search or KV Partitions thru Riak Console
  • Enhanced Handoff Reporting
  • Protobuf API Support for 2i and Search indexes
  • New Packaging for FreeBSD, SmartOS, and Ubuntu
  • Stats Improvements
  • LevelDB Improvements

I would have included this with the Riak 1.2 release post but was afraid you would not get past the download link and not see the webinar.

It’s on my calendar. How about yours?

Riak 1.2 Is Official!

Wednesday, August 8th, 2012

Riak 1.2 Is Official!

From the post:

Nearly three years ago to the day, from a set of green, worn couches in a modest office in Cambridge, Massachusetts, the Basho team announced Riak to the world. To say we’ve come a long way from that first release would be an understatement, and today we’re pleased to announce the release and general availability of Riak 1.2.

Here’s the tl;dr on what’s new and improved since the Riak 1.1 release:

  • More efficiently add multiple Riak nodes to your cluster
  • Stage and review, then commit or abort cluster changes for easier operations; plus smoother handling of rolling upgrades
  • Better visibility into active handoffs
  • Repair Riak KV and Search partitions by attaching to the Riak Console and using a one-line command to recover from data corruption/loss
  • More performant stats for Riak; the addition of stats to Riak Search
  • 2i and Search usage thru the Protocol Buffers API
  • Official Support for Riak on FreeBSD
  • In Riak Enterprise: SSL encryption, better balancing and more granular control of replication across multiple data centers, NAT support

If that’s all you need to know, download the new release or read the official release notes. Also, go register for RICON.

OK, but I have a question: What happened to the lucky “…green, worn couches…”? 😉

Crash Course in Erlang

Sunday, May 20th, 2012

Crash Course in Erlang by Knut Hellan.

Knut writes:

This is a summary of a talk I gave on Monday, May 14, 2012, at an XP Meetup in Trondheim. It is meant as a teaser for listeners to play with Erlang themselves.

First, some basic concepts. Erlang has a form of constant called atom that is defined on first use. They are typically used as enums or symbols in other languages. Variables in Erlang are [im]mutable so assigning a new value to an existing variable is not allowed. (emphasis added)
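Both points are easy to see in an Erlang shell (a hedged transcript of my own, not from Knut’s talk):

```erlang
1> Status = ok.   %% `ok` is an atom, defined simply by using it
ok
2> X = 1.
1
3> X = 2.         %% variables are single-assignment; rebinding fails
** exception error: no match of right hand side value 2
```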

Not so much an introduction as a tease to get you to learn more Erlang.

Some typos but look upon those as a challenge to verify what you are reading.

I may copy this post “as is” and use it as a “critical reading/research” assignment for my class.

Then have the students debate their corrections.

That could be a very interesting exercise in not taking everything you read on blind faith: how do you verify what you have read, and, in the process, evaluate that material as well?

Do you develop a sense of trust for some sources as being “better” than others? Are there ones you turn to by default?

Dempsy – a New Real-time Framework for Processing BigData

Friday, May 4th, 2012

Dempsy – a New Real-time Framework for Processing BigData by Boris Lublinsky.

From the post:

Real time processing of BigData seems to be one of the hottest topics today. Nokia has just released a new open-source project – Dempsy. Dempsy is comparable to Storm, Esper, Streambase, HStreaming and Apache S4. The code is released under the Apache 2 license.

Dempsy is meant to solve the problem of processing large amounts of "near real time" stream data with the lowest lag possible; problems where latency is more important than "guaranteed delivery." This class of problems includes use cases such as:

  • Real time monitoring of large distributed systems
  • Processing complete rich streams of social networking data
  • Real time analytics on log information generated from widely distributed systems
  • Statistical analytics on real-time vehicle traffic information on a global basis

The important properties of Dempsy are:

  • It is Distributed. That is to say a Dempsy application can run on multiple JVMs on multiple physical machines.
  • It is Elastic. That is, it is relatively simple to scale an application to more (or fewer) nodes. This does not require code or configuration changes but is done by dynamic insertion or removal of processing nodes.
  • It implements Message Processing. Dempsy is based on message passing. It moves messages between Message processors, which act on the messages to perform simple atomic operations such as enrichment, transformation, etc. In general, an application is intended to be broken down into many smaller, simpler processors rather than fewer large, complex processors.
  • It is a Framework. It is not an application container like a J2EE container, nor a simple library. Instead, like the Spring Framework, it is a collection of patterns, the libraries to enable those patterns, and the interfaces one must implement to use those libraries to implement the patterns.

Dempsy’s programming model is based on message processors communicating via messages and resembles a distributed actor framework. While not strictly speaking an actor framework in the sense of Erlang or Akka actors, where actors explicitly direct messages to other actors, Dempsy’s Message Processors are "actor-like POJOs" similar to Processing Elements in S4 and to some extent Bolts in Storm. Message processors are similar to actors in that they operate on a single message at a time and need not deal with concurrency directly. Unlike actors, Message Processors are also relieved of the need to know the destination(s) for their output messages, as this is handled by Dempsy based on the message properties.

In short, Dempsy is a framework that enables decomposing a large class of message processing problems into flows of messages between relatively simple processing units implemented as POJOs.

The Dempsy Tutorial contains more information.

See the post for an interview with Dempsy’s creator, NAVTEQ Fellow Jim Carroll.

Will the “age of data” mean that applications and their code will also be viewed and processed as data? That the capabilities you have are those you request for a particular data set? I would like to see topic maps on the leading (and not dragging) edge of that change.

Building Highly Available Systems in Erlang

Saturday, April 21st, 2012

Building Highly Available Systems in Erlang

From the description:


Joe Armstrong discusses highly available (HA) systems, introducing different types of HA systems and data, HA architecture and algorithms, 6 rules of HA, and how HA is done with Erlang.


Joe Armstrong is the principal inventor of Erlang and coined the term “Concurrency Oriented Programming”. At Ericsson he developed Erlang and was chief architect of the Erlang/OTP system. In 1998 he formed Bluetail, which developed all its products in Erlang. In 2003 he obtained his PhD from the Royal Institute of Technology, Stockholm. He is the author of the book “Programming Erlang: Software for a Concurrent World”.

Armstrong gives the six rules for highly available systems and shows how Erlang meets them.

  • Isolation rule: Operations must be isolated
  • Concurrency: The world is concurrent
  • Must detect failures: If can’t detect, can’t fix
  • Fault Identification: Enough detail to do something.
  • Live Code Upgrade: Upgrade software while running.
  • Stable Storage: Must survive universal power failure.

Quotes: Why Computers Stop and What Can Be Done About It, Jim Gray, Technical Report 85.7, Tandem Computers 1985, for example.

Highly entertaining and informative.

What do you think of the notion of an evolving software system?

How would you apply that to a topic map system?

Modelling graphs with processes in Erlang

Wednesday, April 4th, 2012

Modelling graphs with processes in Erlang by Nick Gibson.

From the post:

One of the advantages of Erlang’s concurrency model is that creating and running new processes is much cheaper than in most languages. This opens up opportunities to write algorithms in new ways. In this article, I’ll show you how you can implement a graph-searching algorithm by modeling the domain using process interaction.

I’ll assume you’re more or less comfortable with Erlang, if you’re not you might want to go back and read through Builder AU’s previous guides on the subject.

First we need to write a function for the nodes in the graph. When we spawn a process for each node, it will need to run a function that sends and receives messages. Each node needs two things: its own name, and the links it has to other nodes. To store the links, we’ll use a dictionary which maps names to node Pids. [I checked the links and they still work. Amazing for a five year old post.]
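A hedged sketch of that shape (the function and message names are my own; I use a map for the links where the original article used the dict module):

```erlang
%% Each graph node is a process: it knows its own name and a map
%% from neighbour names to neighbour Pids.
node_loop(Name, Links) ->
    receive
        {link, Other, Pid} ->
            node_loop(Name, maps:put(Other, Pid, Links));
        {neighbours, From} ->
            From ! {neighbours_of, Name, maps:keys(Links)},
            node_loop(Name, Links)
    end.

start_node(Name) ->
    spawn(fun() -> node_loop(Name, #{}) end).
```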

In the graph reading club discussion today, it was suggested that we need to look at data structures more closely. There are a number of typical and not so typical data structures for graphs and/or graph databases.

I am curious whether it would be better to develop the requirements for data structures separately from thinking of them as graph or graph database storage.

For example, we don’t want information about “edges,” but rather data items composed of two (or more) addresses (of other data items) per data item. Or an ordered list of such data items. And the addresses of the data items in question have specific characteristics.

Trying to avoid being influenced by the implied necessities of “edges,” at least until they are formally specified. At that point, we can evaluate data structures that meet all the previous requirements, plus any new ones.

Elixir – A modern approach to programming for the Erlang VM

Monday, April 2nd, 2012


From the homepage:

Elixir is a programming language built on top of the Erlang VM. Like Erlang, it is a functional language built to support distributed, fault-tolerant, non-stop applications with hot code swapping.

Elixir is also dynamically typed but, unlike Erlang, it is also homoiconic, allowing meta-programming via macros. Elixir also supports polymorphism via protocols (similar to Clojure’s) and dynamic records, and provides a reference mechanism.

Finally, Elixir and Erlang share the same bytecode and data types. This means you can invoke Erlang code from Elixir (and vice-versa) without any conversion or performance hit. This allows a developer to mix the expressiveness of Elixir with the robustness and performance of Erlang.

If you want to install Elixir or learn more about it, check our getting started guide. [Former link, updated to:]

Quite possibly of interest to Erlang programmers.

Take a close look at the languages mentioned in the Wikipedia article on homoiconicity as other examples of homoiconic languages.

Question: The list contains “successful” and “unsuccessful” languages. Care to comment on possible differences that account for the outcomes?

I am thinking a “successful” semantic mapping language will need to have certain characteristics. The question is, of course, which ones?

Intro to Distributed Erlang (screencast)

Sunday, April 1st, 2012

Intro to Distributed Erlang (screencast) by Bryan Hunter.

From the description:

Here’s an introduction to distribution in Erlang. This screencast demonstrates creating three Erlang nodes on a Windows box and one on a Linux box and then connecting them using the one-liner “net_adm:ping” to form a mighty compute cluster.

Topics covered:

  • Using erl to start an Erlang node (an instance of the Erlang runtime system).
  • How to use net_adm:ping to connect four Erlang nodes (three on Windows, one on Linux).
  • Using rpc:call to RickRoll a Linux box from an Erlang node running on a Windows box.
  • Using nl to load (deploy) a module from one node to all connected nodes.

Not the most powerful cluster but a good way to learn distributed Erlang.

Erlang as a Cloud Citizen

Saturday, March 31st, 2012

Erlang as a Cloud Citizen by Paolo Negri. (Erlang Factory San Francisco 2012)

From the description:

This talk wants to sum up the experience of designing, deploying and maintaining an Erlang application targeting the cloud and precisely AWS as hosting infrastructure.

As the application now serves a significantly large user base with a sustained throughput of thousands of games actions per second we’re able to analyse retrospectively our engineering and architectural choices and see how Erlang fits in the cloud environment also comparing it to previous experiences of clouds deployments of other platforms.

We’ll discuss properties of Erlang as a language and OTP as a framework and how we used them to design a system that is a good cloud citizen. We’ll also discuss topics that are still open for a solution.

Interesting, but you probably want to wait for the video. The slides are worth a look for the argument for fractal-like engineering at scale, but they don’t have enough detail to be really useful on their own.

Still, responding to 0.25 billion uncacheable requests/day is a performance number you should not ignore, depending on your use case.
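For a sense of scale, the daily figure works out to roughly:

```python
reqs_per_day = 0.25e9            # 0.25 billion uncacheable requests per day
seconds_per_day = 24 * 60 * 60
reqs_per_sec = reqs_per_day / seconds_per_day
print(round(reqs_per_sec))       # about 2894 requests/second, sustained
```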

Milking Performance from Riak Search

Thursday, March 22nd, 2012

Milking Performance from Riak Search by Gary William Flake.

From the post:

The primary backend store of Clipboard is built on top of Riak, one of the lesser known NoSQLs solutions. We love Riak and are really happy with our experiences with it — both in terms of development and operations — but to get to where we are, we had to use some tricks. In this post I want to share with you why we chose Riak and also arm you with some of the best tricks that we learned along the way. Individually, these tricks gave us better than a 100x performance boost, so they may make a big difference for you too.

If you don’t know what Clipboard is, you should try it out. We’re in private beta now, but here’s a backdoor that will bypass the invitation system: Register at Clipboard.

Good discussion of term-based partitioning and its disadvantages. (Term-based partitioning is native to Riak Search.) The problems are solved in part by judging likely queries in advance and precomputing inner joins. Not a bad method, depending on your confidence in your guesses about likely queries.

You will also have to determine if sorting on a primary key meets your needs, for a 10X to 100X performance gain.
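A toy model (the names are mine, not Riak’s API) of why term-based partitioning makes multi-term queries expensive, and how precomputing the join helps:

```python
# Toy model of term-based partitioning: each term's posting list
# lives on the node that owns the term, so a multi-term AND query
# becomes a query-time join across nodes.
from collections import defaultdict

docs = {
    1: "erlang riak search",
    2: "riak performance",
    3: "erlang performance tricks",
}

# "Partition" by term: node(term) owns that term's posting list.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search_and(*terms):
    """AND query: fetch each term's postings (one node each) and join."""
    postings = [index[t] for t in terms]   # one network hop per term
    return sorted(set.intersection(*postings))

print(search_and("erlang", "performance"))   # [3]

# The post's trick: precompute the join for likely queries, so a hot
# query hits one stored posting list instead of joining at query time.
precomputed = {("erlang", "performance"): search_and("erlang", "performance")}
```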

A Peek Inside the Erlang Compiler

Thursday, February 9th, 2012

A Peek Inside the Erlang Compiler

From the post:

Erlang is a complex system, and I can’t do its inner workings justice in a short article, but I wanted to give some insight into what goes on when a module is compiled and loaded. As with most compilers, the first step is to convert the textual source to an abstract syntax tree, but that’s unremarkable. What is interesting is that the code goes through three major representations, and you can look at each of them.

Covers the following transformations:

  • Syntax trees to Core Erlang
  • Core Erlang to code for the register-based BEAM virtual machine (final output of compiler)
  • BEAM bytecode into threaded code (loader output)

Just in case you wanted to know more about Erlang than you found in the crash course. 😉

A deeper understanding of any language is useful. Understanding “why” a construction works is the first step to writing a better one.

Crash Course in Erlang

Thursday, February 9th, 2012

Crash Course in Erlang (PDF file) by Roy Deal Simon.

“If your language is not functional, it’s dysfunctional baby.”

I suppose I look at Erlang (and other) intros just to see if the graphics/illustrations are different from other presentations. 😉 Not enough detail to really teach you much but sometimes the graphics are worth remembering.

Not any time soon but it would be interesting to review presentations for common illustrations. Perhaps even a way to find the ones that are the best to use with particular audiences. Something to think about.

Vector Clocks – Easy/Hard?

Friday, February 3rd, 2012

The Basho blog has a couple of very good posts on vector clocks:

Why Vector Clocks are Easy

Why Vector Clocks are Hard

The problem statement was as follows:

Alice, Ben, Cathy, and Dave are planning to meet next week for dinner. The planning starts with Alice suggesting they meet on Wednesday. Later, Dave discusses alternatives with Cathy, and they decide on Thursday instead. Dave also exchanges email with Ben, and they decide on Tuesday. When Alice pings everyone again to find out whether they still agree with her Wednesday suggestion, she gets mixed messages: Cathy claims to have settled on Thursday with Dave, and Ben claims to have settled on Tuesday with Dave. Dave can’t be reached, and so no one is able to determine the order in which these communications happened, and so none of Alice, Ben, and Cathy know whether Tuesday or Thursday is the correct choice.

Vector clocks are used to keep the order of communications clear. Something you will need in distributed systems, including those for topic maps.
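A minimal sketch of the mechanism (Python rather than Erlang; the three operations mirror the ones Riak’s vclock module exposes — increment on update, merge on receipt, and a descends test for comparison):

```python
# Minimal vector clock sketch: each actor increments its own entry
# before sending; merging takes the per-actor maximum.

def increment(clock, actor):
    clock = dict(clock)
    clock[actor] = clock.get(actor, 0) + 1
    return clock

def merge(a, b):
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def descends(a, b):
    """True if clock a has seen everything clock b has."""
    return all(a.get(k, 0) >= v for k, v in b.items())

# Alice suggests Wednesday.
alice = increment({}, "alice")
# Dave picks Thursday, having seen Alice's suggestion.
dave = increment(merge({}, alice), "dave")
# Ben picks Tuesday, building on Dave's state.
ben = increment(merge({}, dave), "ben")

print(descends(ben, dave))    # True: Ben's value supersedes Dave's
print(descends(dave, ben))    # False: so there is no real conflict
```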

Scalaris
Monday, January 23rd, 2012


From the webpage:

Scalaris is a scalable, transactional, distributed key-value store. It can be used for building scalable Web 2.0 services.

Scalaris uses a structured overlay with a non-blocking Paxos commit protocol for transaction processing with strong consistency over replicas. Scalaris is implemented in Erlang.

Following the links, I found:

Our work is similar to Amazon’s SimpleDB, but additionally supports full ACID properties. Dynamo, in contrast, restricts itself to eventual consistency only. As a test case, we chose Wikipedia, the free encyclopedia, that anyone can edit. Our implementation serves approx. 2,500 transactions per second with just 16 CPUs, which is better than the public Wikipedia.

Be forewarned that the documentation is in Google Docs, which does not like Firefox on Ubuntu.

Sigh, back to browser wars, again? Says it will work with Google Chrome.

Flake: A Decentralized, K-Ordered Unique ID Generator in Erlang

Wednesday, January 18th, 2012

Flake: A Decentralized, K-Ordered Unique ID Generator in Erlang

From the post:

At Boundary we have developed a system for unique id generation. This started with two basic goals:

  • Id generation at a node should not require coordination with other nodes.
  • Ids should be roughly time-ordered when sorted lexicographically. In other words, they should be k-ordered [1], [2].

All that is required to construct such an id is a monotonically increasing clock and a location [3]. K-ordering dictates that the most-significant bits of the id be the timestamp. UUID-1 contains this information, but arranges the pieces in such a way that k-ordering is lost. Still other schemes offer k-ordering with either a questionable representation of ‘location’ or one that requires coordination among nodes.
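A hedged sketch of that layout (the field widths follow Boundary’s published scheme — 64-bit millisecond timestamp, 48-bit worker id such as a MAC address, 16-bit per-millisecond sequence — but the helper function is mine):

```python
import time

def flake_id(timestamp_ms, worker_id, sequence):
    """Pack a 128-bit k-ordered id: timestamp | worker | sequence."""
    assert worker_id < (1 << 48) and sequence < (1 << 16)
    return (timestamp_ms << 64) | (worker_id << 16) | sequence

now = int(time.time() * 1000)
a = flake_id(now, worker_id=0xABCDEF, sequence=0)
b = flake_id(now, worker_id=0xABCDEF, sequence=1)
c = flake_id(now + 1, worker_id=0x123456, sequence=0)

# K-ordering: because the timestamp occupies the most-significant
# bits, ids sort by time first -- even as fixed-width hex strings.
assert a < b < c
assert sorted(f"{x:032x}" for x in (c, a, b)) == [f"{x:032x}" for x in (a, b, c)]
```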

Just in case you are looking for a decentralized source of K-ordered unique IDs. 😉

First seen at: myNoSQL as: Flake: A Decentralized, K-Ordered Unique ID Generator in Erlang.

A Basic Full Text Search Server in Erlang

Monday, October 10th, 2011

A Basic Full Text Search Server in Erlang

From the post:

This post explains how to build a basic full text search server in Erlang. The server has the following features:

  • indexing
  • stemming
  • ranking
  • faceting
  • asynchronous search results
  • web frontend using websockets

Familiarity with the OTP design principles is recommended.

Looks like a good way to become familiar with Erlang and text search issues.
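Most of that feature list — indexing, stemming, ranking — fits in a few lines in any language. A toy sketch (with a naive suffix-stripping stemmer and term-frequency ranking; this is not the post’s Erlang code, which would use a real stemmer such as Porter’s):

```python
from collections import Counter, defaultdict

def stem(word):
    # Naive suffix stripping -- a crude stand-in for a real stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

index = defaultdict(Counter)          # term -> {doc_id: frequency}

def add_document(doc_id, text):
    for word in text.lower().split():
        index[stem(word)][doc_id] += 1

def search(query):
    """Rank documents by summed term frequency of stemmed query terms."""
    scores = Counter()
    for word in query.lower().split():
        scores.update(index[stem(word)])
    return [doc for doc, _ in scores.most_common()]

add_document(1, "indexing and ranking searches")
add_document(2, "search servers rank indexed documents")
print(search("ranked search"))        # [2, 1]
```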