Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 24, 2015

NkBASE distributed database (Erlang)

Filed under: Erlang,NkBASE,Riak — Patrick Durusau @ 2:05 pm

NkBASE distributed database (Erlang)

From the webpage:

NkBASE is a distributed, highly available key-value database designed to be integrated into Erlang applications based on riak_core. It is one of the core pieces of the upcoming Nekso’s Software Defined Data Center Platform, NetComposer.

NkBASE uses a no-master, share-nothing architecture, where no node has any special role. It is able to store multiple copies of each object to achieve high availability and to distribute the load evenly among the cluster. Nodes can be added and removed on the fly. It shows low latency, and it is very easy to use.

NkBASE has some special features, like being able to work simultaneously as an eventually consistent database using Dotted Version Vectors, a strongly consistent database, and an eventually consistent, self-convergent database using CRDTs called dmaps. It also has a flexible and easy-to-use query language that (under some circumstances) can be very efficient, and has powerful support for auto-expiration of objects.

The minimum recommended cluster size for NkBASE is three nodes, but it can work from a single node to hundreds of them. However, NkBASE is not designed for very high load or huge data (you really should use the excellent Riak and Riak Enterprise for that), but as an in-system, flexible and easy to use database, useful in multiple scenarios like configuration, sessions, cluster coordination, catalogue search, temporary data, cache, field completions, etc. In the future, NetComposer will be able to start and manage multiple kinds of services, including databases like a full-blown Riak.

NkBASE has a clean code base, and can be used as a starting point to learn how to build a distributed Erlang system on top of riak_core, and to test new backends or replication mechanisms. NkBASE would have been impossible without the incredible work from Basho, the makers of Riak: riak_core, riak_dt and riak_ensemble.

Several things caught my attention about NkBASE.

That it is written in Erlang was the first thing.

That it is based on riak_core was the second thing.

But the thing that sealed its appearance here was:

NkBASE is not designed for very high load or huge data (you really should use the excellent Riak and Riak Enterprise for that)

What?

A software description that doesn’t read like Topper in Dilbert?

Kudos!

See the GitHub page for all the details but this looks promising, for the right range of applications.

November 11, 2014

Riak 2.0

Filed under: Riak — Patrick Durusau @ 11:29 am

Discovering Riak 2.0 Webinar Series

From the webpage:

Webinar Registration

Join Basho Product experts and customers as we take a deep dive into the Riak 2.0 features and capabilities. A brief overview of Riak 2.0 is covered in a recent blog post here http://basho.com/distributed-data-types-riak-2-0

Each webinar will be held twice on the indicated days to accommodate different time zones. Please register for the Webinars that interest you by clicking on the links below.

Thurs. 11/13 – “Deep Dive on Riak 2.0 Data Types”

Thu, Nov 13, 2014 8:00 AM – 9:00 AM PST
Thu, Nov 13, 2014 12:00 PM – 1:00 PM PST

Riak is an eventually consistent system. When handling conflicts, due to concurrent writes, in a distributed database the client application must have a way to resolve conflicts.

Riak Data Types give the developer the power of application modeling, while relieving them of the burden of designing and testing merge functions.

In this webinar we will provide an overview of Riak Data Types, the approach to adding them to Riak, and their usage in a practical application.

Wed. 11/19 – “Using Solr to Find Your Keys”

Wed, Nov 19, 2014 8:00 AM – 9:00 AM PST
Wed, Nov 19, 2014 12:00 PM – 1:00 PM PST

Riak 2.0 contains the next iteration of Riak Search, it pairs the strength of Riak as a horizontally scalable, distributed database with the powerful full-text search functionality of Apache Solr.

Reading the blog post at: http://basho.com/distributed-data-types-riak-2-0 will be good preparation for the first webinar.

From the post:

CRDT stands for (variously) Conflict-free Replicated Data Type, Convergent Replicated Data Type, Commutative Replicated Data Type, and others. The key, repeated, phrase is “Replicated Data Types”.

One strategy for avoiding data conflicts is normalization, as we know from the relational world, where normalization results in only one copy of any data. But that presumes human curation of a data structure that eliminates duplication of data.

Normalization isn’t an option for distributed systems, which by definition can have multiple copies of the same data. What happens when inconsistent copies of duplicated data are combined is the issue addressed by CRDTs (whatever your expansion).

If you think about it, topics that represent the same subject may well hold “inconsistent” data about that subject. Data that is present on one topic and absent on the other. Or that is on both topics and is inconsistent. CRDTs offer a way to define automated handling of some forms of “inconsistency.”
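To make that concrete, here is a minimal sketch of merging two topics that hold different data about the same subject. It is not Riak's data type API. The `merge_topics` function, the property names, and the sample values are all hypothetical, and the merge rule (union of values per property, in effect a map of grow-only sets) is only one of the simpler CRDT-style strategies.

```python
def merge_topics(a, b):
    """Merge two topic records by taking the union of values per property.

    Each topic is a dict mapping a property name to a frozenset of values.
    Union is commutative, associative, and idempotent, so the result does
    not depend on which copy is merged into which.
    """
    merged = {}
    for key in a.keys() | b.keys():
        merged[key] = a.get(key, frozenset()) | b.get(key, frozenset())
    return merged


# Hypothetical topics for the same subject, held on two different nodes.
topic_a = {"name": frozenset({"P. Durusau"}), "interest": frozenset({"topic maps"})}
topic_b = {"name": frozenset({"Patrick Durusau"}), "blog": frozenset({"Another Word For It"})}

merged = merge_topics(topic_a, topic_b)
for prop in sorted(merged):
    print(prop, sorted(merged[prop]))
```

Note what this does and does not do: the "inconsistent" names are both kept rather than one being chosen. Deciding that one value should win requires a different data type (a register, for example), which is a design choice and not something the merge can discover for you.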

Suggestion: Install a copy of Riak 2.0 before the webinar.

July 22, 2014

Readings in conflict-free replicated data types

Filed under: Consistency,CRDT,Merging,Riak — Patrick Durusau @ 6:55 pm

Readings in conflict-free replicated data types by Christopher Meiklejohn.

From the post:

This is a work in progress post outlining research topics related to conflict-free replicated data types, or CRDTs.

Yesterday, Basho announced the release of Riak 2.0.0 RC1, which contains a comprehensive set of “data types” that can be used for building more robust distributed applications. For an overview of how to use these data types in Riak to avoid custom, and error prone, merge functions, see the Basho documentation site.

You’re probably more familiar with another name for these data types: conflict-free replicated data types (CRDTs). Simply put, CRDTs are data structures which capture some aspect of causality, along with providing interfaces for safely operating over the value and correctly merging state with diverged and concurrently edited structures.

This provides a very useful property when combined with an eventual consistency, or AP-focused, data store: Strong Eventual Consistency (SEC). Strong Eventual Consistency is an even stronger convergence property than eventual consistency: given that all updates are delivered to all replicas, there is no need for conflict resolution, given the conflict-free merge properties of the data structure. Simply put, correct replicas which have received all updates have the same state.

Here’s a great overview by one of the inventors of CRDTs, Marc Shapiro, where he discusses conflict-free replicated data types and their relation to strong eventual consistency.

In this Hacker News thread, there was an interesting discussion about why one might want to implement these on the server, why implementing them is non-trivial, and what the most recent research related to them consists of.

This post serves as a reading guide on the various areas of conflict-free replicated data types. Papers are broken down into various areas and sorted in reverse chronological order.

Relevant to me because the new change tracking in ODF is likely to be informed by CRDTs and because eventually consistent merging is important for distributed topic maps.

Confusion would result if the order in which topics are merged produced different topic maps.

CRDTs are an approach to avoid that unhappy outcome.
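A small sketch of why merge order stops mattering, using a grow-only counter (G-Counter), one of the simplest state-based CRDTs. The replica names and counts are made up, and this is an illustration of the idea, not the riak_dt implementation.

```python
from itertools import permutations


def merge(a, b):
    """Join two G-Counter states by taking the per-replica maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(state):
    """The counter's value is the sum of every replica's contribution."""
    return sum(state.values())


# Hypothetical states seen on three replicas; each key is a replica name.
replica_states = [
    {"n1": 3},
    {"n1": 1, "n2": 2},
    {"n2": 2, "n3": 5},
]

# Merge the states in every possible order and collect the results.
results = []
for order in permutations(replica_states):
    acc = {}
    for state in order:
        acc = merge(acc, state)
    results.append(acc)

all_equal = all(r == results[0] for r in results)
print(all_equal, value(results[0]))
```

All six merge orders end in the same state because the per-replica maximum is commutative, associative, and idempotent. That is the property you would want from topic merging as well.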

Enjoy!

PS: Remember to grab a copy of Riak 2.0.0 RC1.

January 17, 2014

RICON West 2013 Videos Posted!

Filed under: Erlang,Programming,Riak — Patrick Durusau @ 2:52 pm

RICON West 2013 Videos Posted!

Rather than streaming the entire two (2) days, you can now view individual videos from RICON West 2013!

By author:

By title:

  • Bad As I Wanna Be: Coordination and Consistency in Distributed Databases (Bailis) – RICON West 2013
  • Bringing Consistency to Riak (Part 2) (Joseph Blomstedt) – RICON West 2013
  • Building Next Generation Weather Data Distribution and On-demand Forecast Systems Using Riak (Raja Selvaraj)
  • Controlled Epidemics: Riak's New Gossip Protocol and Metadata Store (Jordan West) – RICON West 2013
  • CRDTs: An Update (or maybe just a PUT) (Sam Elliott) – RICON West 2013
  • CRDTs in Production (Jeremy Ong) – RICON West 2013
  • Denormalize This! Riak at State Farm (Richard Simon and Richard Berglund) – RICON West 2013
  • Distributed Systems Archeology (Michael Bernstein) – RICON West 2013
  • Distributing Work Across Clusters: Adventures With Riak Pipe (Susan Potter) – RICON West 2013
  • Dynamic Dynamos: Comparing Riak and Cassandra (Jason Brown) – RICON West 2013
  • LVars: lattice-based data structures for deterministic parallelism (Lindsey Kuper) – RICON West 2013
  • Maximum Viable Product (Justin Sheehy) – RICON West 2013
  • More Than Just Data: Using Riak Core to Manage Distributed Services (O'Connell) – RICON West 2013
  • Practicalities of Productionizing Distributed Systems (Jeff Hodges) – RICON West 2013
  • The Raft Consensus Algorithm (Diego Ongaro) – RICON West 2013
  • Riak Search 2.0 (Eric Redmond) – RICON West 2013
  • Riak Security; Locking the Distributed Chicken Coop (Andrew Thompson) – RICON West 2013
  • RICON West 2013 Lightning Talks
  • Seagate Kinetic Open Storage: Innovation to Enable Scale Out Storage (Hughes) – RICON West 2013
  • The Tail at Scale: Achieving Rapid Response Times in Large Online Services (Dean) – RICON West 2013
  • Timely Dataflow in Naiad (Derek Murray) – RICON West 2013
  • Troubleshooting a Distributed Database in Production (Shoffstall and Voiselle) – RICON West 2013
  • Yuki: Functional Data Structures for Riak (Ryland Degnan) – RICON West 2013
Enjoy!

    November 2, 2013

    RICON West 2013! (video streams)

    Filed under: Functional Programming,Riak — Patrick Durusau @ 5:59 pm

    RICON West 2013! (video streams)

    Presentation-by-presentation editing is underway, but you can duplicate the conference experience at your own terminal!

    Day 1 and 2, both tracks are ready for your viewing!

    True, it’s not interactive but here you can pause the speaker while you take a call or answer email. 😉

    Enjoy!

    July 15, 2013

    RICON East 2013 [videos, slides, resources]

    Filed under: Erlang,Riak — Patrick Durusau @ 12:05 pm

    RICON East 2013 [videos, slides, resources]

    I have sorted (by author) and included the abstracts for the RICON East presentations. The RICON East webpage has links to blog entries about the conference.

    Enjoy!


    Brian Akins, Large Scale Data Service as a Service
    Slides | Video

    Turner Broadcasting hosts several large sites that need to serve “data” to millions of clients over HTTP. A couple of years ago, we started building a generic service to solve this and to retire several legacy systems. We will discuss the general architecture, the growing pains, and why we decided to use Riak. We will also share some implementation details and the use of the service for a few large internet events.


    Neil Conway, Bloom: Big Systems from Small Programs
    Slides | Video

    Distributed systems are ubiquitous, but distributed programs remain stubbornly hard to write. While many distributed algorithms can be concisely described, implementing them requires large amounts of code–often, the essence of the algorithm is obscured by low-level concerns like exception handling, task scheduling, and message serialization. This results in programs that are hard to write and even harder to maintain. Can we do better?

    Bloom is a new programming language we’ve developed at UC Berkeley that takes two important steps towards improving distributed systems development. First, Bloom programs are designed to be declarative and concise, aided by a new philosophy for reasoning about state and time. Second, Bloom can analyze distributed programs for their consistency requirements and either certify that eventual consistency is sufficient, or identify program locations where stronger consistency guarantees are needed. In this talk, I’ll introduce the language, and also suggest how lessons from Bloom can be adopted in other distributed programming stacks.


    Sean Cribbs, Just Open a Socket – Connecting Applications to Distributed Systems
    Slides | Video

    Client-server programming is a discipline as old as computer networks and well-known. Just connect a socket to the server and send some bytes back and forth, right?

    Au contraire! Building reliable, robust client libraries and applications is actually quite difficult, and exposes a lot of classic distributed and concurrent programming problems. From understanding and manipulating the TCP/IP network stack, to multiplexing connections across worker threads, to handling partial failures, to juggling protocols and encodings, there are many different angles one must cover.

    In this talk, we’ll discuss how Basho has addressed these problems and others in our client libraries and server-side interfaces for Riak, and how being a good client means being a participant in the distributed system, rather than just a spectator.


    Reid Draper, Advancing Riak CS
    Slides | Video

    Riak CS has come a long way since it was first released in 2012, and then open sourced in March 2013. We’ll take a look at some of the features and improvements in the recently released Riak CS 1.3.0, and planned for the future, like better integration with CloudStack and OpenStack. Next, we’ll go over some of the Riak CS guts that deployers should understand in order to successfully deploy, monitor and scale Riak CS.


    Camille Fournier, ZooKeeper for the Skeptical Architect
    Slides | Video

    ZooKeeper is everywhere these days. It’s a core component of the Hadoop ecosystem. It provides the glue that enables high availability for systems like Redis and Solr. Your favorite startup probably uses it internally. But as every good skeptic knows, just because something is popular doesn’t mean you should use it. In this talk I will go over the core uses of ZooKeeper in the wild and why it is suited to these use cases. I will also talk about systems that don’t use ZooKeeper and why that can be the right decision. Finally I will discuss the common challenges of running ZooKeeper as a service and things to look out for when architecting a deployment.


    Sathish Gaddipati, Building a Weather Data Services Platform on Riak
    Slides | Video

    In this talk Sathish will discuss the size, complexity and use cases surrounding weather data services and analytics, which will entail an overview of the architecture of such systems and the role of Riak in these patterns.


    Sunny Gleason, Riak Techniques for Advanced Web & Mobile Application Development
    Slides | Video

    In recent years, there have been tremendous advances in high-performance, high-availability data storage for scalable web and mobile application development. Often times, these NoSQL solutions are portrayed as sacrificing the crispness and rapid application development features of relational database alternatives. In this presentation, we show the amazing things that are possible using a variety of techniques to apply Riak’s advanced features such as map-reduce, search, and secondary indexes. We review each feature in the context of a demanding real-world Ruby & Javascript “Pinterest clone” application with advanced features such as real-time updates via Websocket, comment feeds, content quarantining, permissions, search and social graph modeling. We pay specific attention to explaining the ‘why’ of these Riak techniques for high-performance, high availability applications, not just the ‘how’.


    Andy Gross, Lessons Learned and Questions Raised from Building Distributed Systems
    Slides | Video


    Shawn Gravelle and Sam Townsend, High Availability with Riak and PostgreSQL
    Slides | Videos

    This talk will cover work to build out an internal cloud offering using Riak and PostgreSQL as a data layer, architectural decisions made to achieve high availability, and lessons learned along the way.


    Rich Hickey, Using Datomic with Riak
    Video

    Rich Hickey, the author of Clojure and designer of Datomic, is a software developer with over 20 years of experience in various domains. Rich has worked on scheduling systems, broadcast automation, audio analysis and fingerprinting, database design, yield management, exit poll systems, and machine listening, in a variety of languages.


    James Hughes, Revolution in Storage
    Slides | Video

    The trends of technology are rocking the storage industry. Fundamental changes in basic technology, combined with massive scale, new paradigms, and fundamental economics leads to predictions of a new storage programming paradigm. The growth of low cost/GB disk is continuing with technologies such as Shingled Magnetic Recording. Flash and RAM are continuing to scale with roadmaps, some argue, down to atom scale. These technologies do not come without a cost. It is time to reevaluate the interface that we use to all kinds of storage, RAM, Flash and Disk. The discussion starts with the unique economics of storage (as compared to processing and networking), discusses technology changes, posits a set of open questions and ends with predictions of fundamental shifts across the entire storage hierarchy.


    Kyle Kingsbury, Call Me Maybe: Carly Rae Jepsen and the Perils of Network Partitions
    Slides | Code | Video

    Network partitions are real, but their practical consequences on complex applications are poorly understood. I want to talk about some of the neat ways I’ve found to lose important data, the challenge of building systems which are reliable under partitions, and what it means for you, an application developer.


    Hilary Mason, Realtime Systems for Social Data Analysis
    Slides | Video

    It’s one thing to have a lot of data, and another to make it useful. This talk explores the interplay between infrastructure, algorithms, and data necessary to design robust systems that produce useful and measurable insights for realtime data products. We’ll walk through several examples and discuss the design metaphors that bitly uses to rapidly develop these kinds of systems.


    Michajlo Matijkiw, Firefighting Riak at Scale
    Slides | Video

    Managing a business critical Riak instance in an enterprise environment takes careful planning, coordination, and the willingness to accept that no matter how much you plan, Murphy’s law will always win. At CIM we’ve been running Riak in production for nearly 3 years, and over those years we’ve seen our fair share of failures, both expected and unexpected. From disk melt downs to solar flares we’ve managed to recover and maintain 100% uptime with no customer impact. I’ll talk about some of these failures, how we dealt with them, and how we managed to keep our clients completely unaware.


    Neha Narula, Why Is My Cache So Dumb? Smarter Caching with Pequod
    Slides | Video

    Pequod is a key/value cache we’re developing at MIT and Harvard that automatically updates the cache to keep data fresh. Pequod exploits a common pattern in these computations: different kinds of cached data are often related to each other by transformations equivalent to simple joins, filters, and aggregations. Pequod allows applications to pre-declare these transformations with a new abstraction, the cache join. Pequod then automatically applies the transformations and tracks relationships to materialize data and keep the cache up to date, and in many cases improves performance by reducing client/cacheserver communication. Sound like a database? We use abstractions from databases like joins and materialized views, while still maintaining the performance of an in-memory key/value cache.

    In this talk, I’ll describe the challenges caching solves, the problems that still exist, and how tools like Pequod can make the space better.


    Alex Payne, Nobody ever got fired for picking Java: evaluating emerging programming languages for business-critical systems
    Slides | Video

    When setting out to build greenfield systems, engineers today have a broader choice of programming language than ever before. Over the past decade, language development has accelerated dramatically thanks to mature runtimes like the JVM and CLR, not to mention the prevalence of near-universal targets for cross-compilation like JavaScript. With strong technological foundations to build on and an active open source community, modern languages can evolve from rough hobbyist projects into capable tools in a stunningly short period of time. With so many strong contenders emerging every day, how do you decide what language to bet your business on? We’ll explore the landscape of new languages and provide a decision-making framework you can use to narrow down your choices.


    Theo Schlossnagle and Robert Treat, How Do You Eat An Elephant?
    Slides | Video

    When OmniTI first set out to build a next generation monitoring system, we turned to one of our most trusted tools for data management; Postgres. While this worked well for developing the initial Open Source application, as we continued to grow the Circonus public monitoring service, we eventually ran into scaling issues. This talk will cover some of the changes we made to make the original Postgres system work better, talk about some of the other systems we evaluated, and discuss the eventual solution to our problem; building our own time series database. Of course, that’s only half the story. We’ll also go into how we swapped out these backend data storage pieces in our production environment, all the while capturing and reporting on millions of metrics, without downtime or customer interruption.


    Dr. Margo Seltzer, Automatically Scalable Computation
    Slides | Video

    As our computational infrastructure races gracefully forward into increasingly parallel multi-core and blade-based systems, our ability to easily produce software that can successfully exploit such systems continues to stumble. For years, we’ve fantasized about the world in which we’d write simple, sequential programs, add magic sauce, and suddenly have scalable, parallel executions. We’re not there. We’re not even close. I’ll present trajectory-based execution, a radical, potentially crazy, approach for achieving automatic scalability. To date, we’ve achieved surprisingly good speedup in limited domains, but the potential is tantalizingly enormous.


    Chris Tilt, Riak Enterprise Revisited
    Slides | Video

    Riak Enterprise has undergone an overhaul since its 1.2 days, mostly around Multi-DataCenter replication. We’ll talk about the “Brave New World” of replication in depth, how it manages concurrent TCP/IP connections, Realtime Sync, and the technology preview of Active Anti-Entropy Fullsync. Finally, we’ll peek over the horizon at new features such as chaining of Realtime sync messages across multiple clusters.


    Sam Townsend, High Availability with Riak and PostgreSQL
    Slides | Video


    Mark Wunsch, Scaling Happiness Horizontally
    Slides | Video

    This talk will discuss how Gilt has grown its technology organization to optimize for engineer autonomy and happiness and how that optimization has affected its software. Conway’s Law states that an organization that designs systems will inevitably produce systems that are copies of the communication structures of the organization. This talk will work its way between both the (gnarly) technical details of Gilt’s application architecture (something we internally call “LOSA”) and the Gilt Tech organization structure. I’ll discuss the technical challenges we came up against, and how these often pointed out areas of contention in the organization. I’ll discuss quorums, failover, and latency in the context of building a distributed, decentralized, peer-to-peer technical organization.


    Matthew Von-Maszewski, Optimizing LevelDB for Performance and Scale
    Slides | Video

    LevelDB is a flexible key-value store written by Google and open sourced in August 2011. LevelDB provides an ordered mapping of binary keys to binary values. Various companies and individuals utilize LevelDB on cell phones and servers alike. The problem, however, is it does not run optimally on either as shipped.

    This presentation outlines the basic internal mechanisms of LevelDB and then proceeds to discuss the tuning opportunities in the source code for each mechanism. This talk will draw heavily from our experiences optimizing LevelDB for use in Riak, which is handy for running sufficiently large clusters.


    Ryan Zezeski, Yokozuna: Distributed Search You Don’t Think About
    Slides | Video

    Allowing users to run arbitrary and complex searches against your data is a feature required by most consumer facing applications. For example, the ability to get ranked results based on free text search and subsequently drill down on that data based on secondary attributes is at the heart of any good online retail shop. Not only must your application support complex queries such as “doggy treats in a 2 mile radius, broken down by popularity” but it must also return in hundreds of milliseconds or less to keep users happy. This is what systems like Solr are built for. But what happens when the index is too big to fit on a single node? What happens when replication is needed for availability? How do you give correct answers when the index is partitioned across several nodes? These are the problems of distributed search. These are some of the problems Yokozuna solves for you without making you think about it.

    In this talk Ryan will explain what search is, why it matters, what problems distributed search brings to the table, and how Yokozuna solves them. Yokozuna provides distributed and available search while appearing to be a single-node Solr instance. This is very powerful for developers and ops professionals.


    I first saw this in a tweet by Alex Popescu.

    PS: If more videos go up and I miss them, please ping me. Thanks!

    July 12, 2013

    Riak 1.4 – More Install Notes on Ubuntu 12.04 (precise)

    Filed under: Erlang,Riak — Patrick Durusau @ 1:37 pm

    Following up on yesterday’s post on installing Riak 1.4 with some minor nits.

    Open File Limits

    The Open Files Limit page leaves the reader dangling with:

    However, what most needs to be changed is the per-user open files limit. This requires editing /etc/security/limits.conf, which you’ll need superuser access to change. If you installed Riak or Riak Search from a binary package, add lines for the riak user like so, substituting your desired hard and soft limits:

    (next paragraph)

    Suggest:

    riak soft nofile 65536
    riak hard nofile 65536

    Separate the fields with tabs in /etc/security/limits.conf.

    The same page also suggests an open file value of 50384 if you are starting Riak with init scripts. I don’t know the reason for the difference but 50384 occurs only once in Linux examples so while it may work, I am starting with the higher value.

    Performance Tuning

    I followed the directions at Linux Performance Tuning, but suggest you also add:

    # Added by
    # Network tuning parameters for Riak 1.4
    # As per: http://docs.basho.com/riak/1.3.1/cookbooks/Linux-Performance-Tuning/

    both here and for your changes to limits.conf.

    Doing so puts others on notice of the reason for the settings and points them to the documentation.

    Enter the same type of note for your setting of the noatime flag in /etc/fstab (under Mounts and Scheduler in Linux Performance Tuning).

    On reboot, check your settings with:

    ulimit -a
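If you would rather check just the open files limit from a script, here is a minimal sketch using Python's standard `resource` module (Unix only). The 65536 target is simply the value I suggested for limits.conf above, and the `nofile_status` helper name is my own. Run it as the riak user, since the limits are per user.

```python
import resource

# Target taken from the limits.conf lines suggested in this post.
TARGET_NOFILE = 65536


def nofile_status(target=TARGET_NOFILE):
    """Return (soft, hard, ok) for the current process's open files limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", which trivially meets any target.
    unlimited = soft == resource.RLIM_INFINITY
    return soft, hard, unlimited or soft >= target


soft, hard, ok = nofile_status()
print(f"soft={soft} hard={hard} meets target: {ok}")
```

The soft limit is the one that corresponds to `ulimit -n`, so that is the value compared against the target.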

    I was going to do the Riak Fast Track today but got distracted with configuration issues with Ruby, RVM, KDE and the viewer for Riak docs.

    Look for Fast Track notes over the weekend.

    July 11, 2013

    Riak 1.4 – Install Notes on Ubuntu 12.04 (precise)

    Filed under: Erlang,Riak — Patrick Durusau @ 12:57 pm

    While installing Riak 1.4 I encountered some issues and thought writing down the answers might help someone else.

    Following the instructions for Installing From Apt-Get, when I reached:

    sudo apt-get install riak

    I got this message:

    Failed to fetch
    http://apt.basho.com/pool/precise/main/riak_1.4.0-1_amd64.deb Size mismatch
    E: Unable to fetch some archives, maybe run apt-get update or try with
    --fix-missing?

    Not a problem with the Riak 1.4 distribution but an error with Ubuntu.

    Correct as follows:

    sudo aptitude clean

    (rtn)

    Then:

    sudo aptitude update

    (rtn)

    close, restart Linux

    This cleans the apt cache, after which the install was successful.

    Post Installation Notes:

    Basho suggests starting Riak with:

    riak start

    My results:

    Unable to access /var/run/riak, permission denied, run script as root

    Use:

    sudo riak start

    I then read:

    sudo riak start
    !!!!
    !!!! WARNING: ulimit -n is 1024; 4096 is the recommended minimum.
    !!!!

    The ulimit warning is not unexpected and solutions are documented at: Open Files Limit.

    As soon as I finish this session, I am going to create the file /etc/default/riak and its contents will be:

    ulimit -n 65536

    The file needs to be created as root.

    May as well follow the instructions for “Enable PAM Based Limits for Debian & Ubuntu” in the Open Files document as well. Requires a reboot.

    The rest of the tests of the node went well until I got to:

    riak-admin diag

    The documentation notes:

    Make the recommended changes from the command output to ensure optimal node operation.

    I was running in an Emacs shell so capturing the output was easy:

    riak-admin diag
    [critical] vm.swappiness is 60, should be no more than 0
    [critical] net.core.wmem_default is 229376, should be at least 8388608
    [critical] net.core.rmem_default is 229376, should be at least 8388608
    [critical] net.core.wmem_max is 131071, should be at least 8388608
    [critical] net.core.rmem_max is 131071, should be at least 8388608
    [critical] net.core.netdev_max_backlog is 1000, should be at least 10000
    [critical] net.core.somaxconn is 128, should be at least 4000
    [critical] net.ipv4.tcp_max_syn_backlog is 2048, should be at least 40000
    [critical] net.ipv4.tcp_fin_timeout is 60, should be no more than 15
    [critical] net.ipv4.tcp_tw_reuse is 0, should be 1
    [warning] The following preflists do not satisfy the n_val:
    [[{0,
    'riak@127.0.0.1'},
    {22835963083295358096932575511191922182123945984,
    'riak@127.0.0.1'},
    {45671926166590716193865151022383844364247891968,
    'riak@127.0.0.1'}],
    [approx. 376 lines omitted]
    [{1438665674247607560106752257205091097473808596992,
    'riak@127.0.0.1'},
    {0,
    'riak@127.0.0.1'},
    {22835963083295358096932575511191922182123945984,
    'riak@127.0.0.1'}]]
    [notice] Data directory /var/lib/riak/bitcask is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.

    The first block of messages:

    [critical] vm.swappiness is 60, should be no more than 0
    [critical] net.core.wmem_default is 229376, should be at least 8388608
    [critical] net.core.rmem_default is 229376, should be at least 8388608
    [critical] net.core.wmem_max is 131071, should be at least 8388608
    [critical] net.core.rmem_max is 131071, should be at least 8388608
    [critical] net.core.netdev_max_backlog is 1000, should be at least 10000
    [critical] net.core.somaxconn is 128, should be at least 4000
    [critical] net.ipv4.tcp_max_syn_backlog is 2048, should be at least 40000
    [critical] net.ipv4.tcp_fin_timeout is 60, should be no more than 15
    [critical] net.ipv4.tcp_tw_reuse is 0, should be 1

    are network tuning issues.

    Basho answers the “how to correct?” question at Linux Performance Tuning but there is no link from the Post Installation Notes.
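
As a rough sketch of turning that first block into something actionable, the Python below parses the `[critical]` lines and emits `key = value` lines in the format `/etc/sysctl.conf` uses. The function name `sysctl_lines` is made up for this example, and the values are simply the thresholds `riak-admin diag` printed; the Linux Performance Tuning document may recommend different numbers, so treat the output as a starting point only.

```python
import re

# The ten [critical] lines, copied from the riak-admin diag output above.
DIAG_OUTPUT = """\
[critical] vm.swappiness is 60, should be no more than 0
[critical] net.core.wmem_default is 229376, should be at least 8388608
[critical] net.core.rmem_default is 229376, should be at least 8388608
[critical] net.core.wmem_max is 131071, should be at least 8388608
[critical] net.core.rmem_max is 131071, should be at least 8388608
[critical] net.core.netdev_max_backlog is 1000, should be at least 10000
[critical] net.core.somaxconn is 128, should be at least 4000
[critical] net.ipv4.tcp_max_syn_backlog is 2048, should be at least 40000
[critical] net.ipv4.tcp_fin_timeout is 60, should be no more than 15
[critical] net.ipv4.tcp_tw_reuse is 0, should be 1
"""

# Matches "[critical] <key> is <current>, should be [at least|no more than] <target>".
LINE_RE = re.compile(
    r"^\[critical\] (?P<key>[\w.]+) is (?P<current>\d+), "
    r"should be (?:at least |no more than )?(?P<target>\d+)$"
)


def sysctl_lines(diag_text):
    """Return 'key = value' lines using the threshold from each diag message."""
    lines = []
    for raw in diag_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            lines.append(f"{match['key']} = {match['target']}")
    return lines


if __name__ == "__main__":
    print("\n".join(sysctl_lines(DIAG_OUTPUT)))
```

The printed lines could be appended to `/etc/sysctl.conf` and loaded with `sysctl -p` as root, after checking them against the tuning document.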

    The next block of messages:

    [warning] The following preflists do not satisfy the n_val:
    [[{0,
    'riak@127.0.0.1'},
    {22835963083295358096932575511191922182123945984,
    'riak@127.0.0.1'},
    {45671926166590716193865151022383844364247891968,
    'riak@127.0.0.1'}],
    [approx. 376 lines omitted]
    [{1438665674247607560106752257205091097473808596992,
    'riak@127.0.0.1'},
    {0,
    'riak@127.0.0.1'},
    {22835963083295358096932575511191922182123945984,
    'riak@127.0.0.1'}]]

    is a known issue: N Value – Preflist Message is Vague.

    From the issue, the message means: “these preflists have more than one replica on the same node.”

    Not surprising since I am running on one physical node and not in production.

    The Riak Fast Track has you create four nodes on one physical node as a development environment. So I’m going to ignore the “preflists” warning in this context.
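
To make “more than one replica on the same node” concrete, here is a small Python sketch of that check. It models a preflist as a list of (partition, node) pairs, mirroring the Erlang terms in the warning. The first preflist is taken from the output above; the three-node one uses invented node names, and the function name is hypothetical. This is an illustration of the idea, not how riak_core performs the check.

```python
from collections import Counter

# First preflist from the warning above: every replica is on the same node.
SINGLE_NODE_PREFLIST = [
    (0, "riak@127.0.0.1"),
    (22835963083295358096932575511191922182123945984, "riak@127.0.0.1"),
    (45671926166590716193865151022383844364247891968, "riak@127.0.0.1"),
]

# Hypothetical preflist on a three-node cluster (node names are invented).
THREE_NODE_PREFLIST = [
    (0, "riak@10.0.0.1"),
    (22835963083295358096932575511191922182123945984, "riak@10.0.0.2"),
    (45671926166590716193865151022383844364247891968, "riak@10.0.0.3"),
]


def nodes_with_multiple_replicas(preflist):
    """Return the nodes that hold more than one replica in this preflist."""
    counts = Counter(node for _partition, node in preflist)
    return sorted(node for node, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(nodes_with_multiple_replicas(SINGLE_NODE_PREFLIST))
    print(nodes_with_multiple_replicas(THREE_NODE_PREFLIST))
```

On a single-node install every preflist fails this check, which is why the warning runs to hundreds of lines.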

    The last message:

    [notice] Data directory /var/lib/riak/bitcask is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.

    is resolved under “Mounts and Scheduler” in the Linux Performance Tuning document.
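
After remounting, one way to confirm the flag took effect is to look up the mount options for the data directory. The sketch below parses text in the `/proc/mounts` format; the sample text and the function names are assumptions for illustration, not output from my machine.

```python
# Hypothetical /proc/mounts content: device, mount point, fs type, options, dump, pass.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime,errors=remount-ro 0 0
/dev/sdb1 /var/lib/riak ext4 rw,noatime 0 0
"""


def options_for_path(mounts_text, path):
    """Return the mount options of the deepest mount point containing path."""
    best_point, best_opts = "", []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        point, opts = fields[1], fields[3].split(",")
        # path is under this mount if it equals it or continues with "/"
        under = path == point or path.startswith(point.rstrip("/") + "/")
        if under and len(point) > len(best_point):
            best_point, best_opts = point, opts
    return best_opts


def has_noatime(mounts_text, path):
    """True if the filesystem holding path is mounted with noatime."""
    return "noatime" in options_for_path(mounts_text, path)


if __name__ == "__main__":
    print(has_noatime(SAMPLE_MOUNTS, "/var/lib/riak/bitcask"))
```

On a live Linux system the first argument would be the contents of `/proc/mounts` rather than the sample string.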

    I am going to make all the system changes, reboot and start on the Riak Fast Track tomorrow.

    PS: In case you are wondering what this has to do with topic maps, ask yourself what characteristics you would want in a distributed topic map system?

    July 10, 2013

    Riak 1.4 Hits the Street!

    Filed under: Erlang,Riak — Patrick Durusau @ 4:23 pm

    Well, they actually said: Basho Announces Availability of Riak 1.4.

    From the post:

    We are excited to announce the launch of Riak 1.4. With this release, we have added in more functionality and addressed some common requests that we hear from customers. In addition, there are a few features available in technical preview that you can begin testing and will be fully rolled out in the 2.0 launch later this year.

    The new features and updates in Riak 1.4 include:

    • Secondary Indexing Improvements: Query results are now sorted and paginated, offering developers much richer semantics
    • Introducing Counters in Riak: Counters, Riak’s first distributed data type, provide automatic conflict resolution after a network partition
    • Simplified Cluster Management With Riak Control: New capabilities in Riak’s GUI-based administration tool improve the cluster management page for preparing and applying changes to the cluster
    • Reduced Object Storage Overhead: Values and associated metadata are stored and transmitted using a more compact format, reducing disk and network overhead
    • Hinted Handoff Progress Reporting: Makes operating the cluster, identifying and troubleshooting issues, and monitoring the cluster simpler
    • Improved Backpressure: Riak responds with an overload message if a vnode has too many messages in queue
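
For a feel of how a counter can resolve conflicts automatically after a partition, here is a minimal Python sketch of a PN-counter, a counter CRDT from the published literature. It is an assumption on my part that this general idea is a fair illustration; the class and method names are hypothetical and this is not Riak's implementation or API.

```python
class PNCounter:
    """Counter CRDT: per-node increment and decrement totals, merged by max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.increments = {}  # node id -> total increments seen from that node
        self.decrements = {}  # node id -> total decrements seen from that node

    def increment(self, amount=1):
        self.increments[self.node_id] = self.increments.get(self.node_id, 0) + amount

    def decrement(self, amount=1):
        self.decrements[self.node_id] = self.decrements.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.increments.values()) - sum(self.decrements.values())

    def merge(self, other):
        """Take the per-node maximum; safe to repeat and order-independent."""
        pairs = (
            (self.increments, other.increments),
            (self.decrements, other.decrements),
        )
        for mine, theirs in pairs:
            for node, count in theirs.items():
                mine[node] = max(mine.get(node, 0), count)


if __name__ == "__main__":
    a, b = PNCounter("a"), PNCounter("b")
    a.increment(3)   # updates made on each side of a partition
    b.increment(2)
    b.decrement(1)
    a.merge(b)       # partition heals; replicas exchange state
    b.merge(a)
    print(a.value(), b.value())
```

Because each node only ever raises its own totals and merge takes the maximum, replicas converge on the same value without application-level conflict resolution.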

    Plus performance and management enhancements for the enterprise crowd.

    Download Riak 1.4: http://docs.basho.com/riak/latest/downloads/

    Code at: Github.com/basho

    Live webcast: “What’s New in Riak 1.4” on July 12th.

    That’s this coming Friday.

    May 2, 2013

    Data Structures in Riak

    Filed under: Data Structures,Riak — Patrick Durusau @ 2:20 pm

    Data Structures in Riak (NoSQL Matters Cologne 2013) by Sean Cribbs.

    From the description:

    Since the beginning, Riak has supported high write-availability using Dynamo-style multi-valued keys – also known as conflicts or siblings. The tradeoff for this type of availability is that the application must include logic to resolve conflicting updates. While it is convenient to say that the application can reason best about conflicts, ad hoc resolution is error-prone and can result in surprising anomalies, like the reappearing item problem in Dynamo’s shopping cart.

    What is needed is a more formal and general approach to the problem of conflict resolution for complex data structures. Luckily, there are some formal strategies in recent literature, including Conflict-Free Replicated Data Types (CRDTs) and BloomL lattices. We’ll review these strategies and cover some recent work we’ve done toward adding automatically-convergent data structures to Riak.
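
The reappearing item problem is easier to see in code. The Python sketch below merges two shopping cart siblings naively by set union, then does the same with an observed-remove set (OR-set), one of the CRDTs from the literature. The class, the function name and the single shared tag counter are simplifications I am assuming for illustration; nothing here is taken from the slides or from Riak.

```python
import itertools

# Single-process stand-in for globally unique tags (an assumption for brevity;
# a distributed version would use something like (replica id, counter)).
_tags = itertools.count()


class ORSet:
    """Observed-remove set: a remove only cancels the adds it has seen."""

    def __init__(self):
        self.adds = set()     # (item, tag) pairs
        self.removes = set()  # (item, tag) tombstones

    def add(self, item):
        self.adds.add((item, next(_tags)))

    def remove(self, item):
        self.removes |= {pair for pair in self.adds if pair[0] == item}

    def items(self):
        return {item for item, _tag in self.adds - self.removes}

    def merge(self, other):
        self.adds |= other.adds
        self.removes |= other.removes

    def copy(self):
        clone = ORSet()
        clone.adds, clone.removes = set(self.adds), set(self.removes)
        return clone


def run_cart_scenario():
    """Return (naive union merge, OR-set merge) for two diverged carts."""
    cart = ORSet()
    cart.add("book")
    cart.add("pen")
    replica_a, replica_b = cart.copy(), cart.copy()
    replica_a.remove("pen")   # one client removes the pen
    replica_b.add("ink")      # another client concurrently adds ink
    naive = replica_a.items() | replica_b.items()  # the pen comes back
    replica_a.merge(replica_b)
    return naive, replica_a.items()


if __name__ == "__main__":
    print(run_cart_scenario())
```

The naive union cannot tell “never removed” from “removed on the other side”, so the pen returns; the OR-set keeps the tombstone and the pen stays gone.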

    OK, it’s a slide deck so you will have to supply the prose parts yourself.

    There are references to published literature with URLs so it may take a little work, but you will be better off for it. 😉

    November 2, 2012

    RICON 2012 [videos, slides, resources]

    Filed under: Distributed Systems,Erlang,Riak — Patrick Durusau @ 2:59 pm

    RICON 2012 [videos, slides, resources]

    From the webpage:

    Basho Technologies, along with our sponsors, proudly presented RICON 2012, a two day conference dedicated to Riak, developers, and the future of distributed systems in production. This page is dedicated to post-conference consumption. Here you will find slidedecks, resources, and much more.

    Videos for the weekend (for those of you without Netflix accounts):

    • Joseph Blomstedt, Bringing Consistency to Riak
    • Sean Cribbs, Data Structures in Riak
    • Selena Deckelmann, Rapid Data Prototyping With Postgres
    • Dietrich Featherston, Modern Radiology for Distributed Systems
    • Gary Flake, Building a Social Application on Riak
    • Theo Schlossnagle, Next Generation Monitoring of Large Scale Riak Applications
    • Ines Sombra and Michael Brodhead, Riak in the Cloud
    • Andrew Thompson, Cloning the Cloud – Riak and Multi Data Center Replication

    It is hard to decide what to watch first.

    What do you think?

    September 27, 2012

    Searching and Accessing Data in Riak (overview slides)

    Filed under: MapReduce,Riak — Patrick Durusau @ 3:22 pm

    Searching and Accessing Data in Riak by Andy Gross and Shanley Kane.

    From the description:

    An overview of methods for searching and aggregating data in Riak, covering Riak Search, secondary indexes and MapReduce. Reviews use cases and features for each method, when to use which, and the limitations and advantages of each approach. In addition, it covers query examples and the high-level architecture of each method.

    If you are already familiar with search/access to data in Riak, you won’t find anything new here.

    It would be useful to have some topic map specific examples written using Riak.

    Sing out if you decide to pursue that train of thought.

    August 8, 2012

    Riak 1.2 Webinar – 21st August 2012

    Filed under: Erlang,Riak — Patrick Durusau @ 1:48 pm

    Riak 1.2 Webinar – 21st August 2012

    • 11:00 Pacific Daylight Time (San Francisco, GMT-07:00)
    • 14:00 Eastern Daylight Time (New York, GMT-04:00)
    • 20:00 Europe Summer Time (Berlin, GMT+02:00)

    From the registration page:

    Join Basho Technologies’ Engineer, Joseph Blomstedt, for an in-depth overview of Riak 1.2, the latest version of Basho’s flagship open source database. In this live webinar, you will see changes in Riak 1.2 open source and Enterprise versions, including:

    • New approach to cluster administration
    • Built-in capability negotiation
    • Repair Search or KV Partitions thru Riak Console
    • Enhanced Handoff Reporting
    • Protobuf API Support for 2i and Search indexes
    • New Packaging for FreeBSD, SmartOS, and Ubuntu
    • Stats Improvements
    • LevelDB Improvements

    I would have included this with the Riak 1.2 release post but was afraid you would not get past the download link and not see the webinar.

    It’s on my calendar. How about yours?

    Riak 1.2 Is Official!

    Filed under: Erlang,Riak — Patrick Durusau @ 1:46 pm

    Riak 1.2 Is Official!

    From the post:

    Nearly three years ago to the day, from a set of green, worn couches in a modest office Cambridge, Massachusetts, the Basho team announced Riak to the world. To say we’ve come a long way from that first release would be an understatement, and today we’re pleased to announce the release and general availability of Riak 1.2.

    Here’s the tl;dr on what’s new and improved since the Riak 1.1 release:

    • More efficiently add multiple Riak nodes to your cluster
    • Stage and review, then commit or abort cluster changes for easier operations; plus smoother handling of rolling upgrades
    • Better visibility into active handoffs
    • Repair Riak KV and Search partitions by attaching to the Riak Console and using a one-line command to recover from data corruption/loss
    • More performant stats for Riak; the addition of stats to Riak Search
    • 2i and Search usage thru the Protocol Buffers API
    • Official Support for Riak on FreeBSD
    • In Riak Enterprise: SSL encryption, better balancing and more granular control of replication across multiple data centers, NAT support

    If that’s all you need to know, download the new release or read the official release notes. Also, go register for RICON.

    OK, but I have a question: What happened to the lucky “…green, worn couches…”? 😉

    June 8, 2012

    Riak Handbook, Second Edition [$29 for 154 pages of content]

    Filed under: NoSQL,Riak — Patrick Durusau @ 8:57 pm

    Riak Handbook, Second Edition, by Mathias Meyer.

    From the post:

    Basho Technologies today announced the immediate availability of the second edition of Riak Handbook. The significantly updated Riak Handbook includes more than 43 pages of new content covering many of the latest feature enhancements to Riak, Basho’s industry-leading, open-source, distributed database. Riak Handbook is authored by former Basho developer and advocate, Mathias Meyer.

    Riak Handbook is a comprehensive, hands-on guide to Riak. The initial release of Riak Handbook focused on the driving forces behind Riak, including Amazon Dynamo, eventual consistency and CAP Theorem. Through a collection of examples and code, Mathias’ Riak Handbook explores the mechanics of Riak, such as storing and retrieving data, indexing, searching and querying data, and sheds a light on Riak in production. The updated handbook expands on previously covered key concepts and introduces new capabilities, including the following:

    • An overview of Riak Control, a new Web-based operations management tool
    • Full coverage on pre- and post-commit hooks, including JavaScript and Erlang examples
    • An entirely new section on deploying Erlang code in a Riak cluster
    • Additional details on secondary indexes
    • Insight into load balancing Riak nodes
    • An introduction to network node planning
    • An introduction to Riak CS, includes Amazon S3 API compatibility

    The updated Riak Handbook includes an entirely new section dedicated to popular use cases and is full of examples and code from real-time usage scenarios.

    Mathias Meyer is an experienced software developer, consultant and coach from Berlin, Germany. He has worked with database technology leaders such as Sybase and Oracle. He entered into the world of NoSQL in 2008 and joined Basho Technologies in 2010.

    I haven’t ordered a copy. The $29.00 for 154-odd pages of content seems a bit steep to me.

    May 16, 2012

    Progressive NoSQL Tutorials

    Filed under: Cassandra,Couchbase,CouchDB,MongoDB,Neo4j,NoSQL,RavenDB,Riak — Patrick Durusau @ 10:20 am

    Have you ever gotten an advertising email with clean links in it? I mean a link without all the marketing crap appended to the end. The stuff you have to clean off before using it in a post or sending it to a friend?

    Got my first one today. From Skills Matter on the free videos for their Progressive NoSQL Tutorials that just concluded.

    High quality presentations, videos freely available after presentation, friendly links in email, just a few of the reasons to support Skills Matter.

    The tutorials:

    March 22, 2012

    Milking Performance from Riak Search

    Filed under: Erlang,Riak — Patrick Durusau @ 7:42 pm

    Milking Performance from Riak Search by Gary William Flake.

    From the post:

    The primary backend store of Clipboard is built on top of Riak, one of the lesser known NoSQLs solutions. We love Riak and are really happy with our experiences with it — both in terms of development and operations — but to get to where we are, we had to use some tricks. In this post I want to share with you why we chose Riak and also arm you with some of the best tricks that we learned along the way. Individually, these tricks gave us better than a 100x performance boost, so they may make a big difference for you too.

    If you don’t know what Clipboard is, you should try it out. We’re in private beta now, but here’s a backdoor that will bypass the invitation system: Register at Clipboard.

    Good discussion of term-based partitioning and its disadvantages. (Term-based partitioning being native to Riak.) Solved in part by judging likely queries in advance and precomputing inner joins. Not a bad method, depending on your confidence in your guesses about likely queries.

    You will also have to determine if sorting on a primary key meets your needs, for a 10X to 100X performance gain.
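
A toy Python sketch of why precomputing a join helps with a term-partitioned index: an AND query has to fetch the full posting list for each term, even when one term matches almost everything, while a precomputed composite term fetches only the matches. The class, the `joined` helper, the term names and the `reads` cost counter are all hypothetical; this is not Clipboard's code or Riak Search's internals.

```python
from collections import defaultdict


class TermIndex:
    """Inverted index keyed by term; each term's posting list is one unit."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.reads = 0                    # posting entries fetched (rough cost)

    def add(self, doc_id, terms):
        for term in terms:
            self.postings[term].add(doc_id)

    def fetch(self, term):
        docs = self.postings.get(term, set())
        self.reads += len(docs)
        return docs

    def query_and(self, term_a, term_b):
        # Both posting lists are fetched in full before intersecting.
        return self.fetch(term_a) & self.fetch(term_b)


def joined(term_a, term_b):
    """Composite term written at index time for a query guessed in advance."""
    return f"{term_a}+{term_b}"


def build_example():
    index = TermIndex()
    for doc_id in range(1000):
        terms = ["user:alice"]
        if doc_id < 3:
            terms += ["tag:riak", joined("user:alice", "tag:riak")]
        index.add(doc_id, terms)
    return index


if __name__ == "__main__":
    index = build_example()
    print(index.query_and("user:alice", "tag:riak"), index.reads)
    index.reads = 0
    print(index.fetch(joined("user:alice", "tag:riak")), index.reads)
```

The trade-off is the one noted above: the composite term only exists for queries you guessed correctly at write time.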

    February 21, 2012

    Riak 1.1 + Webinar

    Filed under: NoSQL,Riak — Patrick Durusau @ 7:59 pm

    Riak 1.1 Release + Webinar

    This post almost didn’t happen. I got an email notice about this release, and when I went to the web page “version,” every link pointed to the 29 February 2012 webinar on Riak 1.1, whether the link text was “webinar” or “Riak 1.1,” and there were several of each.

    So I go to the Basho website, this is big news. Nothing on the blog. There is an image on the homepage if you know which one to choose.

    Finally, I went to company -> news -> “Basho Unveils New Graphical Operations Dashboard, Diagnostics with Release of Riak 1.1.”

    OK, not the best headline but at least you know you have arrived at the right place.

    Tip: Don’t make news about your product or company hard to find. (KISS4S – Keep it simple stupid for stupids)

    After getting there I find:

    Riak 1.1 boosts data synchronization performance for multi-data center deployments, provides operating system and installation diagnostics and improves operational control for very large clusters. Riak 1.1 delivers a range of new features and improvements including:

    • Riak Control, a completely open source and intuitive administrative console for managing, monitoring and interfacing with Riak clusters
    • Riaknostic, an open source, proactive diagnostic suite for detecting common configuration and runtime problems
    • Enhanced error logging and reporting
    • Improved resiliency for large clusters
    • Automatic data compression using the Snappy compression library

    Additionally, Riak EDS (Enterprise Data Store), Basho’s commercial distribution based on Riak, features major enhancements, primarily for multi-data center replication:

    • Introduction of bucket-level replication, adding more granularity and robustness
    • Various distinct data center synchronization options are now available, each optimized for different use cases
    • Significant improvement of data synchronization across multiple data centers

    “The 1.1 release is focused on simplifying life for developers and administrators. Basho’s new Riak Control and Riaknostic components move Riak open source forward, providing an easy and intuitive way to diagnose, manage and monitor Riak platforms,” said Don Rippert, CEO, Basho. “While Riak Control was originally part of Basho’s commercial offering, we decided to release the code as part of Riak 1.1 to reinforce our commitment to the open source community.”

    The notice was worth hunting for and the release looks very interesting.

    As added incentive, you can get free Riak/Basho stickers. I think sew-on patches would be good as well. Instead of a biker jacket you could have a developer jacket. 😉

    February 3, 2012

    Vector Clocks – Easy/Hard?

    Filed under: Erlang,Riak — Patrick Durusau @ 4:55 pm

    The Basho blog has a couple of very good posts on vector clocks:

    Why Vector Clocks are Easy

    Why Vector Clocks are Hard

    The problem statement was as follows:

    Alice, Ben, Cathy, and Dave are planning to meet next week for dinner. The planning starts with Alice suggesting they meet on Wednesday. Later, Dave discuss alternatives with Cathy, and they decide on Thursday instead. Dave also exchanges email with Ben, and they decide on Tuesday. When Alice pings everyone again to find out whether they still agree with her Wednesday suggestion, she gets mixed messages: Cathy claims to have settled on Thursday with Dave, and Ben claims to have settled on Tuesday with Dave. Dave can’t be reached, and so no one is able to determine the order in which these communications happened, and so none of Alice, Ben, and Cathy know whether Tuesday or Thursday is the correct choice.

    Vector clocks are used to keep the order of communications clear. Something you will need in distributed systems, including those for topic maps.
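
Here is a minimal Python sketch of vector clocks applied to the dinner problem. The function names are my own, and the particular order of events (Ben and Dave settle on Tuesday first, then Dave settles on Thursday after hearing from Cathy) is an assumption chosen for illustration; the problem statement deliberately leaves the order unknown.

```python
def increment(clock, actor):
    """Return a copy of clock with actor's counter advanced by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify a relative to b: equal, after, before, or concurrent."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"


def merge(a, b):
    """Combine two clocks by taking the per-actor maximum."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


if __name__ == "__main__":
    wednesday = increment({}, "alice")                          # Alice suggests Wednesday
    tuesday = increment(increment(wednesday, "ben"), "dave")    # Ben and Dave: Tuesday
    cathy_thursday = increment(wednesday, "cathy")              # Cathy proposes Thursday
    print(compare(tuesday, cathy_thursday))                     # neither has seen the other
    # Dave, knowing about both, settles on Thursday.
    thursday = increment(merge(tuesday, cathy_thursday), "dave")
    print(compare(thursday, tuesday))
```

With the clocks attached to each message, Alice can tell that Thursday came after Tuesday even though Dave can’t be reached.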

    Building Distributed Systems with Riak Core

    Filed under: Distributed Systems,Riak — Patrick Durusau @ 4:54 pm

    Building Distributed Systems with Riak Core by Steve Vinoski (Basho).

    From the description:

    Riak Core is the distributed systems foundation for the Riak distributed database and the Riak Search full-text indexing system. Riak Core provides a proven architecture and key functionality required to quickly build scalable, distributed applications. This talk will cover the origins of Riak Core, the abstractions and functionality it provides, and some guidance on building distributed systems.

    Rest assured or be forewarned that there is no Erlang code in this presentation.

    For all that, it is still a very informative presentation on building scalable, distributed applications.

    January 4, 2012

    Riak NoSQL Database: Use Cases and Best Practices

    Filed under: NoSQL,Riak — Patrick Durusau @ 7:49 am

    Riak NoSQL Database: Use Cases and Best Practices

    From the post:

    Riak is a key-value based NoSQL database that can be used to store user session related data. Andy Gross from Basho Technologies recently spoke at QCon SF 2011 Conference about Riak use cases. InfoQ spoke with Andy and Mark Phillips (Community Manager) about Riak database features and best practices when using Riak.

    Not a lot of technical detail but enough to get a feel for whether you want/need to learn more about Riak.

    December 22, 2011

    Riaknostic diagnostic tools for Riak

    Filed under: Riak,TMCL — Patrick Durusau @ 7:38 pm

    Riaknostic diagnostic tools for Riak

    From the webpage:

    Overview

    Sometimes, things go wrong in Riak. How can you know what’s wrong? Riaknostic is here to help.

    (example omitted)

    Riaknostic, which is invoked via the above command, is a small suite of diagnostic checks that can be run against your Riak node to discover common problems and recommend how to resolve them. These checks are derived from the experience of the Basho Client Services Team as well as numerous public discussions on the mailing list, IRC room, and other online media.

    Two things occur to me:

    One, diagnostic checks are a good idea, particularly ones that can be extended by the community. Hopefully the error messages are more helpful than cryptic but I will have to try it out to find out.

    Two, what diagnostics would you write in TMCL as general diagnostics on a topic map? How would you discover what constraints to write as diagnostics in TMCL?

    December 1, 2011

    Seven Databases in Seven Weeks now in Beta

    Filed under: CouchDB,HBase,MongoDB,Neo4j,PostgreSQL,Redis,Riak — Patrick Durusau @ 7:41 pm

    Seven Databases in Seven Weeks now in Beta

    From the webpage:

    Redis, Neo4J, Couch, Mongo, HBase, Riak, and Postgres: with each database, you’ll tackle a real-world data problem that highlights the concepts and features that make it shine. You’ll explore the five data models employed by these databases: relational, key/value, columnar, document, and graph. See which kinds of problems are best suited to each, and when to use them.

    You’ll learn how MongoDB and CouchDB, both JavaScript powered, document oriented datastores, are strikingly different. Learn about the Dynamo heritage at the heart of Riak and Cassandra. Understand MapReduce and how to use it to solve Big Data problems.

    Build clusters of servers using scalable services like Amazon’s Elastic Compute Cloud (EC2). Discover the CAP theorem and its implications for your distributed data. Understand the tradeoffs between consistency and availability, and when you can use them to your advantage. Use multiple databases in concert to create a platform that’s more than the sum of its parts, or find one that meets all your needs at once.

    Seven Databases in Seven Weeks will give you a broad understanding of the databases, their strengths and weaknesses, and how to choose the ones that fit your needs.

    Now in beta, in non-DRM PDF, epub, and mobi from pragprog.com/book/rwdata.

    If you know the Seven Languages in Seven Weeks by Bruce Tate, no further recommendation is necessary for the approach.

    I haven’t read the book, yet, but will be getting the electronic beta tonight. More to follow.

    November 8, 2011

    Someone Is Being Honest on the Internet?

    Filed under: MongoDB,NoSQL,Riak — Patrick Durusau @ 7:44 pm

    After seeing the raft of Twitter traffic on MongoDB and Riak, In Context (and an apology), I just had to look. The thought of someone being honest on the Internet being even more novel than someone being wrong on the Internet.

    At least I would not have to stay up late correcting them. 😉

    Sean Cribbs writes:

    There has been quite a bit of furor and excitement on the Internet this week regarding some very public criticisms (and defenses) of MongoDB and its creators, 10gen. Unfortunately, a ghost from my recent past also resurfaced as a result. Let me begin by apologizing to 10gen and its engineers for what I said at JSConf, and then I will reframe my comments in a more constructive form.

    Mea culpa. It’s way too easy in our industry to set up and knock down strawmen, as I did, than to convey messages of objective and constructive criticism. It’s also too easy, when you are passionate about what you believe in, to ignore the feelings and efforts of others, which I did. I have great respect for the engineers I have met from 10gen, Mathias Stern and Kyle Banker. They are friendly, approachable, helpful and fun to socialize with at conferences. Thanks for being stand-up guys.

    Also, whether we like it or not, these kinds of public embarrassments have rippling effects across the whole NoSQL ecosystem. While Basho has tried to distance itself from other players in the NoSQL field, we cannot deny our origins, and the ecosystem as a “thing” is only about 3 years old. Are developers, technical managers and CTOs more wary of new database technologies as a result of these embarrassments? Probably. Should we continue to work hard to develop and promote alternative data-storage solutions? Absolutely.

    Sean’s following comments are useful but even more useful was his suggestion that both MongoDB and Riak push to improve their respective capabilities. There is always room for improvement.

    Oh, I did notice one thing that needs correcting in Sean’s blog entry. 😉 See: Munnecke, Heath Records and VistA (NoSQL 35 years old?) NoSQL is at least 35 years old, probably longer but I don’t have the citation at hand.

    November 3, 2011

    NoSQL Exchange – 2 November 2011

    NoSQL Exchange – 2 November 2011

    It doesn’t get much better or fresher (for non-attendees) than this!

    • Dr Jim Webber of Neo Technology starts the day by welcoming everyone to the first of many annual NOSQL eXchanges. View the podcast here…
    • Emil Eifrém gives a Keynote talk to the NOSQL eXchange on the past, present and future of NOSQL, and the state of NOSQL today. View the podcast here…
    • HANDLING CONFLICTS IN EVENTUALLY CONSISTENT SYSTEMS In this talk, Russell Brown examines how conflicting values are kept to a minimum in Riak and illustrates some techniques for automating semantic reconciliation. There will be practical examples from the Riak Java Client and other places.
    • MONGODB + SCALA: CASE CLASSES, DOCUMENTS AND SHARDS FOR A NEW DATA MODEL Brendan McAdams — creator of Casbah, a Scala toolkit for MongoDB — will give a talk on “MongoDB + Scala: Case Classes, Documents and Shards for a New Data Model”
    • REAL LIFE CASSANDRA Dave Gardner: In this talk for the NOSQL eXchange, Dave Gardner introduces why you would want to use Cassandra, and focuses on a real-life use case, explaining each Cassandra feature within this context.
    • DOCTOR WHO AND NEO4J Ian Robinson: Armed only with a data store packed full of geeky Doctor Who facts, by the end of this session we’ll have you tracking down pieces of memorabilia from a show that, like the graph theory behind Neo4j, is older than Codd’s relational model.
    • BUILDING REAL WORLD SOLUTION WITH DOCUMENT STORAGE, SCALA AND LIFT Aleksa Vukotic will look at how his company assessed and adopted CouchDB in order to rapidly and successfully deliver a next generation insurance platform using Scala and Lift.
    • ROBERT REES ON POLYGLOT PERSISTENCE Robert Rees: Based on his experiences of mixing CouchDB and Neo4J at Wazoku, an idea management startup, Robert talks about the theory of mixing your stores and the practical experience.
    • PARKBENCH DISCUSSION This Park Bench discussion will be chaired by Jim Webber.
    • THE FUTURE OF NOSQL AND BIG DATA STORAGE Tom Wilkie: Tom Wilkie takes a whistle-stop tour of developments in NOSQL and Big Data storage, comparing and contrasting new storage engines from Google (LevelDB), RethinkDB, Tokutek and Acunu (Castle).

    And yes, I made a separate blog post on Neo4j and Dr. Who. 😉 What can I say? I am a fan of both.

    September 22, 2011

    Riak 1.0.0 RC 1

    Filed under: Erlang,Riak — Patrick Durusau @ 6:27 pm

    Riak 1.0.0 RC 1

    From the post:

    We are pleased to announce the first release candidate for Riak 1.0.0 is now available.

    The packages are available on our downloads page: http://downloads.basho.com/riak/riak-1.0.0rc1/

    As a release candidate, we consider this to be a functionally complete representation of Riak 1.0.0. From now until the 1.0.0 release, only critical bug fixes will be merged into the repository. We would like to thank everyone who took the time to download, install, and run the pre-releases. The Riak community has always been one of the great strengths of Riak, and this release period has been no different with feedback and bug reports we’ve been given.

    Cool!

    September 14, 2011

    Secondary Indexes in Riak

    Filed under: Indexing,Riak — Patrick Durusau @ 7:02 pm

    Secondary Indexes in Riak

    Hey, “…alternate keys, one-to-many relationships, or many-to-many relationships…,” it sounds like they are playing the topic map song!

    From the post:

    Developers building an application on Riak typically have a love/hate relationship with Riak’s simple key/value-based approach to storing data. It’s great that anyone can grok the basics (3 simple operations, get/put/delete) quickly. It’s convenient that you can store anything imaginable as an object’s value: an integer, a blob of JSON data, an image, an MP3. And the distributed, scalable, failure-tolerant properties that a key/value storage model enables can be a lifesaver depending on your use case.

    But things get much less rosy when faced with the challenge of representing alternate keys, one-to-many relationships, or many-to-many relationships in Riak. Historically, Riak has shifted these responsibilities to the application developer. The developer is forced to either find a way to fit their data into a key/value model, or to adopt a polyglot storage strategy, maintaining data in one system and relationships in another.

    This adds complexity and technical risk, as the developer is burdened with writing additional bookkeeping code and/or learning and maintaining multiple systems.

    That’s why we’re so happy about Secondary Indexes. Secondary Indexes are the first step toward solving these challenges, lifting the burden from the backs of developers, and enabling more complex data modeling in Riak. And the best part is that it ships in our 1.0 release, just a few weeks from now.
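
To show what the “bookkeeping code” amounts to, here is an in-memory Python sketch of a key/value store with get/put/delete plus a secondary index the store maintains for you. The class name, method signatures and index field names are hypothetical; this illustrates the concept only and is not Riak’s secondary index API.

```python
from collections import defaultdict


class IndexedStore:
    """Key/value store that keeps (field, term) -> keys lookups up to date."""

    def __init__(self):
        self._objects = {}              # key -> (value, indexes)
        self._index = defaultdict(set)  # (field, term) -> set of keys

    def put(self, key, value, indexes=None):
        self.delete(key)  # drop index entries left by any previous version
        indexes = dict(indexes or {})
        self._objects[key] = (value, indexes)
        for field, term in indexes.items():
            self._index[(field, term)].add(key)

    def get(self, key):
        return self._objects[key][0]

    def delete(self, key):
        _value, indexes = self._objects.pop(key, (None, {}))
        for field, term in indexes.items():
            self._index[(field, term)].discard(key)

    def query(self, field, term):
        """Return the keys whose index entry for field equals term."""
        return sorted(self._index.get((field, term), ()))


if __name__ == "__main__":
    store = IndexedStore()
    store.put("user1", {"name": "Ann"}, {"email": "ann@example.com", "team": "ops"})
    store.put("user2", {"name": "Bob"}, {"team": "ops"})
    print(store.query("email", "ann@example.com"))  # alternate key lookup
    print(store.query("team", "ops"))               # one-to-many lookup
```

Without store-side indexes, every application has to write and maintain the equivalent of `_index` itself, which is the burden the post describes.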

    September 13, 2011

    Analyzing Apache Logs with Riak

    Filed under: Riak — Patrick Durusau @ 7:12 pm

    Analyzing Apache Logs with Riak

    From the post:

    This article will show you how to do some Apache log analysis using Riak and MapReduce. Specifically it will give an example of how to extract URLs from Apache logs stored in Riak (the map phase) and provide a count of how many times each URL was requested (the reduce phase).
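
The two phases described in the excerpt can be sketched in plain Python. The sample log lines are invented, and the function names are my own; the article runs its MapReduce job inside Riak, so this only illustrates the shape of the map and reduce steps, not Riak’s MapReduce interface.

```python
from collections import Counter

# Invented Apache access log lines for illustration.
SAMPLE_LOG = [
    '192.0.2.10 - - [13/Sep/2011:19:12:01 -0400] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.11 - - [13/Sep/2011:19:12:05 -0400] "GET /about.html HTTP/1.1" 200 1024',
    '192.0.2.10 - - [13/Sep/2011:19:12:09 -0400] "GET /index.html HTTP/1.1" 304 0',
    'malformed line',
]


def map_phase(line):
    """Extract the requested URL from one log line; emit nothing if malformed."""
    try:
        request = line.split('"')[1]  # e.g. GET /index.html HTTP/1.1
        return [request.split()[1]]
    except IndexError:
        return []


def reduce_phase(urls):
    """Count how many times each URL was emitted by the map phase."""
    return dict(Counter(urls))


def map_reduce(lines):
    mapped = [url for line in lines for url in map_phase(line)]
    return reduce_phase(mapped)


if __name__ == "__main__":
    print(map_reduce(SAMPLE_LOG))
```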

    First steps with Riak and MapReduce.

    August 26, 2011

    Riak 1.0 Overview (webinar)

    Filed under: Riak — Patrick Durusau @ 6:26 pm

    Riak 1.0 Overview (webinar)

    Details:

    Webinar Date and Time

    Wednesday, September 21, 2011 at 2:00 pm, Eastern Daylight Time (New York, GMT-04:00)

    Wednesday, September 21, 2011 at 11:00 am, Pacific Daylight Time (San Francisco, GMT-07:00)

    Wednesday, September 21, 2011 at 8:00 pm, Europe Summer Time (Berlin, GMT+02:00)

    Join Basho Technologies’ CTO Justin Sheehy and Principal Architect Andy Gross for an in-depth overview of the key features and enhancements found in Riak 1.0, including:

    • Increased Query Capabilities
    • Usability Enhancements
    • Greater Reliability and Stability
    • Enhanced Scalability

    In addition to a sneak peek at Riak 1.0, attendees will also learn about what is in store for the Riak platform beyond this milestone 1.0 release, as well as discover additional services and products available from Basho Technologies, the creators of Riak.

    As always, attendees will have the chance to have their questions addressed by our Riak experts on hand. We hope you can join us as we review this landmark release.

    August 18, 2011

    Getting Started with Riak and .Net

    Filed under: NoSQL,Riak — Patrick Durusau @ 6:46 pm

    Getting Started with Riak and .Net by Adrian Hills.

    Short “getting started” guide. The installation was on Ubuntu and then he connects to the server with a .Net client.

    I wondered about the statement that Riak would not run on Windows (there are no pre-compiled binaries for Windows). Stack Overflow reports on Riak on Windows, with several options for running Riak on a Windows system: compile under Windows, use Cygwin, or run Riak inside a Linux VM under VMware or VirtualBox.
