Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 10, 2011

Riak Pipe (Beta)

Filed under: MapReduce,Riak — Patrick Durusau @ 7:14 pm

Riak Pipe (Beta)

While we are talking about MapReduce, may as well mention a Riak project, Pipe, that went out in beta in mid-June of this year.

From Bryan Fink’s announcement:

I’m excited to announce the opening of a new beta-status Basho project today: Riak Pipe.

http://github.com/basho/riak_pipe

Riak Pipe is a new way to distribute work around a Riak cluster.

The README explains much more than I can here, but essentially Riak Pipe allows you to specify work in the form of a chain of function pairs. One function of that pair describes how to produce output from input, and the other describes where in the cluster an input should be processed. Riak Pipe handles the details of ferrying data between workers by building atop Riak Core’s distribution power.

At this point in time Riak Pipe is BETA-status software. We’d like anyone who is interested in it to take a look and send us feedback. Please do not put it into production. We will be continuing to improve Riak Pipe toward a future release date.

We have two plans for Riak Pipe. The first is to power Riak’s MapReduce system with it. We think Riak Pipe provides a cleaner, more manageable subsystem that will provide much easier monitoring, debugging, and general use of MapReduce in Riak. You can see our work toward that goal in the “pipe” branch of Riak KV (start at src/riak_kv_mrc_pipe.erl):

https://github.com/basho/riak_kv/tree/pipe

Our second plan for Riak Pipe is to expand Riak’s MapReduce system with more abilities (imagine a keyed-reduce phase, or additional processing languages), possibly to the extent of providing an entirely separate interface (new query syntax? offline/asynchronous processing?). But for this part, we need your help.

We have some ideas about what external client interfaces might look like. We also have some ideas about what an external processing interface might look like. We’re still in the early phases of creating these, though, so if exploring the riak_pipe repository gives you ideas, please don’t hesitate to get in touch.

And, again, Riak Pipe is BETA software. Basho does not support running it in production at this time.
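
To make the "chain of function pairs" idea from the announcement concrete, here is a minimal single-process sketch in Python. It is not Riak Pipe's API (Riak Pipe is written in Erlang). The names `Stage`, `hash_place`, and `run_pipeline`, and the notion of numbered partitions, are assumptions made up for illustration. Each stage pairs a function that produces outputs from an input with a function that decides where that input would be processed.

```python
import zlib
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class Stage:
    name: str
    # How to produce zero or more outputs from one input.
    process: Callable[[Any], Iterable[Any]]
    # Where (which of n partitions) an input should be processed.
    place: Callable[[Any, int], int]


def hash_place(value, n_partitions):
    """Hypothetical placement rule: stable hash of the input, modulo partitions."""
    return zlib.crc32(repr(value).encode("utf-8")) % n_partitions


def run_pipeline(stages, inputs, n_partitions=4):
    """Run the stages in order, recording where each input would have been sent."""
    current = list(inputs)
    trace = []
    for stage in stages:
        next_inputs = []
        for item in current:
            partition = stage.place(item, n_partitions)
            trace.append((stage.name, partition, item))
            next_inputs.extend(stage.process(item))
        current = next_inputs
    return current, trace


stages = [
    Stage("split", lambda line: line.split(), hash_place),
    Stage("upper", lambda word: [word.upper()], hash_place),
]
results, trace = run_pipeline(stages, ["riak pipe", "riak core"])
```

In a real cluster the placement function would pick a node or vnode and the outputs would be shipped between workers. Here everything runs in one loop and `trace` only records the placement decisions.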

August 3, 2011

Consistency or Bust: Breaking a Riak Cluster

Filed under: NoSQL,Riak — Patrick Durusau @ 7:36 pm

Consistency or Bust: Breaking a Riak Cluster by Jeff Kirkell.

Not your usual slide deck.

Has enough examples and working instructions for you to actually learn something separate from the presentation.

Perhaps the one time you will be glad someone broke the rule about not putting text for the audience to read on a slide.

July 31, 2011

Riak and Python – 2nd August 2011

Filed under: NoSQL,Riak — Patrick Durusau @ 7:49 pm

Riak and Python

A free webinar sponsored by Basho.

Dates and times:

Tuesday, August 2, 2011 2:00 pm, Eastern Daylight Time (New York, GMT-04:00)
Tuesday, August 2, 2011 11:00 am, Pacific Daylight Time (San Francisco, GMT-07:00)
Tuesday, August 2, 2011 8:00 pm, Europe Summer Time (Berlin, GMT+02:00)

Includes:

Building and Deploying a Simple imgur.com Clone Using Riak, Luwak, and Riak Search

You know where to get Riak and Riak Search. Luwak is at https://github.com/basho/luwak.

June 20, 2011

Vol. 15: Understanding Dynamo — with Andy Gross

Filed under: NoSQL,Riak — Patrick Durusau @ 3:31 pm

Vol. 15: Understanding Dynamo — with Andy Gross

From the webpage:

Basho’s VP of Engineering runs us through the tenets of Dynamo systems. From Consistent Hashing to Vector Clocks, Gossip, Hinted Handoffs and Read Repairs. (Recorded on October 13, 2010 in San Francisco, CA.)

You may want to compare Andy Gross’s presentation at Riak Core: Dynamo Building Blocks. It covers basically the same material, worded differently.

May 27, 2011

Riak Core: Dynamo Building Blocks

Filed under: NoSQL,Riak — Patrick Durusau @ 12:36 pm

Riak Core: Dynamo Building Blocks

Highly recommended!

Summary:

Andy Gross discusses the design philosophy behind Riak based on Amazon Dynamo – Gossip Protocol, Consistent Hashing, Vector clocks, Read Repair, etc. -, overviewing its main features and architecture.

Amazon’s Dynamo paper:

Dynamo: Amazon’s Highly Available Key-value Store (HTML)

Dynamo: Amazon’s Highly Available Key-value Store (PDF)

One of the more intriguing slides represented http/apps/dbs as a stack to show that scaling the http layer is well understood, scaling apps is more difficult but still doable, and scaling storage is the most expensive and difficult.

I mention that because I suspect scaling of databases has a lot in common with scaling of topic maps.

On the issue of consistency, the point was made that “expires” can be included in HTTP headers to indicate that a fact is good until some time. I wonder, could a topic have a “last merged” property, so that a user can choose the timeliness they need? Then “last merged” 7 days ago is public information, “last merged” 3 days ago is subscriber information and the most recent “last merged” is premium information.
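
One way to read the “last merged” idea is to keep dated snapshots of a merged topic and serve each tier the newest snapshot that is at least as old as that tier's delay. The sketch below assumes exactly that. The tier delays come from the example above, while `TIER_DELAY`, `snapshot_for`, and the snapshot layout are hypothetical names invented here.

```python
from datetime import datetime, timedelta

# Delays taken from the example in the text; the tier names are illustrative.
TIER_DELAY = {
    "public": timedelta(days=7),
    "subscriber": timedelta(days=3),
    "premium": timedelta(0),
}


def snapshot_for(tier, snapshots, now):
    """Return the newest (last_merged, topic) pair the tier may see, or None.

    snapshots is a list of (last_merged datetime, topic data) tuples.
    """
    cutoff = now - TIER_DELAY[tier]
    allowed = [snap for snap in snapshots if snap[0] <= cutoff]
    return max(allowed, key=lambda snap: snap[0]) if allowed else None


now = datetime(2011, 5, 27)
snapshots = [
    (datetime(2011, 5, 18), "merge v1"),
    (datetime(2011, 5, 23), "merge v2"),
    (datetime(2011, 5, 26), "merge v3"),
]
public_view = snapshot_for("public", snapshots, now)
subscriber_view = snapshot_for("subscriber", snapshots, now)
premium_view = snapshot_for("premium", snapshots, now)
```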

For example, instead of trying to regulate insider trading, the SEC could create a topic map of stocks and sell insider trading information, suitably priced to keep its “insider” character, except that for enough money, anyone could play. The SEC portion of the subscription + selling price could be used to finance other enforcement activities.

This presentation plus the Amazon paper make nice weekend reading/viewing.

May 17, 2011

Riak Search Explained

Filed under: Erlang,Riak — Patrick Durusau @ 2:49 pm

Riak Search Explained

From Alex Popescu’s myNoSQL, a pointer to an explanation of Riak Search.

Covers:

  • Full-text search built on Riak Core
  • Easy to use (start, join, done)
  • Solr compatible interface (just mentioned)
  • Riak KV integration (bulk of the presentation)

The focus of the presentation is integrating full-text search (built on Riak Core) with another application.

Riak Search is a superset of Riak KV (only install one).

Riak Search source code.

Eventually Consistent?

Filed under: Erlang,Riak — Patrick Durusau @ 2:48 pm

statebox, an eventually consistent data model for Erlang (and Riak)

From the post:

When you choose an eventually consistent data store you’re prioritizing availability and partition tolerance over consistency, but this doesn’t mean your application has to be inconsistent. What it does mean is that you have to move your conflict resolution from writes to reads. Riak does almost all of the hard work for you [2], but if it’s not acceptable to discard some writes then you will have to set allow_mult to true on your bucket(s) and handle siblings [3] from your application. In some cases, this might be trivial. For example, if you have a set and only support adding to that set, then a merge operation is just the union of those two sets.

statebox is my solution to this problem. It bundles the value with repeatable operations [4] and provides a means to automatically resolve conflicts. Usage of statebox feels much more declarative than imperative. Instead of modifying the values yourself, you provide statebox with a list of operations and it will apply them to create a new statebox. This is necessary because it may apply this operation again at a later time when resolving a conflict between siblings on read.
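
The two ideas in the quoted post, merging add-only sets by union and bundling a value with repeatable operations, can be sketched in a few lines of Python. This is not statebox's API (statebox is an Erlang library). `merge_siblings`, `add_member`, `Box`, and `merge_boxes` are hypothetical names, and the merge rule (start from one sibling and replay every recorded operation) is a simplification assumed for illustration.

```python
from functools import reduce


def merge_siblings(siblings):
    """Resolve add-only set siblings on read by taking their union."""
    return reduce(lambda a, b: a | b, siblings, frozenset())


def add_member(member):
    """Build a repeatable operation: applying it twice equals applying it once."""
    return lambda current: current | {member}


class Box:
    """A value bundled with the operations that produced it."""

    def __init__(self, value=frozenset(), ops=()):
        self.value = frozenset(value)
        self.ops = tuple(ops)

    def modify(self, op):
        # Return a new box instead of mutating, and remember the operation.
        return Box(op(self.value), self.ops + (op,))


def merge_boxes(boxes):
    """Resolve siblings by replaying every recorded operation on one of them."""
    boxes = list(boxes)
    value = boxes[0].value
    ops = [op for box in boxes for op in box.ops]
    for op in ops:
        value = op(value)  # safe to re-apply because the ops are repeatable
    return Box(value, ops)


base = Box().modify(add_member("a"))
sibling_1 = base.modify(add_member("b"))  # one client adds "b"
sibling_2 = base.modify(add_member("c"))  # another client adds "c" concurrently
merged = merge_boxes([sibling_1, sibling_2])
```

Because each operation is repeatable, it does not matter that the merge applies "add a" and "add b" to a value that already contains them.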

I like that, “move conflict resolution from writes to reads.”

Sounds like where ISO/IEC 13250 points out that two or more topic links may be merged, and/or applications may process and/or render them as if they have been merged. (5.2.1 Topic Link Architectural Form)

Which fits your topic maps use case better? Consistency (one representative per subject) on write or read?

May 12, 2011

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison

Filed under: Cassandra,CouchDB,HBase,MongoDB,NoSQL,Redis,Riak — Patrick Durusau @ 7:56 am

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison

Good thumbnail comparison of the major features of all six (6) NoSQL databases by Kristóf Kovács.

Sorry to see that Neo4j didn’t make the comparison.

May 8, 2011

Nitrogen Web Framework

Filed under: Erlang,Riak — Patrick Durusau @ 6:14 pm

Nitrogen Web Framework

From the website:

Nitrogen Web Framework is the fastest way to develop interactive web applications in full-stack Erlang.

Whether you are working with Riak (also programmed in Erlang) or not, this web framework may be of interest.

April 8, 2011

Riak Core – An Erlang Distributed Systems Toolkit

Filed under: Erlang,Riak — Patrick Durusau @ 7:20 pm

Riak Core – An Erlang Distributed Systems Toolkit

Abstract:

Riak Core is the distributed systems foundation for the Riak distributed database and the Riak Search full-text indexing system. Riak Core provides a proven architecture for building scalable, distributed applications quickly. This talk will cover the origins of Riak Core, the abstractions and functionality it provides, and some guidance on building distributed systems.

Something for those interested in building distributed topic map applications.

March 1, 2011

NoSQL Databases: Why, what and when

NoSQL Databases: Why, what and when by Lorenzo Alberton.

When I posted RDBMS in the Social Networks Age I did not anticipate returning the very next day with another slide deck from Lorenzo. But, after viewing this slide deck, I just had to post it.

It is a very good overview of NoSQL databases and their underlying principles, with useful graphics as well (as opposed to the other kind).

I am going to have to study his graphic technique in hopes of applying it to the semantic issues that are at the core of topic maps.

February 26, 2011

Baseball Stats vs. riak map/reduce

Filed under: MapReduce,Riak — Patrick Durusau @ 2:38 pm

Baseball Stats vs. riak map/reduce

I saw this at Alex Popescu’s myNoSQL site under the name MapReducing Big Data with Riak and Luwak, but since the baseball season is already in the news in Atlanta, I thought the other title works better.

Bryan Fink makes effective use of 30 minutes and baseball stats from RETROSHEET to demonstrate the use of Riak map/reduce and how it might be applied to other data sets.

Well worth the time.

February 20, 2011

Riak Search

Filed under: NoSQL,Riak — Patrick Durusau @ 11:05 am

Riak Search

From the website:

Riak Search is a distributed, easily-scalable, failure-tolerant, real-time, full-text search engine built around Riak Core and tightly integrated with Riak KV.

Riak Search allows you to find and retrieve your Riak objects using the objects’ values. When a Riak KV bucket has been enabled for Search integration (by installing the Search pre-commit hook), any objects stored in that bucket are also indexed seamlessly in Riak Search.

The Riak Client API can then be used to perform Search queries that return a list of bucket/key pairs matching the query. Alternatively, the query results can be used as the input to a Riak map/reduce operation. Currently the PHP, Python, Ruby, and Erlang APIs support integration with Riak Search.

The indexing of XML data (it takes path/element name as key) is plausible enough. Made me wonder about a slightly different operation.

What if as part of the indexing operation, additional properties were added to the key?

Could be as simple as the DTD/Schema that defines the element or more complex information about the field.
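
A small Python sketch of that question: index XML text under a key built from the element path, then optionally extend the key with extra properties such as the defining DTD or schema. This does not reproduce how Riak Search builds its index. `index_xml`, the tuple-shaped key, and the `book.dtd` label are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def index_xml(xml_text, extra=()):
    """Return {key: [text, ...]} where key = (element path, *extra properties)."""
    index = {}

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            # The extra properties become part of the key, not the value.
            index.setdefault((path, *extra), []).append(text)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return index


doc = "<book><title>Topic Maps</title><author>Anon</author></book>"
plain = index_xml(doc)
with_schema = index_xml(doc, extra=("book.dtd",))  # hypothetical DTD name
```

With the extra property in the key, the same `/book/title` path from documents governed by different schemas would land under different keys and could be distinguished or deliberately merged.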

January 31, 2011

Introduction to Riak Video with Rusty Klophaus – Post

Filed under: NoSQL,Riak — Patrick Durusau @ 1:58 pm

Introduction to Riak Video with Rusty Klophaus from MyNoSQL by Alex Popescu. Viewable online or downloadable in a couple of formats.

Starts with the observation that there are 47 different NoSQL projects. Doesn’t list them. 😉

I would watch this at the PivotLabs link because of the related talks.

Oh, Riak homepage.

While I like the video, it is also an example that you don’t need high end video production or editing to produce useful video of presentations.

I mention this as an answer to conferences that protest they need expensive equipment to video presentations.

That is simply not the case and anyone who says otherwise, to be generous, is misinformed.

Tutorial: Developing in Erlang with Webmachine, ErlyDTL, and Riak

Filed under: Erlang,NoSQL,Riak — Patrick Durusau @ 7:08 am

Tutorial: Developing in Erlang with Webmachine, ErlyDTL, and Riak

From Alex Popescu’s MyNoSQL blog:

  • Part 1
    • In Part 1 of the series we covered the basics of getting the development environment up and running. We also looked at how to get a really simple ErlyDTL template rendering
  • Part 2
    • There are a few reasons this series is targeting this technology stack. One of them is uptime. We’re aiming to build a site that stays up as much as possible. Given that, one of the things that I missed in the previous post was setting up a load balancer. Hence this post will attempt to fill that gap.
  • Part 3 In this post we’re going to cover:
    • A slight refactor of code structure to support the “standard” approach to building applications in Erlang using OTP.
    • Building a small set of modules to talk to Riak.
    • Creation of some JSON helper functions for reading and writing data.
    • Calling all the way from the Webmachine front-end to Riak to extract data and display it in a browser using ErlyDTL templates.

Erlang is important for anyone building high availability (think telecommunications) systems that can be dynamically reconfigured without taking the systems offline.

December 31, 2010

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison – Post

Filed under: Cassandra,CouchDB,HBase,NoSQL,Redis,Riak — Patrick Durusau @ 11:01 am

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison

Not enough detail for decision making but a useful overview nonetheless.

December 9, 2010

Schema Design for Riak (Take 2)

Filed under: NoSQL,Riak,Schema — Patrick Durusau @ 5:48 pm

Schema Design for Riak (Take 2)

Useful exercise in schema design in a NoSQL context.

No great surprise that a focus on data and application requirements is the key (sorry) to a successful deployment.

Amazing how often that gets repeated, at least in presentations.

Equally amazing how often that gets ignored in implementations (at least to judge from how often it is repeated in presentations).

Still, we all need reminders so it is worth the time to review the slides.

Basho Riak: An Open Source Scalable Data Store

Filed under: MapReduce,NoSQL,Riak — Patrick Durusau @ 5:45 pm

Basho Riak: An Open Source Scalable Data Store

From the website:

Riak is a Dynamo-inspired key/value store that scales predictably and easily. Riak also simplifies development by giving developers the ability to quickly prototype, test, and deploy their applications

A truly fault-tolerant system, Riak has no single point of failure. No machines are special or central in Riak, so developers and operations professionals can decide exactly how fault-tolerant they want and need their applications to be.

The video from Ga Tech NoSQL conference in 2009 is worth watching.

Their implementation of MapReduce is targeted (it doesn’t have to be run against the entire data set), can be set up as a stream (store and send through MapReduce), and can be used with the representation of relationships as links.
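
To illustrate what "targeted" and "links" mean here, the following in-memory Python sketch runs a map/reduce over an explicit list of bucket/key pairs and gets that list by following tagged links from a starting object. It does not use any Riak client API. The store layout and the function names `follow_links` and `map_reduce` are hypothetical.

```python
def follow_links(store, inputs, tag):
    """Return the bucket/key pairs linked with the given tag from the input objects."""
    targets = []
    for bucket_key in inputs:
        for link_tag, target in store[bucket_key].get("links", []):
            if link_tag == tag:
                targets.append(target)
    return targets


def map_reduce(store, inputs, map_fn, reduce_fn):
    """Apply map_fn only to the named inputs, then reduce the collected results."""
    mapped = []
    for bucket_key in inputs:
        if bucket_key in store:
            mapped.extend(map_fn(bucket_key, store[bucket_key]))
    return reduce_fn(mapped)


store = {
    ("artists", "a1"): {
        "value": {"name": "X"},
        "links": [("album", ("albums", "b1")), ("album", ("albums", "b2"))],
    },
    ("albums", "b1"): {"value": {"tracks": 10}, "links": []},
    ("albums", "b2"): {"value": {"tracks": 12}, "links": []},
    ("albums", "b3"): {"value": {"tracks": 99}, "links": []},  # never visited
}

albums = follow_links(store, [("artists", "a1")], "album")
total_tracks = map_reduce(
    store, albums, lambda bucket_key, obj: [obj["value"]["tracks"]], sum
)
```

Only the two linked albums are read. The third album is never touched, which is the point of a targeted job.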
