Archive for the ‘Cassandra’ Category

Cassandra 1.0.5

Friday, December 2nd, 2011

Cassandra 1.0.5

A reversion release of Cassandra. Details: Cassandra changes.

Looks like the holidays are going to be filled with upgrades and new releases!

Are You a Cassandra Jedi?

Friday, November 11th, 2011

Are You a Cassandra Jedi?

Cassandra Conference, December 6, 2011, New York City

From the call for speakers:

BURLINGAME, Calif. – November 9, 2011 – DataStax, the commercial leader in Apache Cassandra™, along with the NYC Cassandra User Group, NoSQL NYC, and Big Data NYC are joining together to present the first Cassandra New York City conference on December 6. This all day, two-track event will focus on enterprise use cases as well as the latest developments in Cassandra. Early bird registration is now open here.

Coming on the heels of a sold-out DataStax Cassandra SF earlier this year, the event will feature some of the most interesting Cassandra use-cases from up and down the Eastern Seaboard. Cassandra NYC will be keynoted by Jonathan Ellis, chairman of the Apache Cassandra project, who will highlight what’s new in Cassandra 1.0, and what’s in store for the future. Additional confirmed speakers include Nathan Marz, lead engineer for the Storm project at Twitter and Jim Ancona, systems architect at Constant Contact.

“With the recent 1.0 release, we are seeing users doing amazing new things with Cassandra that are going beyond even our expectations and imagination,” said Ellis. “We look forward to sharing these stories with the broader community, to further hasten the adoption and usage of Cassandra to meet their real-time, big data challenges.”

Call for Speakers and Press Registration

The call for speakers is now also open for the event. Submissions can be made to lynnbender@datastax.com.

Press interested in attending the event may contact Zenobia@intersectcom.com for a complimentary press pass.

The event will be held at the Lighthouse International Conference Center on 59th St.

I am not sure about “early bird” registration for an event less than a month away but this sounds quite interesting. I hope the presentations will be recorded and posted for asynchronous access.

DataStax Enterprise and DataStax Community Edition

Friday, November 11th, 2011

DataStax Enterprise and DataStax Community Edition

From the announcement:

BURLINGAME, Calif. – Nov. 1, 2011 – DataStax, the commercial leader in Apache Cassandra™, today announced that DataStax Enterprise, the industry’s first distributed, scalable, and highly available database platform powered by Apache Cassandra™ 1.0, is now available.

“The ability to manage both real-time and analytic data in a simple, massively scalable, integrated solution is at the heart of challenges faced by most businesses with legacy databases,” said Billy Bosworth, CEO, DataStax. “Our goal is to ensure businesses can conquer these challenges with a modern application solution that provides operational simplicity, optimal performance and incredible cost savings.”

“Apache Cassandra is the scalable, high-impact, comprehensive data platform that is well-suited to the rapidly-growing real-time data needs of our social media platform,” said Christian Carollo, Senior Manager, Mobile for GameFly. “We leveraged the expertise of DataStax to deploy our new social media platform, and were able to complete the project without worrying about scale or distribution – we simply built a great application and Apache Cassandra took care of the rest.”

BTW, DataStax just added its 100th customer. You might recognize some of them: Netflix, Cisco, etc.

CumulusRDF

Monday, November 7th, 2011

CumulusRDF

From Andreas Harth and Günter Ladwig:

[W]e are happy to announce the first public release of CumulusRDF, a Linked Data server that uses Apache Cassandra [1] as a cloud-based storage backend. CumulusRDF provides a simple HTTP interface [2] to manage RDF data stored in an Apache Cassandra cluster.

Features
* By way of Apache Cassandra, CumulusRDF provides distributed, fault-tolerant and elastic RDF storage
* Supports Linked Data and triple pattern lookups
* Proxy mode: CumulusRDF can act as a proxy server [3] for other Linked Data applications, allowing to deploy any RDF dataset as Linked Data

This is a first beta release that is still somewhat rough around the edges, but the basic functionality works well. The HTTP interface is work-in-progress. Eventually, we plan to extend the storage model to support quads.

CumulusRDF is available from http://code.google.com/p/cumulusrdf/

See http://code.google.com/p/cumulusrdf/wiki/GettingStarted to get started using CumulusRDF.

There is also a paper [4] on CumulusRDF that I presented at the Scalable Semantic Knowledge Base Systems (SSWS) workshop at ISWC last week.

Cheers,
Andreas Harth and Günter Ladwig

[1] http://cassandra.apache.org/
[2] http://code.google.com/p/cumulusrdf/wiki/HttpInterface
[3] http://code.google.com/p/cumulusrdf/wiki/ProxyMode
[4] http://people.aifb.kit.edu/gla/cumulusrdf/cumulusrdf-ssws2011.pdf

Everybody knows I hate to be picky but the abstract of [4] promises:

Results on a cluster of up to 8 machines indicate that CumulusRDF is competitive to state-of-the-art distributed RDF stores.

But I didn’t see any comparison to “state-of-the-art” RDF stores, distributed or not. Did I just overlook something?

I ask because I think this approach has promise, at least as an exploration of indexing strategies for RDF and how usage scenarios may influence those strategies. But that will be difficult to evaluate in the absence of comparison to less imaginative approaches to RDF indexing.
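To make the point about indexing strategies concrete, here is a minimal sketch of triple pattern lookups over three index orderings (SPO, POS, OSP). It is a toy in-memory illustration, not CumulusRDF's actual storage layout; the class name, method names, and sample triples are all my own assumptions.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory triple store with three index orderings (hypothetical)."""

    def __init__(self):
        # Each index maps first key -> second key -> set of third keys.
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        # Every triple is written once per index ordering.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        if s is not None:
            candidates = ((s, p2, o2)
                          for p2, objs in self.spo.get(s, {}).items()
                          for o2 in objs)
        elif p is not None:
            candidates = ((s2, p, o2)
                          for o2, subs in self.pos.get(p, {}).items()
                          for s2 in subs)
        elif o is not None:
            candidates = ((s2, p2, o)
                          for s2, preds in self.osp.get(o, {}).items()
                          for p2 in preds)
        else:
            candidates = ((s2, p2, o2)
                          for s2, preds in self.spo.items()
                          for p2, objs in preds.items()
                          for o2 in objs)
        # Filter on any remaining bound positions the chosen index did not cover.
        return sorted(t for t in candidates
                      if (p is None or t[1] == p) and (o is None or t[2] == o))


store = TripleStore()
store.add("ex:a", "foaf:knows", "ex:b")
store.add("ex:a", "foaf:name", "Alice")
store.add("ex:c", "foaf:knows", "ex:b")
print(store.match(p="foaf:knows", o="ex:b"))
```

Which index answers which pattern is exactly the kind of choice that usage scenarios would influence: a workload dominated by subject lookups might never need the OSP ordering at all.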

NoSQL Exchange – 2 November 2011

Thursday, November 3rd, 2011

NoSQL Exchange – 2 November 2011

It doesn’t get much better or fresher (for non-attendees) than this!

  • Dr Jim Webber of Neo Technology starts the day by welcoming everyone to the first of many annual NOSQL eXchanges. View the podcast here…
  • Emil Eifrém gives a Keynote talk to the NOSQL eXchange on the past, present and future of NOSQL, and the state of NOSQL today. View the podcast here…
  • HANDLING CONFLICTS IN EVENTUALLY CONSISTENT SYSTEMS In this talk, Russell Brown examines how conflicting values are kept to a minimum in Riak and illustrates some techniques for automating semantic reconciliation. There will be practical examples from the Riak Java Client and other places.
  • MONGODB + SCALA: CASE CLASSES, DOCUMENTS AND SHARDS FOR A NEW DATA MODEL Brendan McAdams — creator of Casbah, a Scala toolkit for MongoDB — will give a talk on “MongoDB + Scala: Case Classes, Documents and Shards for a New Data Model”
  • REAL LIFE CASSANDRA Dave Gardner: In this talk for the NOSQL eXchange, Dave Gardner introduces why you would want to use Cassandra, and focuses on a real-life use case, explaining each Cassandra feature within this context.
  • DOCTOR WHO AND NEO4J Ian Robinson: Armed only with a data store packed full of geeky Doctor Who facts, by the end of this session we’ll have you tracking down pieces of memorabilia from a show that, like the graph theory behind Neo4j, is older than Codd’s relational model.
  • BUILDING REAL WORLD SOLUTION WITH DOCUMENT STORAGE, SCALA AND LIFT Aleksa Vukotic will look at how his company assessed and adopted CouchDB in order to rapidly and successfully deliver a next generation insurance platform using Scala and Lift.
  • ROBERT REES ON POLYGLOT PERSISTENCE Robert Rees: Based on his experiences of mixing CouchDB and Neo4J at Wazoku, an idea management startup, Robert talks about the theory of mixing your stores and the practical experience.
  • PARKBENCH DISCUSSION This Park Bench discussion will be chaired by Jim Webber.
  • THE FUTURE OF NOSQL AND BIG DATA STORAGE Tom Wilkie: Tom Wilkie takes a whistle-stop tour of developments in NOSQL and Big Data storage, comparing and contrasting new storage engines from Google (LevelDB), RethinkDB, Tokutek and Acunu (Castle).

And yes, I made a separate blog post on Neo4j and Dr. Who. ;-) What can I say? I am a fan of both.

Usergrid Source Code Release on GitHub

Friday, October 7th, 2011

Usergrid Source Code Release on GitHub

From the webpage:

We’re announcing today the first source code release of Usergrid, a comprehensive platform stack for mobile and rich client applications. The entire codebase is now available on GitHub at https://github.com/usergrid/stack. Usergrid is built in Java and runs on top of Cassandra. Although we built Usergrid as a highly scalable cloud service, we’ve also taken a few steps to make it easy to run “small”, including providing a double-clickable desktop app that lets you run your own personal installation on your desktop, so you can get started right away.

I thought I read about “rich clients” with HTML5.

But the W3C web design team buried the HTML 5 draft 5 clicks deep from their homepage. Good thing I knew to keep looking. That’s not just poor marketing, that’s also poor design.

A future of incompatibility awaits.

Cassandra Write Performance – A quick look inside

Wednesday, September 21st, 2011

Cassandra Write Performance – A quick look inside

From the post:

I was looking at Cassandra, one of the major NoSQL solutions, and I was immediately impressed with its write speed even on my notebook. But I also noticed that it was very volatile in its response time, so I took a deeper look at it.

Michael Kopp uses dynaTrace to look inside Cassandra. There is lots of information in between, and hopefully his conclusion will make you read this post and those he promises will follow.

Conclusion

NoSQL or BigData Solutions are very very different from your usual RDBMS, but they are still bound by the usual constraints: CPU, I/O and most importantly how it is used! Although Cassandra is lighting fast and mostly I/O bound it’s still Java and you have the usual problems – e.g. GC needs to be watched. Cassandra provides a lot of monitoring metrics that I didn’t explain here, but seeing the flow end-to-end really helps to understand whether the time is spent on the client, network or server and makes the runtime dynamics of Cassandra much clearer.

Understanding is really the key for effective usage of NoSQL solutions as we shall see in my next blogs. New problem patterns emerge and they cannot be solved by simply adding an index here or there. It really requires you to understand the usage pattern from the application point of view. The good news is that these new solutions allow us a really deep look into their inner workings, at least if you have the right tools at hand.

What tools are you using to “look inside” your topic map engine?

What’s new in Cassandra 1.0: Compression

Wednesday, September 21st, 2011

What’s new in Cassandra 1.0: Compression

From the post:

Cassandra 1.0 introduces support for data compression on a per-ColumnFamily basis, one of the most-requested features since the project started. Compression maximizes the storage capacity of your Cassandra nodes by reducing the volume of data on disk. In addition to the space-saving benefits, compression also reduces disk I/O, particularly for read-dominated workloads.

OK, maybe someone can help me here.

Cassandra, an Apache project, just released version 0.8.6. Here are the release notes for 0.8.6.

As a standards editor I understand being optimistic about what is “…going to appear…” in a future release, but isn’t version 0.8.6 a little early to be describing features for 1.0? (I don’t find “compression” mentioned in the cumulative release notes as of 0.8.6.)

May just be me.
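On the substance of the compression claim, here is a minimal sketch of why compressing data in chunks shrinks repetitive row data. It uses Python's standard zlib module purely as a stand-in; the 64 KB chunk size, the sample rows, and the helper names are my own assumptions and say nothing about how Cassandra 1.0 actually implements the feature.

```python
import zlib


def compress_chunks(data, chunk_size=65536):
    """Compress data in independent fixed-size chunks (chunk size is assumed)."""
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def decompress_chunks(chunks):
    """Reverse compress_chunks by decompressing and concatenating each chunk."""
    return b"".join(zlib.decompress(chunk) for chunk in chunks)


# Hypothetical, highly repetitive rows of the sort a column family might hold.
rows = b"".join(b"user:%06d|status:active|country:US\n" % i
                for i in range(20000))

chunks = compress_chunks(rows)
stored = sum(len(chunk) for chunk in chunks)
print("raw bytes:", len(rows), "compressed bytes:", stored)
```

Because each chunk is compressed independently, a reader only has to decompress the chunk that holds the data it wants, which is one plausible reason less disk I/O could help read-dominated workloads.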

Planet Cassandra

Sunday, August 14th, 2011

Planet Cassandra

Aggregation of feeds on Cassandra. If you need to follow Cassandra closely, this would be among your first stops.

NoSQL standouts: New databases for new applications

Friday, August 12th, 2011

NoSQL standouts: New databases for new applications: Cassandra, CouchDB, MongoDB, Redis, Riak, Neo4J, and FlockDB reinvent the data store.

From the post:

Was it just two or three years ago when choosing a database was easy? Those with a Cadillac budget bought Oracle, those in a Microsoft shop installed SQL Server, those with no budget chose MySQL. Everyone in between tried to figure out where they belonged.

Those days are gone forever. Everyone and his brother are coming out with their own open source project for storing information. In most cases, these projects are tossing aside many of the belts-and-suspenders protections that people expect from the classic databases. There are enough of them now that some joker started calling them NoSQL and claiming, perhaps tongue-in-cheek, that the acronym stood for Not Only SQL.

I remember reading somewhere that the #1 reason for firing sysadmins was failure to maintain proper backups. An RDBMS isn’t a magic answer to data security, and anyone who thinks so is probably a former sysadmin at one or more locations. ;-)

You need to read Jim Gray’s Transaction Processing: Concepts and Techniques if you want to design reliable systems. Or that is at least one of the works you need to read.

Do use the “print” option so you can read the article while avoiding most of the annoying distractions typical for this type of site.

Not detailed enough to be particularly useful. Actually, I haven’t seen a comparison yet that was detailed enough to be really useful. I suppose that is in part because the approaches are so different that it would be hard to compare apples with apples.

What might be useful would be to compare the use cases where each system claims to excel. Now that might be a continuum of interest to readers.

What do you think?

Cassandra: Introduction for System Administrators

Thursday, August 11th, 2011

Cassandra: Introduction for System Administrators by Nathan Milford.

Introductory slide deck for administrators interested in Cassandra (or being asked to participate in its use).

Pig with Cassandra: Adventures in Analytics

Monday, August 1st, 2011

Pig with Cassandra: Adventures in Analytics

Suggestions for slide 6 that reads in part:

Pygmalion

Figure in Greek Mythology, sounds like Pig

True enough but in terms of a control language, the play Pygmalion by Shaw would have been the better reference.

I presume the reader/listener would get the sound similarity without prompting.

Sorry, read the slide deck and see the source code at: https://github.com/jeromatron/pygmalion/.

Indexing in Cassandra

Thursday, July 28th, 2011

Indexing in Cassandra by Ed Anuff.

As if you haven’t noticed by now, I have a real weakness for indexing and indexing related material.

Interesting coverage of composite indexes.

NoSQL @ Netflix, Part 2

Wednesday, July 27th, 2011

NoSQL @ Netflix, Part 2 by Sid Anand.

OSCON 2011 presentation.

I think the “RDBMS Concepts to Key-Value Store Concepts” section was the best part of the slide deck.

What do you think?

Cassandra SF 2011

Wednesday, July 20th, 2011

Cassandra SF 2011

Slides with videos to follow!

From the website:

Keynote Presentation

  • Jonathan Ellis (DataStax) State of Cassandra, 2011 (Slides)

Cassandra Internals

  • Ed Anuff Indexing in Cassandra (Slides)
  • Gary Dusbabek (Rackspace) Cassandra Internals (Slides)
  • Sylvain Lebresne (DataStax) Counters in Cassandra (Slides)

High-Level Cassandra Development

  • Eric Evans (Rackspace) CQL – Not just NoSQL, It’s MoSQL (Slides)
  • Jake Luciani (DataStax) Scaling Solr with Cassandra (Slides)

Lightning Talks

  • Ben Coverston (DataStax) Redesigned Compaction LevelDB (Slides)
  • Joaquin Casares (DataStax) The Auto-Clustering Brisk AMI (Slides)
  • Matt Dennis (DataStax) Cassandra Anti-Patterns (Slides)
  • Mike Bulman (DataStax) OpsCenter: Cluster Management Doesn’t Have To Be Hard (Slides)
  • Stu Hood (Twitter) Prometheus’ Patch: #674 and You (Slides)

Practical Development

  • Jeremy Hanna (Dachis) Using Pig alongside Cassandra (Slides)
  • Matt Dennis (DataStax) Data Modeling Workshop (Slides)
  • Nate McCall (DataStax) Cassandra for Java Developers (Slides)
  • Yewei Zhang (DataStax) Hive Over Brisk (Slides)

Products

  • Jake Luciani (DataStax) Introduction to Brisk (Slides)
  • Kyle Roche (Isidorey) Cloudsandra: Multi-tenant Platform Build on Brisk (Slides)

Use Cases

  • Adrian Cockcroft (Netflix) Migrating Netflix from DataCenter Oracle to Global Cassandra (Slides)
  • Chris Goffinet (Twitter) Cassandra at Twitter (Slides)
  • David Strauss (Pantheon) Highly Available DNS and Request Routing Using Apache Cassandra (Slides)
  • Edward Capriolo (media6degrees) Real World Capacity Planning: Cassandra on Blades and Big Iron (Slides)
  • Eric Onnen (Urban Airship) From 100s to 100′s of Millions (Slides)

Indexing in Cassandra

Saturday, July 9th, 2011

Indexing in Cassandra

From the post:

I’m writing this up because there’s always quite a bit of discussion on both the Cassandra and Hector mailing lists about indexes and the best ways to use them. I’d written a previous post about Secondary indexes in Cassandra last July, but there are a few more options and considerations today. I’m going to do a quick run through of the different approaches for doing indexes in Cassandra so that you can more easily navigate these and determine what’s the best approach for your application.

Good article on indexes in Cassandra.
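For readers new to the topic, here is a minimal sketch of one approach that comes up in these discussions: maintaining an index by hand as a second column family whose row key is the indexed value. This is plain Python with dictionaries standing in for column families; the class, the `city` column, and the user keys are hypothetical, and it does not use or imitate the Hector or Cassandra APIs.

```python
from collections import defaultdict


class IndexedUsers:
    """Toy 'users' column family plus a hand-maintained index on city."""

    def __init__(self):
        self.users = {}                    # row key -> {column name: value}
        self.by_city = defaultdict(dict)   # index row: city -> {user key: ""}

    def put(self, key, columns):
        old = self.users.get(key)
        if old is not None and "city" in old:
            # Remove the stale index entry before writing the new one.
            self.by_city[old["city"]].pop(key, None)
        self.users[key] = dict(columns)
        if "city" in columns:
            self.by_city[columns["city"]][key] = ""

    def find_by_city(self, city):
        # Reading one index row returns every matching user key.
        return sorted(self.by_city.get(city, {}))


users = IndexedUsers()
users.put("u1", {"name": "Ann", "city": "Atlanta"})
users.put("u2", {"name": "Bob", "city": "Atlanta"})
users.put("u1", {"name": "Ann", "city": "Boston"})  # u1 moves
print(users.find_by_city("Atlanta"))
```

The awkward part is the update path: the application has to clean up the old index entry itself, which is the kind of consideration that makes the choice of approach depend on your application.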

Big Data Genomics – How to efficiently store and retrieve mutation data

Tuesday, June 28th, 2011

Big Data Genomics – How to efficiently store and retrieve mutation data by Davy Suvee.

About the post:

This blog post is the first one in a series of articles that describe the use of NoSQL databases to efficiently store and retrieve mutation data. Part one introduces the notion of mutation data and describes the conceptual use of the Cassandra NoSQL datastore.

From the post:

The only way to learn a new technology is by putting it into practice. Just try to find a suitable use case in your immediate working environment and give it go. In my case, it was trying to efficiently store and retrieve mutation data through a variety of NoSQL data stores, including Cassandra, MongoDB and Neo4J.

Promises to be an interesting series of posts that focus on a common data set and problem!

Near Bare Metal – Acunu

Wednesday, May 25th, 2011

Acunu Storage Platform

From the webpage:

The Acunu Storage Platform is a powerful storage solution that brings simpler, faster and more predictable performance to NOSQL stores like Apache Cassandra.

Our view is that the new data intensive workloads that are increasingly common are a poor match for the legacy storage systems they tend to run on. These systems are built on a set of assumptions about the capacity and performance of hardware that are simply no longer true. The Acunu Storage Platform is the result of a radical re-think of those assumptions; the result is high performance from low cost commodity hardware.

It includes the Acunu Storage Core which runs in the Linux kernel. On top of this core, we provide a modified version of Apache Cassandra. This is essentially the same as “vanilla” Cassandra but uses the Acunu Storage Core to store data instead of the Linux file system and is therefore able to take advantage of the performance benefits of our platform. In addition to Cassandra, there is also an object store similar to Amazon’s S3; we have a number of other more experimental projects in the pipeline which we’ll talk about in future posts.

Perhaps the start of something very interesting.

It took NoSQL a couple of years to flower into the range of current offerings.

I wonder if working in the kernel will have a similar path?

Will we see a graph engine as part of the kernel?

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison

Thursday, May 12th, 2011

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison

Good thumbnail comparison of the major features of all six (6) NoSQL databases by Kristóf Kovács.

Sorry to see that Neo4J didn’t make the comparison.

Brisk: Simpler, More Reliable, High-Performance Hadoop Solution

Tuesday, May 10th, 2011

DataStax Releases Dramatically Simpler, More Reliable, High-Performance Hadoop Solution

From NoSQLDatabases’ coverage of Brisk, a second-generation Hadoop solution from DataStax.

From the post:

Today, DataStax, the commercial leader in Apache Cassandra™, released DataStax’ Brisk – a second-generation open-source Hadoop distribution that eliminates the key operational complexities with deploying and running Hadoop and Hive in production. Brisk is powered by Cassandra and offers a single platform containing a low-latency database for extremely high-volume web and real-time applications, while providing tightly coupled Hadoop and Hive analytics.

Download Brisk -> Here.

Cassandra – New Beta

Saturday, May 7th, 2011

Cassandra – New Beta

Version 0.8.0 beta2 has been posted!

Changes.

Installing and using Apache Cassandra With Java Part 1 (Installation)

Sunday, May 1st, 2011

Installing and using Apache Cassandra With Java Part 1 (Installation)

This series starts here and goes for five (5) parts for Cassandra 0.6.4.

From the introduction:

I’m going to write a few postings on how to use the Cassandra database with Java, although i am in no way an expert on how to use Cassandra i am very intrigued about the database because of it’s small installation, high performance and scalability. During the writing of these posts i am also learning the Cassandra database and i’m sharing my experiences with it through my posts on this blog.

Like i said before, Cassandra is a very high performing and scalable database, it doesn’t follow the normal SQL database principles like schema’s, tables / columns, datatypes and a query language like SQL. Instead it’s a non-relational database similar to Google’s BigTable. Cassandra was initially developed by Facebook which has contributed it to the open source community. Currently it is used by websites like Facebook, Twitter, Digg, Rackspace and many others. So even though it is still only version 0.6 at the time of writing this it has already proven itself in production environments.

It isn’t possible to say which (if any) of the NoSQL databases will prove to be the best fits for topic maps in particular or general situations.

What is clear is that a lot of experimentation and development is underway and hopefully the results will be interesting.

Cassandra – London Podcasts

Thursday, March 17th, 2011

Cassandra – London Podcasts

Podcasts from the London Cassandra User Group.

Cassandra – Thrift Application Jools Enticknap: 21 February 2011

Cassandra in TWEETMEME Nick Telford: 21 February 2011

Cassandra Meetup January: 17th Jan 2011

Cassandra London Meetup Jake Luciani: 8th Dec 2010

Expiring columns

Tuesday, March 15th, 2011

Expiring columns

In Cassandra 0.7, there are expiring columns.

From the blog:

Sometimes, data comes with an expiration date, either by its nature or because it’s simply intractable to keep all of a rapidly growing dataset indefinitely.

In most databases, the only way to deal with such expiring data is to write a job running periodically to delete what is expired. Unfortunately, this is usually both error-prone and inefficient: not only do you have to issue a high volume of deletions, but you often also have to scan through lots of data to find what is expired.

Fortunately, Cassandra 0.7 has a better solution: expiring columns. Whenever you insert a column, you can specify an optional TTL (time to live) for that column. When you do, the column will expire after the requested amount of time and be deleted auto-magically (though asynchronously — see below). Importantly, this was designed to be as low-overhead as possible.

Now there is an interesting idea!

Goes along with the idea that a topic map does not (should not?) present a timeless view of information. That is, a topic map should maintain state so that we can determine what was known at any particular time.

Take a simple example, a call for papers for a conference. It could be that a group of conferences all share the same call for papers, the form, submission guidelines, etc. And that call for papers is associated with each conference by an association.

Shouldn’t we be able to set an expiration date on that association so that at some point in time, all those facilities are no longer available for that conference? Perhaps it switches over to another set of properties in the same association to note that the submission dates have passed? That would remove the necessity for the association expiring.

But there are cases where associations do expire or at least end. Divorce is an unhappy example. Being hired is a happier one.

Something to think about.
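To have something concrete to think with, here is a minimal sketch of the expiring-column idea applied to the call-for-papers association. It is an in-memory toy, not Cassandra's implementation or client API: the class, the injectable clock, and the lazy cleanup on read are all my assumptions.

```python
import time


class ExpiringColumns:
    """Toy row whose columns may carry an optional TTL in seconds."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._columns = {}    # column name -> (value, expires_at or None)

    def insert(self, name, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._columns[name] = (value, expires_at)

    def get(self, name):
        entry = self._columns.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # Expired columns are dropped lazily, when someone next reads them.
            del self._columns[name]
            return None
        return value


now = [0.0]  # fake clock for the example
conference = ExpiringColumns(clock=lambda: now[0])
conference.insert("name", "Example Conf 2011")
conference.insert("call-for-papers", "shared CFP", ttl=30)
print(conference.get("call-for-papers"))
now[0] = 31.0  # the submission window has passed
print(conference.get("call-for-papers"))
```

Note what the sketch loses: once the association is gone, so is the record that it ever existed, which is at odds with wanting to know what was known at a particular time.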

agamemnon

Friday, March 11th, 2011

agamemnon

From the website:

Agamemnon is a thin library built on top of pycassa. It allows you to use the Cassandra database (http://cassandra.apache.org) as a graph database. Much of the api was inspired by the excellent neo4j.py project (http://components.neo4j.org/neo4j.py/snapshot/)

Thanks to Jack Park for pointing this out!
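For a rough feel of how a column store can double as a graph database, here is a minimal sketch that keeps each node's relationships as columns in that node's row. It is emphatically not Agamemnon's or pycassa's API; the class, its methods, and the sample relationship are hypothetical, with plain dictionaries standing in for column families.

```python
from collections import defaultdict


class ColumnGraph:
    """Toy graph: one row per node, one column per relationship."""

    def __init__(self):
        # Row key is the node; column name is (relationship type, other node).
        self.outgoing = defaultdict(dict)
        self.incoming = defaultdict(dict)

    def relate(self, source, rel, target, **properties):
        # Write the edge twice so it can be followed from either end.
        self.outgoing[source][(rel, target)] = properties
        self.incoming[target][(rel, source)] = properties

    def neighbors(self, node, rel):
        return sorted(t for (r, t) in self.outgoing.get(node, {}) if r == rel)

    def referrers(self, node, rel):
        return sorted(s for (r, s) in self.incoming.get(node, {}) if r == rel)


graph = ColumnGraph()
graph.relate("agamemnon", "built_on", "pycassa")
graph.relate("agamemnon", "inspired_by", "neo4j.py")
print(graph.neighbors("agamemnon", "built_on"))
```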

Cassandra Data Model – Semantic Impedance

Saturday, March 5th, 2011

WTF is a SuperColumn? An Intro to the Cassandra Data Model

A bit dated now but I thought some readers might find it useful.

From the posting:

If you’re coming from an RDBMS background (which is almost everyone) you’ll probably trip over some of the naming conventions while learning about Cassandra’s data model. It took me and my team members at Digg a couple days of talking things out before we “got it”. In recent weeks a bikeshed went down in the dev mailing list proposing a completely new naming scheme to alleviate some of the confusion. Throughout this discussion I kept thinking: “maybe if there were some decent examples out there people wouldn’t get so confused by the naming.” So, this is my stab at explaining Cassandra’s data model; It’s intended to help you get your feet wet & doesn’t go into every single detail but, hopefully, it helps clarify a few things.

Seems like I have heard about grouping sets of key/value pairs before but I will have to look for it. ;-)

More seriously, the current wave of data sets only aggravates the known semantic impedance problem.

A wave of data sets that promises to only increase.

So semantic impedance is going to increase.

Semantic impedance can be:

  • ignored – most current stove-piped information systems
  • save-the-world semantic solutions – poor adoption rates
  • broken by self-interested mapping that is reusable – the topic maps solution
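As for the grouping of key/value pairs itself, here is a minimal sketch of the nesting involved, using plain Python dictionaries: row key, then super column name, then ordinary column name/value pairs. The column family name, row key, and values are made up for illustration and are not taken from the post.

```python
# Super column family: row key -> super column name -> {column name: value}
address_book = {
    "alice": {
        "home": {"street": "1 Main St", "city": "Atlanta"},
        "work": {"street": "99 Office Park", "city": "Covington"},
    },
}


def get_super_column(column_family, row_key, super_column):
    """Return a copy of one named group of key/value pairs from a row."""
    return dict(column_family[row_key][super_column])


def get_column(column_family, row_key, super_column, column):
    """Return a single value from inside a super column."""
    return column_family[row_key][super_column][column]


print(get_super_column(address_book, "alice", "home"))
print(get_column(address_book, "alice", "work", "city"))
```

Nothing in the structure says what "home", "street", or "city" mean, which is where the semantic impedance comes in.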

ApacheCon NA 2011

Friday, March 4th, 2011

ApacheCon NA 2011

Proposals: Be sure to submit your proposal no later than Friday, 29 April 2011 at midnight Pacific Time.

7-11 November 2011 Vancouver

From the website:

This year’s conference theme is “Open Source Enterprise Solutions, Cloud Computing, and Community Leadership”, featuring dozens of highly-relevant technical, business, and community-focused sessions aimed at beginner, intermediate, and expert audiences that demonstrate specific professional problems and real-world solutions that focus on “Apache and …”:

  • … Enterprise Solutions (from ActiveMQ to Axis2 to ServiceMix, OFBiz to Chemistry, the gang’s all here!)
  • … Cloud Computing (Hadoop, Cassandra, HBase, CouchDB, and friends)
  • … Emerging Technologies + Innovation (Incubating projects such as Libcloud, Stonehenge, and Wookie)
  • … Community Leadership (mentoring and meritocracy, GSoC and related initiatives)
  • … Data Handling, Search + Analytics (Lucene, Solr, Mahout, OODT, Hive and friends)
  • … Pervasive Computing (Felix/OSGi, Tomcat, MyFaces Trinidad, and friends)
  • … Servers, Infrastructure + Tools (HTTP Server, SpamAssassin, Geronimo, Sling, Wicket and friends)

Real-Time Log Processing System based on Flume and Cassandra – Post

Thursday, March 3rd, 2011

Real-Time Log Processing System based on Flume and Cassandra

Very cool!

What would be even cooler, would be to have real-time associations with subjects that have information from outside the data set.

Or better yet, real-time on-demand associations with subjects that have information from outside the data set.

I suppose the classic use case would be running stats on all the sports events on a Saturday or Sunday, including individual stats and merging in the latest doping, paternity and similar tests.

Other applications?

NoSQL Databases: Why, what and when

Tuesday, March 1st, 2011

NoSQL Databases: Why, what and when by Lorenzo Alberton.

When I posted RDBMS in the Social Networks Age I did not anticipate returning the very next day with another slide deck from Lorenzo. But, after viewing this slide deck, I just had to post it.

It is a very good overview of NoSQL databases and their underlying principles, with useful graphics as well (as opposed to the other kind).

I am going to have to study his graphic technique in hopes of applying it to the semantic issues that are at the core of topic maps.

Cassandra’s data model as records and lists – Post

Thursday, February 24th, 2011

Cassandra’s data model as records and lists

From the post:

I have to admit I’ve never really been happy with Cassandra’s data model, or to be more precise, I’ve never really been happy with my understanding of the model. However I’ve realized that if we think of two use cases for column families then things may become a bit clearer. For me, Column families can be used in one of two ways, either as a record or an ordered list.

I thought it was helpful. What do you think?
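If it helps, here is a minimal sketch of the two uses side by side, assuming only that columns within a row are kept sorted by name. The `Row` class, its methods, and the sample data are hypothetical and are not part of any Cassandra client.

```python
import bisect


class Row:
    """Toy row whose columns stay sorted by column name."""

    def __init__(self):
        self._names = []    # sorted column names
        self._values = {}   # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def get(self, name):
        return self._values[name]

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Use 1: a record. Column names are fixed field names.
user = Row()
user.insert("name", "Ann")
user.insert("email", "ann@example.com")
print(user.get("email"))

# Use 2: an ordered list. Column names are sortable keys such as timestamps.
timeline = Row()
timeline.insert("2011-02-25T08:00", "third")
timeline.insert("2011-02-24T09:00", "first")
timeline.insert("2011-02-24T10:00", "second")
print(timeline.slice("2011-02-24", "2011-02-24T23:59"))
```

The structure is the same in both cases; only the meaning given to the column names changes.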