Could Cassandra be the first breakout NoSQL database? by Chris Mayer.
From the post:
Years of misunderstanding haven’t been kind to the NoSQL database. Aside from the confusing name (generally understood to mean ‘not only SQL’), there’s always been an air of reluctance from the enterprise world to move away from Oracle’s steady relational database, until there was a definite need to switch from tables to documents
The emergence of Big Data in the past few years has been the kickstart NoSQL distributors needed. Relational databases cannot cope with the sheer amount of data coming in and can’t provide the immediacy large-scale enterprises need to obtain information.
Open source offerings have been lurking in the background for a while, with the highly-tunable Apache Cassandra becoming a community favourite quickly. Emerging from the incubator in October 2011, Cassandra’s beauty lies in its flexible schema, its hybrid data model (lying somewhere between a key-value and tabular database) and also through its high availability. Being from the Apache Software Foundation, there’s also intrinsic links to the big data ‘kernel’ Apache Hadoop, and search server Apache Solr giving users an extra dimension to their data processing and storage.
Using NoSQL on cheap servers for processing and querying data is proving an enticing option for companies of all sizes, especially in combination with MapReduce technology to crunch it all.
One company that appears to be leading this data-driven charge is DataStax, who this week announced the completion of a $25 million C round of funding. Having already permeated the environments of some large companies (notably Netflix), the San Mateo startup are making big noises about their enterprise platform, melding the worlds of Cassandra and Hadoop together. Netflix is a client worth crowing about, with DataStax’s enterprise option being used as one of their primary data stores
Chris mentions some other potential players. MongoDB comes to mind, along with the Hadoop crowd.
I take the move from tables to documents as a symptom of a deeper issue.
Relational databases rely on normalization to achieve their performance and reliability. So what happens if data is too large or coming too quickly to be normalized?
Relational databases remain the weapon of choice for normalized data but that doesn’t mean they work well with “dirty” data.
“Dirty data,” as opposed to “documents,” seems to catch the real shift for which NoSQL solutions are better adapted.
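To make the distinction concrete, here is a minimal Python sketch, not anything from Chris's post. It assumes a made-up `users` table and three hypothetical records, and uses the standard library's `sqlite3` as a stand-in for a relational database and a plain dictionary as a stand-in for a schemaless store. Rejecting a record with an unexpected field is a choice made for this illustration; a real loader might drop the extra field instead.

```python
import sqlite3

# Hypothetical incoming records; field names are invented for illustration.
records = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Bob"},  # dirty: missing email
    {"id": 3, "name": "Cy", "email": "cy@example.com", "phone": "555-0100"},  # dirty: extra field
]

EXPECTED_FIELDS = {"id", "name", "email"}


def load_relational(rows):
    """Insert rows into a fixed-schema table; return (accepted, rejected) ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
    )
    accepted, rejected = [], []
    for row in rows:
        try:
            # Assumption for this sketch: a field with no column is an error.
            if set(row) - EXPECTED_FIELDS:
                raise ValueError("unexpected field")
            conn.execute(
                "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
                (row["id"], row["name"], row.get("email")),
            )
            accepted.append(row["id"])
        except (sqlite3.IntegrityError, ValueError):
            # A missing email violates NOT NULL; an extra field has no column.
            rejected.append(row["id"])
    conn.close()
    return accepted, rejected


def load_documents(rows):
    """Store each row as-is, keyed by id, the way a schemaless store would."""
    return {row["id"]: dict(row) for row in rows}


if __name__ == "__main__":
    print(load_relational(records))
    print(sorted(load_documents(records)))
```

The fixed schema accepts only the clean record, while the document-style store keeps all three and leaves the cleanup question for later.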
Your results are only as good as your data, and you want to know that up front, not when you realize your “normalized” data wasn’t.
That has to be a sinking feeling.