Big Data Modeling with Cassandra

Big Data Modeling with Cassandra by Mat Brown.


When choosing the right data store for an application, developers face a trade-off between scalability and programmer-friendliness. With the release of version 3 of the Cassandra Query Language, Cassandra provides a uniquely attractive combination of both, exposing robust and intuitive data modeling capabilities while retaining the scalability and availability of a distributed, masterless data store.

This talk will focus on practical data modeling and access in Cassandra using CQL3. We’ll cover nested data structures; different types of primary keys; and the many shapes your tables can take. There will be a particular focus on understanding the way Cassandra stores and accesses data under the hood, to better reason about designing schemas for performant queries. We’ll also cover the most important (and often unexpected) differences between ACID databases and distributed data stores like Cassandra.

Mat Brown is a software engineer at Rap Genius, a platform for annotating and explaining the world’s text. Mat is the author of Cequel, a Ruby object/row mapper for Cassandra, as well as Elastictastic, an object/document mapper for ElasticSearch, and Sunspot, a Ruby model integration layer for Solr.

Mat covers the limitations of Cassandra without being pressed. That is not unheard of among presenters, but it is not common either.

Migrating from a relational schema to Cassandra is a bad idea. (paraphrase)

Mat examines the internal data structures that influence how you should model data in Cassandra.

At 17:40, Mat shows how the data structure is represented internally.

The internal representation drives schema design.
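
To make that point concrete, here is a minimal sketch in Python of the general idea, not a reproduction of the talk's slides or of Cassandra's actual storage engine. It assumes the commonly described layout in which all CQL3 rows sharing a partition key live together, with their cells kept sorted by clustering key. The class name WideRowStore, its methods, and the sample data are all hypothetical.

```python
import bisect


class WideRowStore:
    """Toy model of a partitioned, sorted store (illustrative only)."""

    def __init__(self):
        # partition key -> sorted list of (clustering key, column name)
        self._keys = {}
        # (partition key, clustering key, column name) -> value
        self._values = {}

    def insert(self, partition_key, clustering_key, columns):
        """Write one logical row as a set of sorted cells in its partition."""
        keys = self._keys.setdefault(partition_key, [])
        for name, value in columns.items():
            cell = (clustering_key, name)
            if (partition_key, *cell) not in self._values:
                bisect.insort(keys, cell)  # keep cells in clustering order
            self._values[(partition_key, *cell)] = value

    def slice(self, partition_key, start, end):
        """Return rows with start <= clustering key < end, in order.

        Because cells are sorted, this is one contiguous range of a
        single partition.
        """
        keys = self._keys.get(partition_key, [])
        lo = bisect.bisect_left(keys, (start,))
        hi = bisect.bisect_left(keys, (end,))
        rows = {}
        for clustering_key, name in keys[lo:hi]:
            value = self._values[(partition_key, clustering_key, name)]
            rows.setdefault(clustering_key, {})[name] = value
        return list(rows.items())


# Hypothetical data: posts partitioned by author, clustered by post id.
store = WideRowStore()
store.insert("alice", 3, {"body": "third"})
store.insert("alice", 1, {"body": "first", "title": "hello"})
store.insert("alice", 2, {"body": "second"})
store.insert("bob", 1, {"body": "other"})

result = store.slice("alice", 1, 3)
print(result)
```

In this model, a range query inside one partition is a cheap contiguous read, while a query that ignores the partition key would have to visit every partition. That asymmetry is the sense in which the internal layout, rather than relational normalization, suggests how to choose keys.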

You may also like Cequel by the presenter.

PS: I suspect that if considered carefully, the internal representation of data in most databases drives the advice given by tech support.
