Data Model
  • Every row is identified by a unique key. The key is a string and there is no limit on its size.

  • An instance of Cassandra has one table, which is made up of one or more column families as defined by the user.

  • The number of column families and the name of each must be fixed at the time the cluster is started. There is no limit on the number of column families, but it is expected that there will be only a few of them.

  • Each column family can contain one of two structures: supercolumns or columns. Both of these are created dynamically, and there is no limit on the number of them that can be stored in a column family.

  • Columns are constructs that have a name, a value and a user-defined timestamp associated with them. The number of columns that can be contained in a column family is very large, and it can vary from key to key: for instance, key K1 could have 1024 columns/supercolumns while key K2 could have 64 columns/supercolumns.

  • Supercolumns are constructs that have a name and an unbounded number of columns associated with them. The number of supercolumns associated with any column family is likewise unbounded and can vary per key. Otherwise they exhibit the same characteristics as columns. A minimal sketch of this data model follows this list.
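
The following is a minimal sketch of the data model described above, written in Java for illustration. The Column and ColumnFamily types are hypothetical stand-ins, not Cassandra's actual classes; they only show how a column family maps each row key to its own independently sized set of timestamped columns, and how a "super" column family would add one extra level of nesting.

    import java.util.SortedMap;
    import java.util.TreeMap;

    // Hypothetical sketch of the data model (illustrative types, not Cassandra's classes).
    // A column is a (name, value, timestamp) triple. A column family maps each row key to
    // its own set of columns, so key K1 may hold 1024 columns while K2 holds only 64.
    // A "super" column family would nest one extra level:
    //   rowKey -> supercolumnName -> columnName -> Column
    final class Column {
        final String name;
        final byte[] value;
        final long timestamp;   // user-supplied; the newest timestamp wins on conflict

        Column(String name, byte[] value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    final class ColumnFamily {
        // rowKey -> (columnName -> Column)
        private final SortedMap<String, SortedMap<String, Column>> rows = new TreeMap<>();

        void insert(String rowKey, Column c) {
            rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
                .merge(c.name, c, (oldC, newC) -> newC.timestamp >= oldC.timestamp ? newC : oldC);
        }

        SortedMap<String, Column> getRow(String rowKey) {
            return rows.getOrDefault(rowKey, new TreeMap<>());
        }
    }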
Distribution, Replication and Fault Tolerance
  • Data is distributed across the nodes in the cluster using Consistent Hashing based on an Order Preserving Hash function. We use an Order Preserving Hash so that we can perform range scans over the data for analysis at some later point (a sketch of this partitioning scheme follows this list).

  • Cluster membership is maintained via a Gossip-style membership algorithm. Failures of nodes within the cluster are monitored using an Accrual-style Failure Detector (a sketch of such a detector follows this list).

  • High availability is achieved using replication, and we actively replicate data across data centers. Since eventual consistency is the mantra of the system, reads execute on the closest replica and data is repaired in the background for increased read throughput (a read-repair sketch follows this list).

  • The system exhibits incremental scalability: capacity can be added as easily as dropping in new nodes and having them automatically bootstrapped with data.
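
As a rough illustration of the partitioning scheme mentioned above, the sketch below models an order-preserving consistent-hash ring in Java. The class and method names are assumptions made for this example, not the actual partitioner; the point is that because tokens preserve key order, a range scan only needs to visit a contiguous arc of nodes on the ring.

    import java.util.LinkedHashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeMap;

    // Sketch of order-preserving consistent hashing. Each node is assigned a token and owns
    // every key between its predecessor's token and its own. Assumes at least one node.
    final class OrderPreservingRing {
        private final TreeMap<String, String> ring = new TreeMap<>(); // token -> node address

        void addNode(String token, String node) { ring.put(token, node); }

        // The key itself serves as its order-preserving "hash".
        String nodeFor(String key) {
            Map.Entry<String, String> owner = ring.ceilingEntry(key);
            return (owner != null ? owner : ring.firstEntry()).getValue(); // wrap around the ring
        }

        // Nodes that own some portion of [startKey, endKey]; a range scan visits only these.
        Set<String> nodesForRange(String startKey, String endKey) {
            Set<String> nodes = new LinkedHashSet<>();
            for (String node : ring.subMap(startKey, true, endKey, true).values()) nodes.add(node);
            nodes.add(nodeFor(endKey)); // node owning the tail of the range
            return nodes;
        }
    }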
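
Another sketch below illustrates the idea behind an accrual-style failure detector: rather than a binary up/down verdict, it emits a suspicion level phi that grows the longer a node stays silent relative to its usual heartbeat interval. The exponential-distribution approximation, window size and conviction threshold here are assumptions made for illustration, not the values used in Cassandra.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Illustrative accrual failure detector. heartbeat() records arrivals; phi() converts the
    // current silence into a suspicion level instead of a hard up/down decision.
    final class AccrualFailureDetector {
        private final Deque<Long> intervalsMs = new ArrayDeque<>(); // recent inter-arrival times
        private static final int WINDOW = 1000;                     // sliding-window size (assumed)
        private long lastHeartbeatMs = -1;

        void heartbeat(long nowMs) {
            if (lastHeartbeatMs >= 0) {
                intervalsMs.addLast(nowMs - lastHeartbeatMs);
                if (intervalsMs.size() > WINDOW) intervalsMs.removeFirst();
            }
            lastHeartbeatMs = nowMs;
        }

        // phi = -log10(probability that a heartbeat still arrives after this much silence),
        // assuming exponentially distributed inter-arrival times with the observed mean.
        double phi(long nowMs) {
            if (lastHeartbeatMs < 0 || intervalsMs.isEmpty()) return 0.0;
            double mean = intervalsMs.stream().mapToLong(Long::longValue).average().orElse(1.0);
            double elapsed = nowMs - lastHeartbeatMs;
            return elapsed / (mean * Math.log(10.0)); // grows smoothly as the silence lengthens
        }

        boolean convict(long nowMs) { return phi(nowMs) > 8.0; } // threshold is a tunable assumption
    }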
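
The sketch below shows one way the read path described above could be structured: answer from the closest replica immediately, then reconcile the replicas in the background using the column timestamps. The Replica interface and the threading here are illustrative assumptions, not the system's actual read path.

    import java.util.Comparator;
    import java.util.List;

    // Illustrative read repair under eventual consistency.
    final class ReadRepair {
        record Versioned(byte[] value, long timestamp) {}   // value plus its write timestamp

        interface Replica {
            Versioned read(String key);
            void write(String key, Versioned v);
            long latencyEstimate();                          // used to pick the "closest" replica
        }

        // Serve the read from the closest replica right away, then push the newest version
        // to any stale replica in the background.
        static Versioned read(String key, List<Replica> replicas) {
            replicas.sort(Comparator.comparingLong(Replica::latencyEstimate));
            Versioned answer = replicas.get(0).read(key);

            new Thread(() -> {
                Versioned newest = answer;
                for (Replica r : replicas) {
                    Versioned v = r.read(key);
                    if (v != null && (newest == null || v.timestamp() > newest.timestamp())) newest = v;
                }
                for (Replica r : replicas) {
                    Versioned v = r.read(key);
                    if (newest != null && (v == null || v.timestamp() < newest.timestamp()))
                        r.write(key, newest);                // repair the stale copy
                }
            }).start();

            return answer;                                   // caller sees the fast, possibly stale, read
        }
    }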

The first deployment of the Cassandra system within Facebook was for the Inbox search system. The system currently stores terabytes of indexes across a cluster of 600+ cores and 120+ TB of disk space. Performance of the system has been well within our SLA requirements, and more applications are in the pipeline to use Cassandra as their storage engine. A beta version of Cassandra has been open sourced and can be found here.

Systems of this nature are never really done. One starts by building the core features that are absolutely required, works to get them right, and then proceeds to build the fancier features. We have built the core features that were needed and have a lot more planned in the road ahead. If you are interested in solving hard distributed systems problems that affect the lives of millions of our users, feel free to contact Facebook.