Sunday, January 27, 2013

The CAP Theorem

Life is full of tradeoffs.  I once had an ancient barber (and man of few words) who kept a sign on the wall.  It said: "Fast, Good, or Cheap: Pick Two."  After many fast, cheap, and bad haircuts, I've found that this aphorism applies not only to coiffuring.

You can find tradeoffs like this anywhere you look.  When it comes to data storage, the tradeoff is between consistency, availability, and partition tolerance.  According to a famous conjecture by Eric Brewer--and later formally proven by Seth Gilbert and Nancy Lynch--you can only pick two of the three.

In any production environment, you have distributed data sets stored on a number of different nodes.  The edge case is a single node, but more likely (I hope) your data is distributed across two or more.  MS SQL Server, for example, lets you easily set up replication across two nodes in an active-passive model (Microsoft calls this mirroring).  If one node goes down, the other takes over.  When the first comes back online, it syncs with the second.  Amazon's Dynamo replicates over N nodes, providing a highly configurable and fault-tolerant environment.  If a node goes down, its data will be replicated to others.  But you don't have to go that far.  A tape backup could be considered a high-latency node and might be sufficient, depending on your needs.

Before talking about the tradeoffs, let's consider a few definitions. 

Consistency refers to the degree to which nodes in a distributed system agree about a data set.  In a 100% consistent system, all nodes see the same data at the same instant.  Consistency could be measured as a fraction of the number of nodes.  Given 5 nodes where 4 agree, you'd have 80% consistency.
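That fraction-of-nodes measure can be sketched in a few lines of Python.  The replica values here are hypothetical; real systems compare versions or vector clocks rather than raw values.

```python
def consistency(replica_values):
    """Fraction of replicas that agree with the most common value."""
    most_common = max(set(replica_values), key=replica_values.count)
    return replica_values.count(most_common) / len(replica_values)

# Five nodes; four have seen the latest write ("v2"), one still has "v1".
values = ["v2", "v2", "v2", "v2", "v1"]
print(consistency(values))  # 0.8, i.e. 80% consistent
```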

Availability refers to the responsiveness of a system.  In a 100% available system, all requests receive responses.  Availability is usually reported as uptime and expressed as a percentage of time (99.9%, for example).  To increase availability, you typically increase the number of nodes in your network.
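The uptime percentage is simple arithmetic.  A quick sketch, with made-up numbers for a 30-day window:

```python
def availability(total_seconds, downtime_seconds):
    """Uptime as a percentage of the observation window."""
    return 100.0 * (total_seconds - downtime_seconds) / total_seconds

month = 30 * 24 * 3600  # 2,592,000 seconds in a 30-day month
downtime = 4 * 60       # four minutes of outage
print(round(availability(month, downtime), 4))  # 99.9907
```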

Finally, partition tolerance refers to the ability of the system to handle disruptions in network communication between nodes.  A 100% partition tolerant system would be able to work without any network communication.  That is, it wouldn't be a distributed system.

From these definitions, you should be able to guess at why you can't have all three.  Rather than offering a formal proof, I think it's easiest to see why by considering each combination in turn.


A consistent and available system (CA) will always give you the same answer, and it will always respond to requests.  However, it won't be distributed.  The canonical CA system is a single database, whether it be PostgreSQL or Redis.  As long as you use some kind of transaction isolation level, you don't have to worry about inconsistencies in your data, and you don't have to build any application logic to shard data to different databases.  As your data grows, however, this system may not suffice.

A consistent and partition-tolerant system (CP) is a distributed system that enforces requirements regarding the number of nodes that must be updated in order for a write to be considered successful.  HBase and MongoDB do this, for example.  If nodes go down, they will replicate data to other nodes.  However, because they have consistency guarantees, it is possible that they may refuse reads or writes in the event of a network failure.  For example, if you have a three-node system that requires two nodes to agree, and you have a network error in which one node is separated from the other two, that node will not be available.  Any clients connecting to it will be denied service.
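The three-node scenario above can be sketched as a quorum check.  This is a toy model, not how HBase or MongoDB actually work internally; the node names and quorum size are hypothetical.

```python
def can_serve_write(reachable_nodes, write_quorum=2):
    """A node accepts a write only if it can reach a quorum of replicas."""
    return len(reachable_nodes) >= write_quorum

# Three-node cluster; a network partition isolates node C from A and B.
majority_side = {"A", "B"}  # A and B can still reach each other
minority_side = {"C"}       # C is cut off by the network failure

print(can_serve_write(majority_side))  # True:  stays available
print(can_serve_write(minority_side))  # False: refuses writes to stay consistent
```

The minority side sacrifices availability (it denies service) so that the cluster as a whole never hands out conflicting answers.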

Finally, an available and partition-tolerant system (AP) replicates data all over the place so it can't possibly guarantee consistency.  For example, CouchDB and Riak embrace fault-tolerance over consistency.  The idea is that they will be eventually consistent, but they might not be consistent immediately following a write or network outage.

Now, almost any system can be CAP if you're willing to wait.  The tradeoff only exists when we're talking about a single instant.  If a system takes only a few ms to become consistent, this may be sufficient.  The Internet's DNS will eventually be consistent, but there are no guarantees about how long this will take.  Riak lets you tweak the number of nodes that must agree before a read or write is considered successful.  Still, in a sufficiently large system, you may not be able to wait that long.
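Riak's tunable agreement follows the usual N/R/W quorum convention: with N replicas, a write acknowledged by W nodes and a read that consults R nodes are guaranteed to overlap when R + W > N.  A minimal sketch of that condition (the function name is mine, not Riak's API):

```python
def reads_see_latest_write(n, r, w):
    """True if every read quorum must intersect the latest write quorum."""
    return r + w > n

print(reads_see_latest_write(n=3, r=2, w=2))  # True:  consistent reads
print(reads_see_latest_write(n=3, r=1, w=1))  # False: only eventually consistent
```

Dialing R and W down buys lower latency and higher availability at the cost of reading stale data during an outage.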

Diving into the NoSQL universe may be daunting, but a big part is determining where you fall in the CAP spectrum.  Even if you're not considering a NoSQL database, I think it's useful to understand these tradeoffs.  Are there any other tradeoffs facing distributed data solutions?
