CAP theorem simply explained

The CAP theorem is something everyone who works with distributed systems should know. This blog post explains in simple terms what it means.

In the old days of computing, applications ran on big machines like mainframes. Hundreds or thousands of clients connected to that single machine in order to use the application. This is somewhat similar to what cloud computing is today, but there is one major difference: nowadays applications are spread across multiple servers, datacenters or even continents. While that allows us to scale horizontally, it also brings some new issues.

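Distributed systems are expected to guarantee three things: Consistency (every read sees the most recent write), Availability (every request gets a response) and Partition Tolerance (the system keeps working when nodes cannot reach each other), or CAP for short. The CAP theorem is also known as Brewer's theorem, after Dr. Eric A. Brewer, who gave a keynote talk on the topic at the PODC conference in 2000 (https://www.podc.org/podc2000/).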

The CAP theorem basically says that during a network partition we can have either availability or consistency, but not both at the same time.

Let’s visualize that with a system that has two nodes and two clients. Client1 connects to Node1 and Client2 connects to Node2. Everything that is written to Node1 gets replicated to Node2 and vice versa. This is how we designed our system, and we want it to be in that state 100% of the time.

[Figure: Client1 writes to Node1 and Client2 writes to Node2; Node1 and Node2 replicate to each other.]

But eventually Murphy's law will kick in. We cannot avoid that.
Let’s see what happens if, for whatever reason, Node1 and Node2 are disconnected from each other. If both clients continue to write data, the nodes can no longer replicate it to each other. We lose consistency.

[Figure: Client1 writes to Node1 and Client2 writes to Node2; there is no replication between the nodes, and both hold outdated data.]
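The scenario above can be sketched in a few lines of Python. This is only a toy model under simple assumptions: the `Node` class, its `connected` flag and the synchronous in-memory "replication" are hypothetical names made up for illustration, not how a real datastore works. The nodes here always accept writes, so they stay available but drift apart once the link is cut.

```python
class Node:
    """Hypothetical replica that always accepts writes (favours availability)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peer = None
        self.connected = True  # False simulates a network partition

    def write(self, key, value):
        self.data[key] = value           # always accept the local write
        if self.connected and self.peer is not None:
            self.peer.data[key] = value  # replicate only while the link is up


node1, node2 = Node("Node1"), Node("Node2")
node1.peer, node2.peer = node2, node1

node1.write("x", 1)                        # healthy: replicated to Node2

node1.connected = node2.connected = False  # the partition happens
node1.write("x", 2)                        # Client1 keeps writing
node2.write("x", 3)                        # Client2 keeps writing

print(node1.data, node2.data)              # {'x': 2} {'x': 3}
```

Both writes succeeded, but the two nodes now disagree about the value of `x`.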

To keep our data consistent, both nodes must block write access for as long as they cannot reach each other. That means we lose availability.

[Figure: Client1 and Client2 no longer write; Node1 and Node2 are both up to date.]
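The consistency-first choice can be sketched the same way. Again, this is a toy model under stated assumptions: `ConsistentNode`, `Unavailable` and the `connected` flag are hypothetical names for illustration. A node refuses any write it cannot replicate, so the data stays identical on both sides while clients get errors.

```python
class Unavailable(Exception):
    """Raised when a node refuses a write to protect consistency."""


class ConsistentNode:
    """Hypothetical replica that rejects writes it cannot replicate."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peer = None
        self.connected = True  # False simulates a network partition

    def write(self, key, value):
        if not self.connected or self.peer is None:
            raise Unavailable(f"{self.name} cannot reach its peer")
        self.data[key] = value
        self.peer.data[key] = value  # replicate before acknowledging


node1, node2 = ConsistentNode("Node1"), ConsistentNode("Node2")
node1.peer, node2.peer = node2, node1

node1.write("x", 1)                        # healthy: stored on both nodes

node1.connected = node2.connected = False  # the partition happens

rejected = []
for node, value in ((node1, 2), (node2, 3)):
    try:
        node.write("x", value)
    except Unavailable as exc:
        rejected.append(str(exc))          # the client sees an error

print(node1.data == node2.data)            # True
print(rejected)
```

Both nodes still hold `x = 1`, but neither client could write during the partition.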

Everyone designing a distributed system has to select either C and A, C and P, or A and P. There is no way to get all three at once. This is why there is no blueprint for a distributed system. Some, like CDNs and DNS, focus on availability, while others, like distributed datastores, prefer consistency over availability.

Cloud providers spend an incredible amount of money on preventing partitions. We will dig deeper into how to avoid partitions in some upcoming blog posts.
