Skip to content
3 min read

CAP Theorem

Real Incident: GitHub, October 2018

A 43-second network partition split GitHub's MySQL cluster. Both sides promoted a primary. Two databases accepted writes simultaneously. Result: 24 hours of degraded service, millions of developers affected. This is CAP theorem hitting production.


The 30-Second Explanation

CAP Theorem

In a distributed system, when a network partition happens, you must choose:

🔒

Consistency (CP)

"I'd rather show an error than wrong data"

Banks, inventory, bookings

🌐

Availability (AP)

"I'd rather show stale data than nothing"

Social feeds, DNS, shopping carts

The key insight: You're not choosing between C, A, and P. Partitions are inevitable. You're choosing between C and A when a partition happens.


The Pizza Shop Analogy

You own 3 pizza shops in SF, NYC, and Chicago. All must serve the same menu.

The phone lines between shops go down (partition). A customer in Chicago asks for a pizza you just added in SF 5 minutes ago.

You choose... What happens Real-world equivalent
Consistency "Sorry, system is down. Come back later." Banking app during outage
Availability "Here's our menu!" (missing the new pizza) Netflix showing stale recommendations

That's it. That's the entire theorem.


What FAANG Interviewers Actually Ask

"Given this system, would you choose CP or AP?"

Framework to answer:

If the data is... Choose Because Example
Financial / transactional CP Wrong balance = lawsuit Stripe, banks
User-generated content AP Stale likes > error page Instagram, Twitter
Inventory / booking CP Overselling = real money lost Uber seats, airline tickets
Session / preference AP Show old settings > force re-login Netflix, Spotify
Leader election / config CP Split-brain = catastrophe ZooKeeper, etcd

Real Systems Mapped

System Choice What happens during partition
ZooKeeper🔒 CPMinority side stops accepting writes. Waits for quorum.
etcd🔒 CPRaft consensus — no leader = no writes.
Google Spanner🔒 CPTrueTime + Paxos. Blocks rather than diverge.
Cassandra🌐 APEvery node accepts writes. Resolves conflicts later (last-write-wins).
DynamoDB🌐 APEventually consistent by default. Strong consistency optional (costs 2x).
MongoDB🔒 CPPrimary goes down → election (10-30s downtime). Reads can be stale on secondaries.

PACELC — The Follow-Up They Always Ask

"OK, so you know CAP. What about when there's NO partition?"

PACELC extends CAP: if P**artition → choose **A or C. **E**lse (normal operation) → choose **L**atency or **C**onsistency.

System During Partition Normal Operation
Cassandra A (stay available) L (fast, eventually consistent)
DynamoDB A L (default) or C (strong reads cost 2x)
ZooKeeper C (refuse writes) C (always consistent, higher latency)
Spanner C C (TrueTime makes it fast despite consistency)
MongoDB C L (reads from secondaries are faster but stale)

Interview Gold

When an interviewer asks "What's the trade-off of using DynamoDB?" — answer with PACELC: "It's PA/EL by default. Available during partitions, low-latency in normal operation, but eventually consistent. You can opt into strong consistency per-read at 2x cost."


Consistency Models (From Strict to Relaxed)

Model Promise Real-World Feel
Linearizability Every read sees the absolute latest write Your bank balance — always correct
Sequential All nodes agree on ordering, maybe slightly behind A shared Google Doc
Causal If A caused B, everyone sees A before B Chat messages in order
Eventual Everyone will eventually agree Instagram like count (might be off by a few)

The 3 Interview Mistakes That Get You Rejected

Don't Say These

  1. "Just pick CP for everything" — Shows you don't understand the trade-off. No interviewer wants a system that goes down during every network blip.
  2. "CA is a valid option" — In any real distributed system, partitions WILL happen. CA only exists on a single machine (not distributed).
  3. "Eventual consistency means data loss" — No. It means temporary staleness. All writes are eventually propagated. No data is lost.

Your Interview Answer Template

When asked "How would you handle consistency in [System X]?"

"For [specific use case], I'd choose [CP/AP] because [reason tied to business impact]. During normal operation, I'd optimize for [latency/consistency]. Specifically, I'd use [real system] which gives me [specific guarantee]. For the parts that need [opposite choice], I'd use a separate store — for example [second system] for [that specific data]."

Example: "For the payment service, I'd choose CP using PostgreSQL with synchronous replication — a wrong balance is worse than brief unavailability. But for the activity feed, I'd use Cassandra (AP) because showing a slightly stale feed is fine, and we need sub-10ms reads globally."


Quick Recall Card

Question Answer
What is CAP? During a partition, choose Consistency or Availability
Why not both? Partitions are inevitable in distributed systems
CP examples? ZooKeeper, etcd, Spanner, HBase, MongoDB
AP examples? Cassandra, DynamoDB, CouchDB, DNS
What's PACELC? Extends CAP: what do you trade-off when there's NO partition?
Most common mistake? Treating it as "pick 2 of 3" instead of "CP or AP during partition"

See Also