Skip to main content

CAP Theorem

TL;DR In a distributed system, during a network partition you must choose: remain consistent (reject writes/reads that could be stale) or remain available (serve requests that may return stale data). You cannot have both. Partition tolerance is not optional in real networks.

The Three Properties

Consistency (C): Every read receives the most recent write or an error. All nodes see the same data at the same time.

Availability (A): Every request receives a response (not an error), though it may not contain the most recent write.

Partition Tolerance (P): The system continues operating even when some messages between nodes are dropped or delayed.

Why P Is Not Optional

Networks fail. Routers drop packets. Data centers lose connectivity. A system that stops working during any partition is too fragile for production. So the real choice is CP vs AP during partition events.

CP Systems

Prioritize consistency. During a partition, they become unavailable rather than risk returning stale data.

Examples: HBase, Zookeeper, etcd, traditional RDBMS with synchronous replication.

User request → primary unreachable → return error (rather than stale read)

Use when: financial transactions, leader election, configuration management — anywhere wrong data is worse than no data.

AP Systems

Prioritize availability. During a partition, they continue serving requests but may return stale data. After partitions heal, nodes reconcile via eventual consistency.

Examples: Cassandra, DynamoDB, CouchDB, DNS.

User request → read from local replica → return possibly-stale data
Partition heals → nodes sync, conflicts resolved

Use when: shopping carts, social feeds, analytics, session storage — where stale data is acceptable and downtime is not.

PACELC: A More Nuanced Model

CAP only applies during partitions. PACELC extends it:

If Partition (P): choose Availability (A) or Consistency (C). Else (E): choose Latency (L) or Consistency (C).

Even without partitions, distributing data means trading latency for consistency. DynamoDB is AP/EL: favors availability during partitions and low latency otherwise.

Consistency Levels in Practice

Real systems offer tunable consistency:

LevelGuaranteeLatency
LinearizableStrongest: appears instantaneousHighest
SequentialAll nodes see same order, not real-timeHigh
CausalCausally related ops are orderedMedium
EventualAll nodes converge, eventuallyLowest

Cassandra lets you choose per-query: QUORUM, ONE, ALL.

Gotchas

CAP is about trade-offs during failure, not normal operation. A well-designed AP system with quorum reads can appear consistent most of the time.

"Consistent" in CAP ≠ "Consistent" in ACID. CAP consistency is about distributed read-write ordering. ACID consistency is about database invariants.

Network partitions include slow nodes. A timeout is a partition. A node that takes 30 seconds to respond has effectively partitioned itself.

System Design OverviewOther distributed systems concepts