CAP Theorem
TL;DR In a distributed system, during a network partition you must choose: remain consistent (reject writes/reads that could be stale) or remain available (serve requests that may return stale data). You cannot have both. Partition tolerance is not optional in real networks.
The Three Properties
Consistency (C): Every read receives the most recent write or an error. All nodes see the same data at the same time.
Availability (A): Every request receives a response (not an error), though it may not contain the most recent write.
Partition Tolerance (P): The system continues operating even when some messages between nodes are dropped or delayed.
Why P Is Not Optional
Networks fail. Routers drop packets. Data centers lose connectivity. A system that stops working during any partition is too fragile for production. So the real choice is CP vs AP during partition events.
CP Systems
Prioritize consistency. During a partition, they become unavailable rather than risk returning stale data.
Examples: HBase, Zookeeper, etcd, traditional RDBMS with synchronous replication.
User request → primary unreachable → return error (rather than stale read)
Use when: financial transactions, leader election, configuration management — anywhere wrong data is worse than no data.
AP Systems
Prioritize availability. During a partition, they continue serving requests but may return stale data. After partitions heal, nodes reconcile via eventual consistency.
Examples: Cassandra, DynamoDB, CouchDB, DNS.
User request → read from local replica → return possibly-stale data
Partition heals → nodes sync, conflicts resolved
Use when: shopping carts, social feeds, analytics, session storage — where stale data is acceptable and downtime is not.
PACELC: A More Nuanced Model
CAP only applies during partitions. PACELC extends it:
If Partition (P): choose Availability (A) or Consistency (C). Else (E): choose Latency (L) or Consistency (C).
Even without partitions, distributing data means trading latency for consistency. DynamoDB is AP/EL: favors availability during partitions and low latency otherwise.
Consistency Levels in Practice
Real systems offer tunable consistency:
| Level | Guarantee | Latency |
|---|---|---|
| Linearizable | Strongest: appears instantaneous | Highest |
| Sequential | All nodes see same order, not real-time | High |
| Causal | Causally related ops are ordered | Medium |
| Eventual | All nodes converge, eventually | Lowest |
Cassandra lets you choose per-query: QUORUM, ONE, ALL.
Gotchas
CAP is about trade-offs during failure, not normal operation. A well-designed AP system with quorum reads can appear consistent most of the time.
"Consistent" in CAP ≠ "Consistent" in ACID. CAP consistency is about distributed read-write ordering. ACID consistency is about database invariants.
Network partitions include slow nodes. A timeout is a partition. A node that takes 30 seconds to respond has effectively partitioned itself.