Distributed Consensus

9 min read · Updated 2026-04-25

Distributed consensus solves the fundamental problem of getting multiple computers to agree on a single value, even when some nodes can fail or become unreachable. It’s one of the most foundational problems in computer science — and underlies database consistency, leader election, atomic transactions, and configuration changes in every distributed system.

What Consensus Is For

Despite many concrete uses, consensus problems boil down to two patterns:

Single-value consensus (agree on one decision): leader election, configuration changes, cluster membership. "Should D join the cluster?" "Who is the new leader?" Algorithms: Basic Paxos.

Sequence consensus (agree on an ordered log): a replicated log of operations, where every node processes operations in the same order. Algorithms: Multi-Paxos, Raft, ZAB. This pattern powers most distributed systems.

Concrete examples

Single-value consensus: electing a new leader after the current one fails, or deciding whether a node may join the cluster. The group agrees on one decision and moves on.

Sequence consensus (replicated log): a coordination service such as etcd or ZooKeeper replicating every update as a log entry, so each replica applies the same operations in the same order and ends up in the same state.

The Major Algorithms

Paxos (Lamport, 1998)

The original. Famously hard to understand from the paper and easy to implement incorrectly. Two flavors:

Basic Paxos: agrees on a single value per protocol instance.

Multi-Paxos: chains instances together under a stable leader to agree on a sequence of values, i.e., a replicated log.

Used in: Google Chubby, Spanner, and Cassandra's lightweight transactions.

Raft (Ongaro & Ousterhout, 2014)

Designed explicitly to be understandable. Strong leader model — one leader, replicated log, predictable failure semantics.

Leader election: nodes start as followers. If a follower hears nothing from a leader, it becomes a candidate and requests votes; whoever gets a majority wins (a minimal sketch follows below).

Log replication: the leader receives commands, appends them to its local log, and replicates them to followers. Once a majority of nodes has written an entry, it is "committed."

Safety: a committed entry is never lost or reordered, even across leader changes.

Membership changes: a joint consensus protocol changes the cluster configuration while preserving safety.
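To make the election flow concrete, here is a minimal, hypothetical Go sketch of a single node's election timer. The Node type, the 70%-vote stub, and the timeout values are illustrative assumptions; real implementations (such as etcd's raft library or hashicorp/raft) also compare log completeness, persist votes and terms, and handle RPC failures and retries.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// State is a node's role in the cluster.
type State int

const (
	Follower State = iota
	Candidate
	Leader
)

// Node is a deliberately simplified Raft participant.
type Node struct {
	id          int
	state       State
	currentTerm int
	clusterSize int
	heartbeat   chan int // carries the term of the leader that sent it
}

// requestVote stands in for the RequestVote RPC; here peers grant a vote
// with 70% probability instead of checking terms and log completeness.
func (n *Node) requestVote(peer int) bool {
	return rand.Float64() < 0.7
}

// runElectionTimer waits for a heartbeat; if none arrives before the
// randomized timeout, the node becomes a candidate and asks for votes.
func (n *Node) runElectionTimer() {
	// Randomized election timeouts (150-300 ms in the Raft paper) make
	// split votes unlikely.
	timeout := time.Duration(150+rand.Intn(150)) * time.Millisecond

	select {
	case term := <-n.heartbeat:
		// Heard from a leader with an up-to-date term: remain a follower.
		if term >= n.currentTerm {
			n.currentTerm = term
			n.state = Follower
		}
	case <-time.After(timeout):
		// No leader heard from: become a candidate for a new term.
		n.state = Candidate
		n.currentTerm++
		votes := 1 // vote for self
		for peer := 0; peer < n.clusterSize; peer++ {
			if peer != n.id && n.requestVote(peer) {
				votes++
			}
		}
		// A strict majority of the cluster wins the election.
		if votes > n.clusterSize/2 {
			n.state = Leader
			fmt.Printf("node %d won term %d with %d/%d votes\n",
				n.id, n.currentTerm, votes, n.clusterSize)
		}
	}
}

func main() {
	n := &Node{id: 0, clusterSize: 5, heartbeat: make(chan int)}
	n.runElectionTimer()
}
```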

Used in: etcd, Consul, CockroachDB, TiKV, Vault. By far the most popular consensus algorithm in modern infrastructure.

ZAB (ZooKeeper Atomic Broadcast)

Specifically designed as an atomic broadcast protocol — guarantees totally ordered message delivery. Used by ZooKeeper.

Functionally similar to Multi-Paxos but optimized for the “primary-backup” model where a leader broadcasts updates to followers.

PBFT (Practical Byzantine Fault Tolerance)

Extends consensus to handle malicious nodes that may lie. Critical for blockchain and multi-organization systems where you can’t trust all participants.

Higher overhead than Paxos/Raft (3f+1 nodes to tolerate f malicious failures, vs. 2f+1 for crash failures), but necessary in adversarial environments.
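As a quick check on those bounds, this small Go snippet prints the minimum cluster size for a few values of f, following the 2f+1 and 3f+1 formulas quoted above.

```go
package main

import "fmt"

// Minimum cluster sizes needed to tolerate f failures, per the quorum
// bounds quoted above: 2f+1 for crash faults, 3f+1 for Byzantine faults.
func main() {
	for f := 1; f <= 3; f++ {
		fmt.Printf("f=%d: Paxos/Raft needs %d nodes, PBFT needs %d nodes\n",
			f, 2*f+1, 3*f+1)
	}
}
```

Tolerating even a single malicious node therefore takes four replicas instead of three.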

How They Work — The Common Pattern

Despite differences in detail, consensus algorithms share a structure:

1. Propose:    Node proposes a value or operation.
2. Promise:    Other nodes agree to consider it (and ignore older proposals).
3. Accept:     Once a majority agrees, the value is "accepted."
4. Commit:     All nodes apply the value once they know it's accepted.

The key invariant: a value can only be committed if a majority of nodes agree. The majority requirement is what provides both safety (any two majorities overlap in at least one node, so two conflicting values can never both be committed) and progress (a majority can keep deciding without waiting for failed nodes, as long as fewer than half the cluster is down).
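Here is a minimal, hypothetical Go sketch of that pattern for a single value, in the spirit of a Basic Paxos acceptor. The proposal number, the value string, and the in-memory loop over acceptors are illustrative assumptions; a real proposer also has to adopt any previously accepted value it learns during the promise phase, persist acceptor state, and communicate over the network.

```go
package main

import "fmt"

// Acceptor holds the state one node keeps for a single decision.
type Acceptor struct {
	promisedN int    // highest proposal number this node promised to honor
	acceptedN int    // proposal number of the value it accepted, if any
	acceptedV string // the accepted value, if any
}

// Prepare (promise phase): the acceptor promises to ignore proposals older
// than n and reports anything it has already accepted.
func (a *Acceptor) Prepare(n int) (ok bool, prevN int, prevV string) {
	if n <= a.promisedN {
		return false, 0, "" // already promised a newer proposal
	}
	a.promisedN = n
	return true, a.acceptedN, a.acceptedV
}

// Accept: the acceptor accepts (n, v) unless it has promised a newer proposal.
func (a *Acceptor) Accept(n int, v string) bool {
	if n < a.promisedN {
		return false
	}
	a.promisedN, a.acceptedN, a.acceptedV = n, n, v
	return true
}

func main() {
	acceptors := []*Acceptor{{}, {}, {}, {}, {}}
	majority := len(acceptors)/2 + 1

	// 1. Propose / 2. Promise: gather promises from a majority.
	// (A real proposer must re-propose the highest previously accepted
	// value reported here, rather than always using its own value.)
	promises := 0
	for _, a := range acceptors {
		if ok, _, _ := a.Prepare(1); ok {
			promises++
		}
	}
	if promises < majority {
		fmt.Println("proposal 1 could not get a majority of promises")
		return
	}

	// 3. Accept: ask the acceptors to accept the value.
	accepts := 0
	for _, a := range acceptors {
		if a.Accept(1, "leader=node-3") {
			accepts++
		}
	}

	// 4. Commit: the value is chosen once a majority has accepted it.
	fmt.Printf("accepted by %d/%d nodes, committed: %v\n",
		accepts, len(acceptors), accepts >= majority)
}
```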

Performance

Consensus is expensive. Each operation requires at least one network round trip from the leader (or proposer) to a majority of nodes, plus a durable disk write (fsync) on those nodes before they acknowledge.

Typical numbers: a Raft cluster of 3-5 nodes processes thousands of ops/sec at low single-digit millisecond latency. Spread across regions, network latency dominates.

Same data center (sub-millisecond network): consensus latency is dominated by fsync, not the network. Roughly 1-5 ms per operation; scale: tens of thousands of ops/sec.

Cross-region (tens of milliseconds network): consensus latency is dominated by the network round trip. 50-200 ms per operation; scale: hundreds of ops/sec at best.
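A back-of-envelope model makes the two regimes easy to reproduce: per committed operation, count roughly one round trip to a majority plus one durable write before the acknowledgment. The RTT and fsync figures in this sketch are assumptions for illustration, not measurements.

```go
package main

import "fmt"

// commitLatencyMs is a rough additive model: one network round trip to a
// majority plus a durable write (fsync) before the follower acknowledges.
func commitLatencyMs(rttMs, fsyncMs float64) float64 {
	return rttMs + fsyncMs
}

func main() {
	// Illustrative assumptions: ~2 ms fsync, 0.5 ms same-DC RTT,
	// 80 ms cross-region RTT.
	fmt.Printf("same data center: ~%.1f ms per committed op\n", commitLatencyMs(0.5, 2))
	fmt.Printf("cross-region:     ~%.1f ms per committed op\n", commitLatencyMs(80, 2))
}
```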

This is why most production systems use consensus only for metadata operations (leader election, configuration, membership) — not for every user request. User-facing reads/writes go through the elected leader, which doesn’t need consensus per request.

Where You’ll See This in Production

Distributed databases: CockroachDB and TiKV use Raft for their replicated logs; Spanner uses Paxos variants.

Coordination services: etcd (Raft), ZooKeeper (ZAB), Consul (Raft), used as the "brain" of Kubernetes, service discovery, and distributed locks.

Blockchain: Bitcoin uses proof-of-work; modern proof-of-stake chains use BFT-style protocols (Tendermint in Cosmos, Tower BFT in Solana, Casper FFG in Ethereum, HotStuff derivatives elsewhere).

Cluster orchestration: the Kubernetes control plane stores its state in etcd; the Raft cluster behind etcd is what makes Kubernetes itself reliable.

Practical Implications for SaaS

You’ll rarely implement consensus yourself. But understanding it shapes operational decisions:

Quorum sizing: run an odd number of nodes (typically 3 or 5). Losing a majority halts writes, and an even number adds cost without adding fault tolerance.

Placement: keep consensus members in one region where possible, or accept tens to hundreds of milliseconds per write across regions.

Dependencies: treat etcd, ZooKeeper, and Consul as critical infrastructure; when the consensus cluster is unhealthy, everything that relies on it for leases, locks, or configuration degrades with it.

Recap