Sharding and Replication

9 min read · Updated 2026-04-25

Sharding and replication are the two foundational approaches to building distributed data systems. They solve different problems, but most production systems combine them.

Single Instance β€” The Starting Point

Everything happens on one node. Strong consistency, simple reasoning, but no horizontal scalability. Examples: single-instance Postgres, Redis without clustering, monolithic app servers.

The only option for handling more load is vertical scaling: bigger CPU, more memory, more storage. This has physical and economic limits.

Replication

Single-leader replication

The simplest replication setup: one leader (a.k.a. primary, master, or writer) accepts all writes. Followers (read replicas, hot standbys) receive a stream of changes from the leader and serve reads.

Writes go to leader
Single point of write coordination. Easy to reason about ordering; linearizability comes naturally.
Reads can scale
Spread read load across followers. Great for read-heavy workloads.
Replication modes
Synchronous (slower writes, no data loss) vs. asynchronous (faster, possible staleness).
Failover
Manual or automatic (via consensus protocols such as Raft or Paxos). Automatic = faster recovery; manual = lower risk of split-brain.

Examples: Postgres streaming replication, MySQL master-slave, MongoDB replica sets, Redis Sentinel.

Synchronous replication
No data loss, slower writes
Leader waits for at least one follower to acknowledge before confirming the write. Strong consistency but increased latency and reduced availability if followers are slow or unreachable.
Asynchronous replication
Fast writes, possible staleness
Leader confirms writes immediately and replicates in the background. Better performance, but readers can see stale data on followers, and recent writes can be lost if the leader crashes before replication.
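
To make the trade-off concrete, here is a minimal sketch of a leader's write path under both modes. The Leader class, its follower objects, and the apply method are illustrative assumptions, not any particular database's API:

import concurrent.futures

class Leader:
    """Illustrative single-leader write path (not a real database's API)."""

    def __init__(self, followers, synchronous=True):
        self.log = []                  # leader-local log; every write lands here first
        self.followers = followers     # follower objects exposing .apply(entry)
        self.synchronous = synchronous
        self.pool = concurrent.futures.ThreadPoolExecutor()

    def write(self, entry):
        self.log.append(entry)
        futures = [self.pool.submit(f.apply, entry) for f in self.followers]
        if self.synchronous:
            # Synchronous mode: block until at least one follower acknowledges,
            # so an acknowledged write survives a leader crash (slower, no data loss).
            concurrent.futures.wait(
                futures, return_when=concurrent.futures.FIRST_COMPLETED)
        # Asynchronous mode: return immediately; followers catch up in the
        # background and may briefly serve stale reads (faster, possible loss).
        return "ack"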

Multi-leader replication

Multiple nodes accept writes simultaneously. Each leader replicates to others; each leader can also have its own followers.

When you need it: multi-datacenter deployments (a local leader in each datacenter), clients that must accept writes while offline, and real-time collaborative editing.

The cost: conflict resolution. When the same data is written at two leaders concurrently, you must decide whose change wins.

Last-write-wins (LWW)
Pick the most recent timestamp. Simplest, but clock skew can cause data loss when the "wrong" write wins.
Version vectors
Track causality between updates. More accurate; more complex to implement.
Application-specific
Let the app decide based on business semantics (merge two carts, sum two counters).
CRDTs
Conflict-free Replicated Data Types that merge automatically. Mathematically guaranteed to converge, but limited to certain data shapes (counters, sets, maps); see the sketch below.
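
To make the CRDT idea concrete, here is a minimal grow-only counter (G-Counter) sketch. Each replica increments only its own slot, and merging takes the element-wise maximum, so concurrent writes at different leaders converge without any conflict to resolve; the class and node names are illustrative:

class GCounter:
    """Grow-only counter CRDT: per-replica counts, merged by element-wise max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}              # replica id -> count observed from that replica

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other):
        # Max per replica is commutative, associative, and idempotent,
        # so any replication order yields the same converged state.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())

# Two leaders accept increments concurrently, then exchange states.
a, b = GCounter("leader-a"), GCounter("leader-b")
a.increment(3); b.increment(2)
a.merge(b); b.merge(a)
assert a.value() == b.value() == 5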

Multi-leader systems generally cannot guarantee linearizability. Google Spanner uses atomic clocks (TrueTime) to approximate it, but that requires specialized hardware and serious operational complexity.

Leaderless replication

Every node accepts both reads and writes. No special coordinator. Used by Cassandra, DynamoDB, Riak.

Quorum consistency. With N replicas, if you read from R replicas and write to W replicas where R + W > N, you're guaranteed to read the latest value, because every read set must overlap every write set in at least one replica.

N = 3 replicas
W = 2 (write to 2)
R = 2 (read from 2)
W + R = 4 > N = 3  ✓ guaranteed consistent reads
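
The guarantee is pure pigeonhole: with R + W > N, any set of W write acknowledgements and any set of R read responses must share at least one replica. A brute-force check of that claim (the function name is illustrative):

from itertools import combinations

def quorum_overlaps(n, w, r):
    """True if every possible write set of size w intersects every read set of size r."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))

assert quorum_overlaps(n=3, w=2, r=2)        # W + R = 4 > 3: always overlaps
assert not quorum_overlaps(n=3, w=1, r=1)    # a read can miss the only fresh replica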
Read repair
When clients read from multiple replicas, they detect and fix stale values inline (sketched after this list).
Anti-entropy
Background process that compares replicas and propagates differences.
Sloppy quorum
When some target nodes are unavailable, write to other nodes temporarily.
Hinted handoff
When original nodes recover, they receive writes that were temporarily held elsewhere.
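
A sketch of the read-repair path, under these assumptions: each replica object exposes hypothetical get and put methods, values carry a monotonically increasing version, and the client reads from R replicas, returns the newest value, and rewrites it to any replica that answered with something older:

def quorum_read(replicas, key, r):
    """Read from r replicas, return the newest value, repairing stale copies inline."""
    responses = [(rep, rep.get(key)) for rep in replicas[:r]]   # rep.get -> (value, version)
    newest_value, newest_version = max(
        (resp for _, resp in responses), key=lambda resp: resp[1])
    for rep, (value, version) in responses:
        if version < newest_version:
            rep.put(key, newest_value, newest_version)          # the "repair" step
    return newest_value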

Sharding

Sharding (also called horizontal partitioning) splits data across nodes. Each node holds only a slice. Writes scale horizontally because different shards can accept writes in parallel.

Sharding strategies

Range-based
Order-preserving
Split by key range (e.g., A-F on shard 1, G-M on shard 2). Range queries are efficient. Risk: hot spots if traffic skews to one range.
Hash-based
Even distribution
Hash the key, mod by shard count. Even traffic distribution. Range queries become expensive (must scatter-gather).
Directory-based
Lookup table
A directory service maps keys to shards. Maximum flexibility, easy to rebalance. Cost: directory becomes a bottleneck and SPOF.
Consistent hashing
Minimize remapping
Adding or removing a node only relocates roughly 1/N of the keys (for N nodes). Used by Cassandra, DynamoDB, modern caches.
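
A minimal consistent-hashing ring in Python illustrates why so few keys move: each key maps to the first node clockwise from its hash position, so a new node only claims the arc between itself and its predecessor. Node names are illustrative, and real systems (Cassandra, DynamoDB) layer virtual nodes on top of this idea:

import bisect
import hashlib

class HashRing:
    """Minimal consistent-hashing ring: a key maps to the next node clockwise."""

    def __init__(self, nodes):
        self.ring = sorted((self._hash(node), node) for node in nodes)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        index = bisect.bisect(self.ring, (self._hash(key),))
        return self.ring[index % len(self.ring)][1]   # wrap around past the last node

ring = HashRing(["node-1", "node-2", "node-3"])
print(ring.node_for("user:12345"))
# Adding "node-4" relocates only the keys whose hash falls between node-4's
# position and its predecessor's; every other key stays where it was.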

The hot-spot problem

A single popular key (a celebrity user, a viral product) can overwhelm one shard. Mitigations: salt the hot key across several sub-keys (sketched below), cache the hot key in front of the database, or give the hot key or tenant a dedicated shard.
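
A sketch of the key-salting mitigation, assuming a hash-sharded store; the bucket count and key format are illustrative. Writes pick a random suffix so the hot key spreads over several shards, while reads now have to fan out over every suffix and merge:

import random

SALT_BUCKETS = 8   # one hot key becomes 8 sub-keys spread across shards

def salted_write_key(hot_key):
    """Pick a random sub-key for each write so the load spreads out."""
    return f"{hot_key}#{random.randrange(SALT_BUCKETS)}"

def all_read_keys(hot_key):
    """Reads must cover every salt bucket and merge the results."""
    return [f"{hot_key}#{i}" for i in range(SALT_BUCKETS)]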

Resharding

Adding or removing shards is operationally expensive. Strategies: create many fixed partitions up front and move whole partitions between nodes, split partitions dynamically as they grow, or use consistent hashing so only a small fraction of keys moves.

Combining Sharding and Replication

Most production systems use both:

Shard A (users a-h)            Shard B (users i-p)            Shard C (users q-z)
   ┌───────────────┐              ┌───────────────┐              ┌───────────────┐
   │  Leader A     │              │  Leader B     │              │  Leader C     │
   │  Follower A1  │              │  Follower B1  │              │  Follower C1  │
   │  Follower A2  │              │  Follower B2  │              │  Follower C2  │
   └───────────────┘              └───────────────┘              └───────────────┘
   (within each shard, the leader replicates to its followers)

Each shard is independent. A query for user "k" routes to Shard B; cross-shard queries (e.g., "give me all users") fan out to all shards.
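
The routing layer implied by this diagram can be sketched as follows, assuming range-based shards keyed by the first letter of the username; shard names are illustrative:

SHARDS = {
    ("a", "h"): "shard-a",
    ("i", "p"): "shard-b",
    ("q", "z"): "shard-c",
}

def shard_for(username):
    """Route a single-key query to the one shard owning that key's range."""
    first = username[0].lower()
    for (low, high), shard in SHARDS.items():
        if low <= first <= high:
            return shard
    raise KeyError(f"no shard covers {username!r}")

def all_shards():
    """Cross-shard queries ('give me all users') fan out to every shard."""
    return list(SHARDS.values())

print(shard_for("karen"))   # -> shard-b
print(all_shards())         # -> ['shard-a', 'shard-b', 'shard-c']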

Multi-Tenant SaaS Patterns

For multi-tenant SaaS, shard-by-tenant is often the right model:

One shard per tenant
Strong isolation, easy GDPR compliance, simple per-tenant operations. Operational overhead grows with tenant count.
Many tenants per shard
Most efficient. Hash tenants to shards. Need to handle hot tenants (one big tenant overwhelming a shard).
Hybrid
Big enterprise tenants get dedicated shards; smaller tenants share. Common pattern at scale (routing sketched after this list).
Region-per-tenant
For data residency compliance. Each tenant's data stays in their geographic region.
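
A sketch of the hybrid model's routing decision, assuming a small directory of dedicated tenants plus hash placement into a shared pool; tenant and shard names are illustrative:

import hashlib

DEDICATED = {"enterprise-corp": "shard-enterprise-corp"}   # big tenants get their own shard
SHARED_SHARDS = ["shared-1", "shared-2", "shared-3"]

def shard_for_tenant(tenant_id):
    """Dedicated shard if the tenant has one, otherwise hash into the shared pool."""
    if tenant_id in DEDICATED:
        return DEDICATED[tenant_id]
    digest = int(hashlib.md5(tenant_id.encode()).hexdigest(), 16)
    return SHARED_SHARDS[digest % len(SHARED_SHARDS)]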

Recap