Sharding and Replication

9 min read · Updated 2026-04-25

Sharding and replication are the two foundational approaches to building distributed data systems. They solve different problems, but most production systems combine them.

Single Instance β€” The Starting Point

Everything happens on one node. Strong consistency, simple reasoning, but no horizontal scalability. Examples: single-instance Postgres, Redis without clustering, monolithic app servers.

The only option for handling more load is vertical scaling: bigger CPU, more memory, more storage. This has physical and economic limits.

Replication

Single-leader replication

The simplest replication setup: one leader (a.k.a. primary, master, or writer) accepts all writes. Followers (read replicas, hot standbys) receive a stream of changes from the leader and serve reads.

Writes go to leader
Single point of write coordination. Easy to reason about ordering; linearizability comes naturally.
Reads can scale
Spread read load across followers. Great for read-heavy workloads.
Replication modes
Synchronous (slower writes, no data loss) vs. asynchronous (faster, possible staleness).
Failover
Manual or automatic (via consensus protocols such as Raft or Paxos). Automatic = faster recovery; manual = lower risk of split-brain.

Examples: Postgres streaming replication, MySQL master-slave, MongoDB replica sets, Redis Sentinel.

Synchronous replication
No data loss, slower writes
Leader waits for at least one follower to acknowledge before confirming the write. Strong consistency but increased latency and reduced availability if followers are slow or unreachable.
Asynchronous replication
Fast writes, possible staleness
Leader confirms writes immediately and replicates in the background. Better performance, but readers can see stale data on followers, and recent writes can be lost if the leader crashes before replication.
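
To make the trade-off concrete, here is a minimal sketch of a leader's write path under both modes. The Leader class, its follower objects, and the apply method are illustrative assumptions, not any particular database's API:

import concurrent.futures

class Leader:
    """Illustrative single-leader write path (not a real database's API)."""

    def __init__(self, followers, synchronous=True):
        self.log = []                  # leader-local log; every write lands here first
        self.followers = followers     # follower objects exposing .apply(entry)
        self.synchronous = synchronous
        self.pool = concurrent.futures.ThreadPoolExecutor()

    def write(self, entry):
        self.log.append(entry)
        futures = [self.pool.submit(f.apply, entry) for f in self.followers]
        if self.synchronous:
            # Synchronous mode: block until at least one follower acknowledges,
            # so an acknowledged write survives a leader crash (slower, no data loss).
            concurrent.futures.wait(
                futures, return_when=concurrent.futures.FIRST_COMPLETED)
        # Asynchronous mode: return immediately; followers catch up in the
        # background and may briefly serve stale reads (faster, possible loss).
        return "ack"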

Multi-leader replication

Multiple nodes accept writes simultaneously. Each leader replicates to others; each leader can also have its own followers.

When you need it: multi-datacenter deployments (a local leader in each datacenter), clients that must accept writes while offline, and real-time collaborative editing.

The cost: conflict resolution. When the same data is written at two leaders concurrently, you must decide whose change wins.

Last-write-wins (LWW)
Pick the most recent timestamp. Simplest, but clock skew can cause data loss when the "wrong" write wins.
Version vectors
Track causality between updates. More accurate; more complex to implement.
Application-specific
Let the app decide based on business semantics (merge two carts, sum two counters).
CRDTs
Conflict-free Replicated Data Types that merge automatically. Mathematically guaranteed to converge, but limited to certain data shapes (counters, sets, maps); see the sketch below.
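
To make the CRDT idea concrete, here is a minimal grow-only counter (G-Counter) sketch. Each replica increments only its own slot, and merging takes the element-wise maximum, so concurrent writes at different leaders converge without any conflict to resolve; the class and node names are illustrative:

class GCounter:
    """Grow-only counter CRDT: per-replica counts, merged by element-wise max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}              # replica id -> count observed from that replica

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other):
        # Max per replica is commutative, associative, and idempotent,
        # so any replication order yields the same converged state.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())

# Two leaders accept increments concurrently, then exchange states.
a, b = GCounter("leader-a"), GCounter("leader-b")
a.increment(3); b.increment(2)
a.merge(b); b.merge(a)
assert a.value() == b.value() == 5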

Multi-leader systems generally cannot guarantee linearizability. Google Spanner uses atomic clocks (TrueTime) to approximate it, but that requires specialized hardware and serious operational complexity.

Leaderless replication

Every node accepts both reads and writes. No special coordinator. Used by Cassandra, DynamoDB, Riak.

Quorum consistency. With N replicas, if you read from R replicas and write to W replicas where R + W > N, you're guaranteed to read the latest value, because every read set must overlap every write set in at least one replica.

N = 3 replicas
W = 2 (write to 2)
R = 2 (read from 2)
W + R = 4 > N = 3  ✓ guaranteed consistent reads
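
The guarantee is pure pigeonhole: with R + W > N, any set of W write acknowledgements and any set of R read responses must share at least one replica. A brute-force check of that claim (the function name is illustrative):

from itertools import combinations

def quorum_overlaps(n, w, r):
    """True if every possible write set of size w intersects every read set of size r."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))

assert quorum_overlaps(n=3, w=2, r=2)        # W + R = 4 > 3: always overlaps
assert not quorum_overlaps(n=3, w=1, r=1)    # a read can miss the only fresh replica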
Read repair
When clients read from multiple replicas, they detect and fix stale values inline (sketched after this list).
Anti-entropy
Background process that compares replicas and propagates differences.
Sloppy quorum
When some target nodes are unavailable, write to other nodes temporarily.
Hinted handoff
When original nodes recover, they receive writes that were temporarily held elsewhere.
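
A sketch of the read-repair path, under these assumptions: each replica object exposes hypothetical get and put methods, values carry a monotonically increasing version, and the client reads from R replicas, returns the newest value, and rewrites it to any replica that answered with something older:

def quorum_read(replicas, key, r):
    """Read from r replicas, return the newest value, repairing stale copies inline."""
    responses = [(rep, rep.get(key)) for rep in replicas[:r]]   # rep.get -> (value, version)
    newest_value, newest_version = max(
        (resp for _, resp in responses), key=lambda resp: resp[1])
    for rep, (value, version) in responses:
        if version < newest_version:
            rep.put(key, newest_value, newest_version)          # the "repair" step
    return newest_value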

Sharding

Sharding (also called horizontal partitioning) splits data across nodes. Each node holds only a slice. Writes scale horizontally because different shards can accept writes in parallel.

Sharding strategies

Range-based
Order-preserving
Split by key range (e.g., A-F on shard 1, G-M on shard 2). Range queries are efficient. Risk: hot spots if traffic skews to one range.
Hash-based
Even distribution
Hash the key, mod by shard count. Even traffic distribution. Range queries become expensive (must scatter-gather).
Directory-based
Lookup table
A directory service maps keys to shards. Maximum flexibility, easy to rebalance. Cost: directory becomes a bottleneck and SPOF.
Consistent hashing
Minimize remapping
Adding or removing a node only relocates roughly 1/N of the keys (for N nodes). Used by Cassandra, DynamoDB, modern caches.
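
A minimal consistent-hashing ring in Python illustrates why so few keys move: each key maps to the first node clockwise from its hash position, so a new node only claims the arc between itself and its predecessor. Node names are illustrative, and real systems (Cassandra, DynamoDB) layer virtual nodes on top of this idea:

import bisect
import hashlib

class HashRing:
    """Minimal consistent-hashing ring: a key maps to the next node clockwise."""

    def __init__(self, nodes):
        self.ring = sorted((self._hash(node), node) for node in nodes)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        index = bisect.bisect(self.ring, (self._hash(key),))
        return self.ring[index % len(self.ring)][1]   # wrap around past the last node

ring = HashRing(["node-1", "node-2", "node-3"])
print(ring.node_for("user:12345"))
# Adding "node-4" relocates only the keys whose hash falls between node-4's
# position and its predecessor's; every other key stays where it was.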

The hot-spot problem

A single popular key (a celebrity user, a viral product) can overwhelm one shard. Mitigations: salt the hot key across several sub-keys (sketched below), cache the hot key in front of the database, or give the hot key or tenant a dedicated shard.
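
A sketch of the key-salting mitigation, assuming a hash-sharded store; the bucket count and key format are illustrative. Writes pick a random suffix so the hot key spreads over several shards, while reads now have to fan out over every suffix and merge:

import random

SALT_BUCKETS = 8   # one hot key becomes 8 sub-keys spread across shards

def salted_write_key(hot_key):
    """Pick a random sub-key for each write so the load spreads out."""
    return f"{hot_key}#{random.randrange(SALT_BUCKETS)}"

def all_read_keys(hot_key):
    """Reads must cover every salt bucket and merge the results."""
    return [f"{hot_key}#{i}" for i in range(SALT_BUCKETS)]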

Resharding

Adding or removing shards is operationally expensive. Strategies: create many fixed partitions up front and move whole partitions between nodes, split partitions dynamically as they grow, or use consistent hashing so only a small fraction of keys moves.

Combining Sharding and Replication

Most production systems use both:

Shard A (users a-h)            Shard B (users i-p)            Shard C (users q-z)
   ┌───────────────┐              ┌───────────────┐              ┌───────────────┐
   │  Leader A     │              │  Leader B     │              │  Leader C     │
   │  Follower A1  │              │  Follower B1  │              │  Follower C1  │
   │  Follower A2  │              │  Follower B2  │              │  Follower C2  │
   └───────────────┘              └───────────────┘              └───────────────┘
   (within each shard, the leader replicates to its followers)

Each shard is independent. A query for user "k" routes to Shard B; cross-shard queries (e.g., "give me all users") fan out to all shards.
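
The routing layer implied by this diagram can be sketched as follows, assuming range-based shards keyed by the first letter of the username; shard names are illustrative:

SHARDS = {
    ("a", "h"): "shard-a",
    ("i", "p"): "shard-b",
    ("q", "z"): "shard-c",
}

def shard_for(username):
    """Route a single-key query to the one shard owning that key's range."""
    first = username[0].lower()
    for (low, high), shard in SHARDS.items():
        if low <= first <= high:
            return shard
    raise KeyError(f"no shard covers {username!r}")

def all_shards():
    """Cross-shard queries ('give me all users') fan out to every shard."""
    return list(SHARDS.values())

print(shard_for("karen"))   # -> shard-b
print(all_shards())         # -> ['shard-a', 'shard-b', 'shard-c']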

Multi-Tenant SaaS Patterns

For multi-tenant SaaS, shard-by-tenant is often the right model:

One shard per tenant
Strong isolation, easy GDPR compliance, simple per-tenant operations. Operational overhead grows with tenant count.
Many tenants per shard
Most efficient. Hash tenants to shards. Need to handle hot tenants (one big tenant overwhelming a shard).
Hybrid
Big enterprise tenants get dedicated shards; smaller tenants share. Common pattern at scale (routing sketched after this list).
Region-per-tenant
For data residency compliance. Each tenant's data stays in their geographic region.
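
A sketch of the hybrid model's routing decision, assuming a small directory of dedicated tenants plus hash placement into a shared pool; tenant and shard names are illustrative:

import hashlib

DEDICATED = {"enterprise-corp": "shard-enterprise-corp"}   # big tenants get their own shard
SHARED_SHARDS = ["shared-1", "shared-2", "shared-3"]

def shard_for_tenant(tenant_id):
    """Dedicated shard if the tenant has one, otherwise hash into the shared pool."""
    if tenant_id in DEDICATED:
        return DEDICATED[tenant_id]
    digest = int(hashlib.md5(tenant_id.encode()).hexdigest(), 16)
    return SHARED_SHARDS[digest % len(SHARED_SHARDS)]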

Recap