Partitioning and Replication Strategies
Database performance becomes critical when application user counts go from hundreds to millions. Successful operation under load depends on understanding how different database families scale and choosing the right approach for your workload.
This lesson is a tour of the main database families and how each one scales.
The Trade-offs to Balance
Relational Databases (Postgres, MySQL)
Modern Postgres and MySQL have evolved significantly โ supporting JSON documents, full-text search, and geospatial queries that used to require NoSQL. They remain the foundational data layer for most SaaS platforms.
Single-master architecture
Postgres uses a single-master architecture: all writes through one primary server, reads scale via streaming replication to multiple read replicas.
Scaling writes in relational databases
When write throughput maxes out a single primary:
Document Stores (MongoDB)
MongoDB was built for horizontal scaling from day one. Sharded clusters distribute writes across multiple primaries simultaneously.
Architecture
[ Client ]
โ
โผ
[ mongos router ] โโ shard key lookup
โ
โโโบ [ Shard A โ Replica Set ]
โโโบ [ Shard B โ Replica Set ]
โโโบ [ Shard C โ Replica Set ]
- Each shard = a replica set with its own primary and secondaries.
mongosis the routing layer โ stateless, horizontally scalable.- Queries with shard key route to a single shard. Cross-shard queries fan out.
- Shard key choice is the critical decision โ bad keys create hot spots.
Wide-Column Stores (Cassandra)
Despite the name, โwide-columnโ doesnโt mean columnar storage. It means flexible row schemas where rows can have different columns. Internally, Cassandra is row-oriented.
Peer-to-peer architecture
[ Client ]
โ
โผ
[ Any node โ coordinator ] โโโ
โ โ
โโโโโโโโโโโโผโโโโโโโโโโโ โ consistent hashing
โผ โผ โผ โ โ owner
[Node 1] [Node 2] [Node 3] โ
No master. Every node is equal. Data distribution via consistent hashing. Any node can act as coordinator.
The cost: limited query flexibility (must design table layout per query pattern), no joins, eventual consistency by default.
Key-Value Stores (Redis, DynamoDB)
Simplest data model: keys map to values. The simplicity enables extreme scale and low latency.
Redis
In-memory key-value store. Sub-millisecond latency. Hugely versatile (strings, hashes, lists, sets, streams, pub/sub).
- Single-node โ up to 100k+ ops/sec at 1ms latency. Limited by RAM.
- Sentinel โ automatic failover for high availability.
- Cluster โ sharding across nodes via hash slots. Linear write scaling.
Used as cache, session store, real-time analytics, leaderboards, message queues.
DynamoDB
AWS-managed key-value with optional document features. Built on a Dynamo paper-style architecture.
The cost: limited query patterns (must design access patterns up front), join-less, item size limits.
NewSQL: SQL Without the Single-Master Limit
A category that emerged to combine SQL semantics with horizontal scale.
Choosing for Multi-Tenant SaaS
A reasonable pattern for a SaaS platform:
| Use case | Database choice |
|---|---|
| Tenant data, transactional | Postgres (with sharding or Aurora) |
| Tenant data, append-heavy time series | Cassandra or DynamoDB |
| User sessions, caching | Redis |
| Search | Elasticsearch / OpenSearch |
| Analytics aggregates | ClickHouse / BigQuery / Snowflake |
| Object storage | S3 |
Most SaaS platforms run multiple databases โ each chosen for what itโs best at. The pragmatic approach is โuse the right database for the job,โ not โforce everything into one.โ
Recap
- Different database families have fundamentally different scaling architectures.
- Relational (Postgres, MySQL) โ single-master, scale reads via replicas, scale writes via sharding tools (Citus, Vitess) or app-level partitioning.
- Document (MongoDB) โ sharded clusters, multi-primary, native horizontal scale; shard key choice is critical.
- Wide-column (Cassandra) โ peer-to-peer, optimized for writes and multi-DC, tunable consistency.
- Key-value (Redis, DynamoDB) โ simple data model, extreme scale, predictable latency.
- NewSQL (Spanner, CockroachDB, YugabyteDB) โ SQL with horizontal write scale, at operational cost.
- Real platforms run multiple databases. Pick the right one per access pattern.