Caching
“There are only two hard things in computer science: cache invalidation and naming things.” — Phil Karlton
Caching is one of the most universally applied performance techniques. It’s also one of the easiest to get wrong. This lesson covers where caches live in real systems, the policies that govern them, and the patterns that turn caching from a source of bugs into a source of speed.
Why Caches Work
The fundamental observation: memory access is faster than disk; disk is faster than network; local is faster than remote. Each cache hit replaces a slower operation with a faster one.
- CPU register access: ~1 ns
- L1 cache: ~1 ns
- L2 cache: ~5 ns
- L3 cache: ~10 ns
- Main memory (RAM): ~100 ns
- SSD read: ~50,000 ns (50 µs)
- Network round-trip, intra-DC: ~500,000 ns (500 µs)
- Network round-trip, cross-region: ~100,000,000 ns (100 ms)
A cache hit in RAM that replaces a cross-region network call is roughly six orders of magnitude faster (100 ns vs. 100 ms).
Cache Locations
Caches live at every layer of a modern application: the browser, the CDN, the reverse proxy, a distributed cache such as Redis, in-process memory, and the database itself.
The cache hierarchy is layered — a request might hit the CDN, miss, hit the reverse proxy, miss, hit Redis, miss, and finally hit the DB. Each layer multiplies the protective effect, because only one layer's misses reach the next.
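The layered lookup can be sketched as a single function. The layer objects and the loader here are hypothetical stand-ins for real clients (CDN, proxy cache, Redis, database); the point is the miss-fall-through and the backfill of faster layers:

```python
# Sketch of a layered cache lookup. `layers` is ordered fastest-first;
# `loader` stands in for the database query (an assumption for illustration).
def layered_get(key, layers, loader):
    """Check each cache layer in order; on a miss everywhere, call the loader."""
    missed = []
    for layer in layers:
        value = layer.get(key)
        if value is not None:
            # Backfill the faster layers that missed, so they hit next time.
            for m in missed:
                m.set(key, value)
            return value
        missed.append(layer)
    value = loader(key)          # every layer missed: pay the slow path once
    for m in missed:
        m.set(key, value)
    return value
```

Backfilling on the way out is what makes the hierarchy self-warming: a value fetched once through all layers is served from the fastest layer afterward.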
Caching Patterns
Cache-aside (lazy loading)
The most common pattern. Application checks cache first; on miss, loads from DB and populates cache.
```python
def get_user(user_id):
    cached = cache.get(f"user:{user_id}")
    if cached:
        return cached
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    cache.set(f"user:{user_id}", user, ttl=300)
    return user
```
Write-through
Application writes to cache and DB simultaneously.
```python
def update_user(user_id, data):
    db.update(user_id, data)
    cache.set(f"user:{user_id}", data, ttl=300)
```
The cache stays consistent with the DB. The cost: every write also hits the cache, even for data nobody reads.
Write-behind (write-back)
Writes go to cache; cache asynchronously persists to DB.
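A minimal single-process sketch of write-behind, using a background thread and a queue; the dict-backed `cache` and `db` are stand-ins for real stores, and a production version would batch writes and handle failures:

```python
import queue
import threading

# Write-behind sketch: writes land in the cache immediately and are
# persisted to the "database" asynchronously by a background thread.
class WriteBehindCache:
    def __init__(self, cache, db):
        self.cache = cache
        self.db = db
        self.pending = queue.Queue()
        threading.Thread(target=self._flush, daemon=True).start()

    def set(self, key, value):
        self.cache[key] = value          # fast path: cache write only
        self.pending.put((key, value))   # queued for async persistence

    def _flush(self):
        while True:
            key, value = self.pending.get()
            self.db[key] = value         # slow path, off the caller's hot path
            self.pending.task_done()
```

The trade-off is visible in the structure: the caller returns before the DB write happens, so a crash can lose queued writes.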
Read-through
Cache is the primary interface; cache library loads from DB on miss. Application doesn’t talk to DB directly. Less common; more often a feature of specific cache products.
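The shape of read-through can be sketched as a wrapper that owns the loading logic; the `loader` callable standing in for the DB query is an assumption for illustration:

```python
# Read-through sketch: the application asks only the cache, and the cache
# itself loads from the backing store on a miss.
class ReadThroughCache:
    def __init__(self, loader):
        self.loader = loader   # e.g. a function wrapping the DB query
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.loader(key)   # cache talks to the DB, not the app
        return self.store[key]
```

Compare with cache-aside above: the miss-handling code moves out of the application and into the cache layer, which is why this is usually a feature of a cache product rather than hand-rolled.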
TTL and Eviction
When the cache is full, what gets removed? Entries leave a cache in two ways: they expire (TTL) or they are evicted to make room (an eviction policy such as LRU, least recently used). Most production caches combine TTL (catches stale data) with LRU (catches memory pressure).
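The TTL + LRU combination can be sketched as a toy class; a real cache would also handle concurrency and memory accounting:

```python
import time
from collections import OrderedDict

# Toy TTL + LRU cache: TTL catches stale data, LRU eviction catches
# memory pressure (when the cache is full).
class TTLLRUCache:
    def __init__(self, max_size, ttl):
        self.max_size, self.ttl = max_size, ttl
        self.store = OrderedDict()   # key -> (value, expiry timestamp)

    def get(self, key):
        item = self.store.get(key)
        if item is None:
            return None
        value, expires = item
        if time.monotonic() > expires:   # TTL: entry went stale
            del self.store[key]
            return None
        self.store.move_to_end(key)      # LRU: mark as recently used
        return value

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)
        self.store.move_to_end(key)
        if len(self.store) > self.max_size:
            self.store.popitem(last=False)   # LRU: evict least recently used
```

Redis implements the same idea with per-key TTLs plus a configurable `maxmemory-policy` such as `allkeys-lru`.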
The Big Hard Problem: Cache Invalidation
How do you know when a cache entry is stale? Three approaches:
- TTL: accept bounded staleness and let entries expire on a timer.
- Explicit invalidation: the write path deletes or updates the affected cache entries.
- Event-driven invalidation: a stream of change events (CDC, outbox) invalidates cache entries automatically.
[ DB write ] → [ CDC / outbox ] → [ Kafka topic ] → [ Cache invalidator ]
This decouples write paths from cache knowledge. Writers don’t need to know what to invalidate; the cache subscribes to relevant change events.
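A minimal sketch of the invalidator at the end of that pipeline. The event shape (`table` and `id` fields) and the dict-backed cache are assumptions for illustration; a real deployment would consume CDC messages (e.g. from Debezium via Kafka):

```python
# Event-driven invalidation sketch: consume change events and drop the
# affected cache entries. Writers never touch the cache.
def invalidate_from_events(events, cache):
    for event in events:                      # e.g. messages from a Kafka topic
        key = f'{event["table"]}:{event["id"]}'
        cache.pop(key, None)                  # drop the stale entry if present
```

Deleting rather than updating is the safer default: the next read repopulates the entry via the normal cache-aside path.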
Cache Stampede Problem
When a popular cache key expires, all concurrent requests for it miss simultaneously. They all hit the database, and the database gets crushed. Mitigations include per-key locks, request coalescing, and stale-while-revalidate (serve the old value while one request refreshes it).
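Request coalescing with a per-key lock can be sketched for a single process; the in-memory dict stands in for the cache, and a multi-node deployment would need a distributed lock instead:

```python
import threading

# Per-key lock sketch: on a miss, only the first caller recomputes; the
# rest block on the lock, then find the entry refilled on re-check.
_locks = {}
_locks_guard = threading.Lock()

def get_coalesced(cache, key, loader):
    value = cache.get(key)
    if value is not None:
        return value
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        value = cache.get(key)       # re-check: another thread may have filled it
        if value is None:
            value = loader(key)      # exactly one caller hits the database
            cache[key] = value
    return value
```

The re-check inside the lock is the essential step; without it, every waiting thread would still run the loader after acquiring the lock in turn.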
Negative Caching
Cache “not found” results, not just “found” results. Otherwise, every request for a non-existent key (e.g., from probing) hits the DB.
```python
def get_user(user_id):
    cached = cache.get(f"user:{user_id}")
    if cached is None:
        # Cache miss — load from DB
        user = db.query(...)
        if user:
            cache.set(f"user:{user_id}", user, ttl=300)
        else:
            cache.set(f"user:{user_id}", "NOT_FOUND", ttl=60)  # short TTL for negatives
        return user
    if cached == "NOT_FOUND":
        return None
    return cached
```
Especially important for high-traffic public APIs vulnerable to enumeration attacks.
What to Cache (and What Not To)
Cache where the trade-off (staleness for speed) is worth it: data that is read often, changes rarely, and tolerates brief staleness. Skip data that must always be fresh or is rarely read twice; there, the invalidation risk outweighs the speedup.
Multi-Tenant Caching
For multi-tenant SaaS, tenant context must be in the cache key. Otherwise, you’ll serve one tenant’s data to another.
```python
# Wrong — cross-tenant leak
cache.get(f"user:{user_id}")

# Right — tenant-scoped key
cache.get(f"tenant:{tenant_id}:user:{user_id}")
```
For data that is genuinely shared across tenants (configuration, lookup tables), a shared cache is fine. For tenant-specific data, always namespace by tenant ID.
Recap
- A cache hit can be 5–6 orders of magnitude faster than the operation it replaces. The performance lever is real.
- Cache locations: browser, CDN, reverse proxy, distributed cache (Redis), in-process, DB.
- Patterns: cache-aside (most common), write-through, write-behind, read-through.
- Eviction: TTL + LRU is the standard.
- Cache invalidation is the hard problem. TTL, explicit invalidation, or event-driven via CDC.
- Watch for cache stampedes — use locks, stale-while-revalidate, or request coalescing.
- Cache negatives, not just positives — especially for public APIs.
- Don’t cache everything. Cache where the trade-off (staleness for speed) is worth it.
- For multi-tenancy, always namespace cache keys by tenant ID.