Section 6

Resilience & Observability

Redundancy, rate limiting, circuit breakers, bulkheads, the outbox pattern, and SRE practices.

7 lessons · ~1 h reading

Resilience Patterns

6.1

Redundancy

7 min →

The foundational fault-tolerance pattern. Active-active, active-passive, N+1 redundancy, and the levels at which redundancy matters in modern SaaS.

6.2

Rate Limiting

8 min →

How to protect SaaS platforms from abuse and noisy neighbors. Algorithms, where to enforce limits, and the multi-dimensional approach for tenant-aware limits.

6.3

Circuit Breakers

7 min →

The pattern that stops cascading failures. How circuit breakers work, where to put them, and how to combine them with retries and timeouts for production-grade resilience.

6.4

Bulkheads

7 min →

The pattern that limits blast radius. Resource isolation via thread pools, connection pools, and tenant partitioning, keeping one failure from spreading.

6.5

Outbox Pattern

7 min →

The atomic-write-and-publish problem and its canonical solution. How to reliably emit events when state changes, without losing events or causing inconsistencies.

6.6

Caching

9 min →

How caches actually work in production. Where to put them, eviction policies, invalidation strategies, and the patterns that turn caching from a foot-gun into a force multiplier.

Operations & Monitoring

6.7

Observability and SRE

11 min →

Metrics, logs, traces, and the SRE practices that turn raw signals into reliable systems. SLIs, SLOs, error budgets, and on-call you can actually live with.

Start with 6.1 →