Rate Limiting
Rate limiting controls the rate of requests a client (or tenant, IP, or any other principal) can make. It's one of the most universally needed patterns in SaaS: it protects against abuse, noisy neighbors, and accidental traffic spikes, and it enforces billing tiers.
This lesson covers the algorithms, where to enforce, and the multi-dimensional approach for production multi-tenant SaaS.
Why Rate Limit
- Abuse protection: scrapers, brute-force attempts, and misbehaving scripts.
- Noisy neighbors: one tenant's burst shouldn't degrade everyone else's latency.
- Billing tier enforcement: higher plans buy higher limits.
- Accidental spikes: a customer's buggy retry loop shouldn't overload your backend.
The Algorithms
Token Bucket
The most common, most flexible algorithm.
Bucket size: B (max tokens)
Refill rate: R (tokens per second)
On request:
  if bucket has tokens:
    take 1 token, allow request
  else:
    reject request
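The pseudocode above amounts to a lazily refilled counter. A minimal single-process sketch (class and parameter names are illustrative):

```python
import time

class TokenBucket:
    """Token bucket: capacity B, refilled at R tokens per second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity          # start full: permits an initial burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazily refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Starting the bucket full is what makes the algorithm burst-friendly: a quiet client can spend up to B tokens at once, then is throttled to the refill rate R.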
Leaky Bucket
Bucket size: B (queue capacity)
Drain rate: R (requests per second)
On request:
  if bucket has space:
    add request to bucket
  else:
    reject request
Worker drains R requests per second from the bucket.
Smoother output rate: drains at a constant R regardless of input pattern. Good for shaping traffic to a downstream system that needs predictable load.
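Treated as a meter rather than an actual queue, the same logic fits in a few lines (a single-process sketch; the "drain" happens lazily on each check instead of in a separate worker):

```python
import time

class LeakyBucket:
    """Leaky bucket as a meter: capacity B, drained at R requests per second."""

    def __init__(self, capacity: float, drain_rate: float):
        self.capacity = capacity
        self.drain_rate = drain_rate
        self.level = 0.0                # current "water level" in the bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain at the constant rate since the last check.
        self.level = max(0.0, self.level - (now - self.last) * self.drain_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```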
Fixed Window Counter
Count requests per fixed time window (e.g., per minute).
At window boundary: reset counter.
On request:
  if counter < limit:
    increment counter, allow
  else:
    reject
Simple, but it has a boundary problem: a client can fire 2× the limit in a short interval spanning two windows (a burst at the end of one window plus a burst at the start of the next). Usually replaced by sliding-window approaches.
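A minimal sketch (illustrative; this variant anchors windows to the limiter's start time rather than wall-clock minute boundaries):

```python
import time

class FixedWindowCounter:
    """Fixed window: count requests, reset the counter at each boundary."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # Window boundary: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```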
Sliding Window Log
Track timestamps of recent requests. Count those in the last N seconds.
On request at time T:
  remove entries older than (T - window) from log
  if log size < limit:
    append T, allow
  else:
    reject
Most accurate; expensive memory-wise (one entry per request).
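A deque of timestamps makes the eviction step cheap (illustrative single-process sketch; memory is one entry per allowed request, which is the cost noted above):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Sliding window log: one timestamp per allowed request."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have fallen out of the window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```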
Sliding Window Counter
Hybrid: keeps fixed-window counts but interpolates across the boundary.
current_count = (count_in_previous_window × overlap_fraction)
                + count_in_current_window
if current_count < limit: allow
The pragmatic choice. Approximate but uses constant memory; avoids boundary issues.
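The interpolation above might look like this in code (a single-process sketch; names are illustrative, and the roll-forward logic zeroes the previous count when more than a full window has passed):

```python
import time

class SlidingWindowCounter:
    """Two counters per key: interpolate the previous window across the boundary."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.current_start = time.monotonic()
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.current_start
        if elapsed >= self.window:
            # Roll forward; after a full idle window the previous count is 0.
            self.previous_count = self.current_count if elapsed < 2 * self.window else 0
            self.current_start += self.window * (elapsed // self.window)
            self.current_count = 0
            elapsed = now - self.current_start
        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - elapsed / self.window
        estimated = self.previous_count * overlap + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False
```

Only two integers and a timestamp per key: that constant memory footprint is what makes this the pragmatic default.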
Comparison
| Algorithm | Burst | Smooth output | Memory | Common use |
|---|---|---|---|---|
| Token bucket | ✓ | ✗ | O(1) | API rate limits, AWS APIs |
| Leaky bucket | ✗ | ✓ | O(B) | Traffic shaping to downstream |
| Fixed window | ✓ (at boundaries) | ✗ | O(1) | Simple, accept the boundary issue |
| Sliding log | ✗ | ✗ | O(N) | Precise limits where memory is cheap |
| Sliding counter | ✗ | ✗ | O(1) | Production default for most APIs |
Where to Enforce
Enforce at multiple layers; each layer catches what the one before it missed:
- Edge/CDN: coarse per-IP limits that stop floods before they reach your infrastructure.
- API gateway: per-API-key and per-tenant limits.
- Service mesh: limits on service-to-service traffic between internal callers.
- Application: fine-grained, business-aware limits (per endpoint, per plan).
Distributed Rate Limiting
For multi-instance services, rate limiters must share state.
For most SaaS, Redis-based central rate limiting is the right answer. The 1-2ms cost is acceptable; the simplicity is real.
A canonical Redis pattern (sliding window counter):
# Lua script for atomicity
script = """
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
local count = redis.call('ZCARD', key)
if count < limit then
  -- The member must be unique per request: using the timestamp alone
  -- would collapse concurrent same-timestamp requests into one entry.
  redis.call('ZADD', key, now, now .. '-' .. ARGV[4])
  redis.call('EXPIRE', key, window)
  return 1
end
return 0
"""
Multi-Dimensional Rate Limiting in SaaS
For a real multi-tenant SaaS, single-dimension limits aren't enough. You need multi-dimensional rate limiting:
- Per tenant: the primary fairness and isolation boundary.
- Per user within a tenant: one user can't exhaust the whole tenant's quota.
- Per endpoint: expensive endpoints get tighter limits than cheap ones.
- Per pricing tier: limits become a product feature.
- Per API key: scoped limits for individual integrations.
- Per IP: the fallback for unauthenticated traffic.
All dimensions are enforced simultaneously. A request must pass every one of them.
Response Patterns
When rate-limited, return:
- HTTP 429 Too Many Requests.
- A Retry-After header saying when to retry.
- Rate limit headers (commonly X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) on every response, not just rejections.
- A helpful error body naming which limit was hit.
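A framework-agnostic sketch of assembling such a response (the X-RateLimit-* header names follow a widespread convention rather than a formal standard):

```python
import json
import time

def rate_limited_response(limit: int, window_s: int, retry_after_s: int):
    """Builds a 429 response as a (status, headers, body) triple."""
    headers = {
        "Retry-After": str(retry_after_s),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(int(time.time()) + retry_after_s),
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "error": "rate_limited",
        "message": f"Limit of {limit} requests per {window_s}s exceeded.",
        "retry_after": retry_after_s,
    })
    return 429, headers, body
```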
Client Behavior
Even with great rate limits, clients can DDoS you with thundering retries. Best-practice client behavior:
- Exponential backoff: wait 2^n seconds between retries.
- Jitter: randomize backoff timing to avoid synchronized retries.
- Honor Retry-After: if the server tells you when to retry, listen.
- Circuit breaker (next lesson): stop hammering when the server is failing.
Recap
- Rate limiting protects against abuse, noisy neighbors, and backend overload.
- Algorithms: token bucket (burst-friendly), leaky bucket (smoothing), sliding-window counter (production default).
- Enforce at multiple layers: edge, gateway, mesh, application.
- Distributed rate limiting via Redis is the standard pattern.
- For multi-tenant SaaS: multi-dimensional limits (tenant + user + endpoint + tier + API key + IP).
- Use HTTP 429, Retry-After, rate limit headers, helpful error bodies.
- Combined with client-side backoff + jitter, rate limiting is one of the most effective patterns in SaaS.