◐ system-design/caching.md
9. Caching: Strategies, Eviction, Redis vs Memcached
Caching is the highest-leverage optimization in system design. Memory is ~100,000× faster than disk seek; a cache that absorbs even 80% of reads is the difference between a melted database and a cool one.
~6 min read·updated 5/29/2026
9. Caching: Strategies, Eviction, Redis vs Memcached
Caching is the highest-leverage optimization in system design. Memory is ~100,000× faster than disk seek; a cache that absorbs even 80% of reads is the difference between a melted database and a cool one.
9.1 Where caches live
A request travels through many layers. Each can cache.
Client → CDN → Reverse proxy → App → DB
↑ ↑ ↑ ↑ ↑
L0 L1 L2 L3 L4
| Layer | Examples | What it caches |
|---|---|---|
| L0 Browser | HTTP cache, IndexedDB, service workers | static assets, API responses |
| L1 CDN / Edge | CloudFront, Cloudflare, Fastly | static assets, edge-cacheable HTML/JSON |
| L2 Reverse proxy | Nginx, Varnish | popular endpoints |
| L3 App | in-process (LRU map), distributed (Redis) | computed results, sessions |
| L4 DB | buffer pool, query cache | hot pages |
The further out (closer to user), the bigger the latency win. The closer to data, the easier the consistency.
9.2 Cache strategies (read patterns)
Cache-aside (lazy loading)
App is responsible.
read(key):
v = cache.get(key)
if v is None:
v = db.get(key)
cache.set(key, v, ttl=300)
return v
Pros: simple, only cache what's read, cache failure non-fatal. Cons: cache miss = extra round trip; cold-start storms ("thundering herd").
Read-through
Cache itself loads on miss. App always asks the cache. Implementation lives in the cache library.
Write-through
Every write goes to cache and DB synchronously. Cache stays consistent. Slow writes.
Write-behind / write-back
Write to cache, async flush to DB. Fast writes, but data loss on crash before flush. Used for high-write workloads where some loss is acceptable.
Write-around
Write only to DB; never populate cache on write. Reads will miss first, then populate. Good when written data is rarely read soon.
Refresh-ahead
Cache proactively refreshes entries before TTL expires for hot keys. Avoids miss latency at the cost of work.
9.3 Cache invalidation
Phil Karlton: "There are only two hard things in computer science: cache invalidation and naming things."
Approaches:
TTL (time to live)
Simplest. Tolerate staleness up to T. Best when reads dominate and consistency is loose. Pick T based on user tolerance: 60s for "real-time" UIs; hours/days for static.
Explicit invalidation (on write)
On update, delete the cache key. Subsequent read repopulates. Simple to reason about; race conditions exist (write-then-invalidate vs read-then-set order).
Write-through
Update cache and DB together. Strong consistency, slowest path.
Tag-based / dependency-based
Each cache entry has tags; invalidating a tag clears all related entries. Used by Fastly for HTML pages.
Event-driven (CDC)
Change-data-capture from DB → invalidate cache. Decouples app code from invalidation logic. Used in modern multi-service architectures.
Versioning
Include a version number in the cache key. Bump version → all old entries become unreachable, no explicit deletion needed. Old entries age out naturally.
9.4 Eviction policies
When cache is full, what gets thrown out?
- LRU (Least Recently Used): default for most caches. Good general choice.
- LFU (Least Frequently Used): better for skewed traffic where some keys are perpetually hot.
- FIFO: simple, ignores access patterns. Not great.
- Random: surprisingly OK in practice, used by Redis for expiry sampling.
- TinyLFU / W-TinyLFU: modern policy that approximates LFU with a small frequency sketch + LRU window. Used in Caffeine (Java cache). Best general performance.
Redis has multiple modes: noeviction (refuse writes when full), allkeys-lru, volatile-lru (only keys with TTL), allkeys-lfu, volatile-ttl (evict closest to expiry).
9.5 Distributed cache: Redis vs Memcached
| Redis | Memcached | |
|---|---|---|
| Data types | strings, lists, sets, hashes, sorted sets, streams, bitmaps, HLL, geo | strings only |
| Persistence | optional (RDB, AOF) | none |
| Replication | yes (primary/replica, cluster) | no built-in |
| Threading | mostly single-threaded; multi-threaded I/O optional | multi-threaded |
| Memory efficiency | good, slightly more overhead | excellent for pure KV |
| Atomic ops | rich (INCR, transactions, Lua, sorted-set ops) | basic CAS, INCR |
| Pub/sub | yes | no |
| Cluster mode | hash slots, automatic sharding | client-side sharding (consistent hashing) |
| When to pick | rich data needs, durability, complex ops | pure key-value cache at extreme throughput |
Default to Redis unless you specifically need memcached's threading throughput on simple GET/SET. Most teams pick Redis and never look back.
9.6 Cache topology
Client-side / in-process
A LinkedHashMap-with-eviction in your app. Fastest (no network). Limited per-instance, no sharing. Good for: small reference data (feature flags, config).
Distributed cache (shared)
Redis/Memcached cluster. All app instances share one logical cache. Network hop, but consistent across the fleet. The default for caching in microservice architectures.
Multi-tier (L1 + L2)
In-process L1 (e.g., 10K most-popular keys) + distributed L2 (everything else). Reduces network calls for hot keys. Used by Facebook (mcrouter), Twitter, etc.
Cache-as-a-fan-out (Materialized cache)
Cache stores precomputed query results, not raw rows. Twitter's home timeline cache works like this — it stores computed timelines per user.
9.7 Hard problems
Thundering herd
Cache key expires; 1000 concurrent requests miss simultaneously; all hit DB. Fixes:
- Single-flight: only one request fetches; others wait.
- Stale-while-revalidate: serve stale value while one request refreshes.
- Probabilistic early expiration: a random fraction of requests refresh slightly before TTL, smoothing the spike.
- Locking: distributed lock on the key (Redis SETNX with TTL). Care with lock TTL ≥ refresh time.
Hot key
One key sees 100K QPS while others see 100. The single-shard cache is overwhelmed. Fixes:
- Replicate the hot key to multiple cache nodes; randomize fetch.
- Local cache the hot key on the app side.
- CDN for static content.
Cache stampede on cold start
Cache empty (deploy, restart, evict). DB drowns. Fix: warm the cache before serving traffic. Replay last hour's queries; precompute critical entries.
Cache penetration
Bad clients query nonexistent keys. Each miss hits DB. Fix: cache negative results (with shorter TTL) or use a Bloom filter to short-circuit known-missing keys.
Cache inconsistency
Cache and DB diverge. Sources:
- Race condition: read-through misses, fetches from DB, but a concurrent write happened — cache now has stale.
- Network failures: invalidation message lost.
Mitigations:
- Short TTLs as a backstop.
- Versioned keys.
- Avoid read-modify-write patterns; use atomic ops.
9.8 Cache key design
- Prefix everything by service/feature:
users:profile:123not123. Easier to inspect, shard, evict, debug. - Include version for safe deploys:
v2:users:profile:123. - Stable serialization: same query produces the same key. Sort query params; canonicalize.
- Hash long keys: SHA-1 or xxhash for keys > 256 chars.
9.9 Sizing
Capacity in cache should accommodate:
- Working set (items accessed in time window T).
- Plus headroom (say 20%) so eviction is light.
If your DB has 10 TB of data and your hot set is the last 24h (~100 GB), 100 GB cache covers ~99% of reads. The remaining cold reads always hit DB but are infrequent.
Hit ratio rule of thumb:
- 90%+ for read-heavy product surfaces.
- 99%+ for true edge-cacheable content (CDN).
If your hit rate is 70%, your cache is misconfigured (too small, wrong eviction, wrong key shape).
9.10 CDN caching (preview; chapter 24)
CDNs are the same patterns at the edge. Honor Cache-Control, ETag, Vary headers; cache by URL + selected headers; purge via API. The cardinal rule: don't put PII in CDN-cached responses.
9.11 Anti-patterns
- Caching everything by default. Cache only what's hot and expensive to compute.
- No TTL. Memory leak in slow motion.
- Cache as the source of truth. The DB is. Cache must be reconstructable.
- Synchronous cache writes on every request. Use async/lazy.
- Caching across security boundaries (caching PII at CDN, caching one user's data for another). Vary by user when data is per-user.
Key takeaways
- Caching is the biggest perf lever. Pick the layer closest to the user where it's safe.
- Cache-aside is the default; layer write-through or refresh-ahead when needed.
- Invalidation is hard — TTL + explicit + versioning are the working tools.
- Redis = default; Memcached when pure KV throughput dominates.
- Watch for thundering herd, hot keys, stampedes, penetration, inconsistency.
// 0 views