9. Caching: Strategies, Eviction, Redis vs Memcached

Caching is the highest-leverage optimization in system design. Memory is ~100,000× faster than disk seek; a cache that absorbs even 80% of reads is the difference between a melted database and a cool one.

9.1 Where caches live

A request travels through many layers. Each can cache.

Client → CDN → Reverse proxy → App → DB
  ↑       ↑          ↑          ↑    ↑
  L0     L1         L2         L3   L4

Layer	Examples	What it caches
L0 Browser	HTTP cache, IndexedDB, service workers	static assets, API responses
L1 CDN / Edge	CloudFront, Cloudflare, Fastly	static assets, edge-cacheable HTML/JSON
L2 Reverse proxy	Nginx, Varnish	popular endpoints
L3 App	in-process (LRU map), distributed (Redis)	computed results, sessions
L4 DB	buffer pool, query cache	hot pages

The further out (closer to user), the bigger the latency win. The closer to data, the easier the consistency.

9.2 Cache strategies (read patterns)

Cache-aside (lazy loading)

App is responsible.

read(key):
  v = cache.get(key)
  if v is None:
    v = db.get(key)
    cache.set(key, v, ttl=300)
  return v

Pros: simple, only cache what's read, cache failure non-fatal. Cons: cache miss = extra round trip; cold-start storms ("thundering herd").

Read-through

Cache itself loads on miss. App always asks the cache. Implementation lives in the cache library.

Write-through

Every write goes to cache and DB synchronously. Cache stays consistent. Slow writes.

Write-behind / write-back

Write to cache, async flush to DB. Fast writes, but data loss on crash before flush. Used for high-write workloads where some loss is acceptable.

Write-around

Write only to DB; never populate cache on write. Reads will miss first, then populate. Good when written data is rarely read soon.

Refresh-ahead

Cache proactively refreshes entries before TTL expires for hot keys. Avoids miss latency at the cost of work.

9.3 Cache invalidation

Phil Karlton: "There are only two hard things in computer science: cache invalidation and naming things."

Approaches:

TTL (time to live)

Simplest. Tolerate staleness up to T. Best when reads dominate and consistency is loose. Pick T based on user tolerance: 60s for "real-time" UIs; hours/days for static.

Explicit invalidation (on write)

On update, delete the cache key. Subsequent read repopulates. Simple to reason about; race conditions exist (write-then-invalidate vs read-then-set order).

Write-through

Update cache and DB together. Strong consistency, slowest path.

Tag-based / dependency-based

Each cache entry has tags; invalidating a tag clears all related entries. Used by Fastly for HTML pages.

Event-driven (CDC)

Change-data-capture from DB → invalidate cache. Decouples app code from invalidation logic. Used in modern multi-service architectures.

Versioning

Include a version number in the cache key. Bump version → all old entries become unreachable, no explicit deletion needed. Old entries age out naturally.

9.4 Eviction policies

When cache is full, what gets thrown out?

LRU (Least Recently Used): default for most caches. Good general choice.
LFU (Least Frequently Used): better for skewed traffic where some keys are perpetually hot.
FIFO: simple, ignores access patterns. Not great.
Random: surprisingly OK in practice, used by Redis for expiry sampling.
TinyLFU / W-TinyLFU: modern policy that approximates LFU with a small frequency sketch + LRU window. Used in Caffeine (Java cache). Best general performance.

Redis has multiple modes: noeviction (refuse writes when full), allkeys-lru, volatile-lru (only keys with TTL), allkeys-lfu, volatile-ttl (evict closest to expiry).

9.5 Distributed cache: Redis vs Memcached

	Redis	Memcached
Data types	strings, lists, sets, hashes, sorted sets, streams, bitmaps, HLL, geo	strings only
Persistence	optional (RDB, AOF)	none
Replication	yes (primary/replica, cluster)	no built-in
Threading	mostly single-threaded; multi-threaded I/O optional	multi-threaded
Memory efficiency	good, slightly more overhead	excellent for pure KV
Atomic ops	rich (INCR, transactions, Lua, sorted-set ops)	basic CAS, INCR
Pub/sub	yes	no
Cluster mode	hash slots, automatic sharding	client-side sharding (consistent hashing)
When to pick	rich data needs, durability, complex ops	pure key-value cache at extreme throughput

Default to Redis unless you specifically need memcached's threading throughput on simple GET/SET. Most teams pick Redis and never look back.

9.6 Cache topology

Client-side / in-process

A LinkedHashMap-with-eviction in your app. Fastest (no network). Limited per-instance, no sharing. Good for: small reference data (feature flags, config).

Distributed cache (shared)

Redis/Memcached cluster. All app instances share one logical cache. Network hop, but consistent across the fleet. The default for caching in microservice architectures.

Multi-tier (L1 + L2)

In-process L1 (e.g., 10K most-popular keys) + distributed L2 (everything else). Reduces network calls for hot keys. Used by Facebook (mcrouter), Twitter, etc.

Cache-as-a-fan-out (Materialized cache)

Cache stores precomputed query results, not raw rows. Twitter's home timeline cache works like this — it stores computed timelines per user.

9.7 Hard problems

Thundering herd

Cache key expires; 1000 concurrent requests miss simultaneously; all hit DB. Fixes:

Single-flight: only one request fetches; others wait.
Stale-while-revalidate: serve stale value while one request refreshes.
Probabilistic early expiration: a random fraction of requests refresh slightly before TTL, smoothing the spike.
Locking: distributed lock on the key (Redis SETNX with TTL). Care with lock TTL ≥ refresh time.

Hot key

One key sees 100K QPS while others see 100. The single-shard cache is overwhelmed. Fixes:

Replicate the hot key to multiple cache nodes; randomize fetch.
Local cache the hot key on the app side.
CDN for static content.

Cache stampede on cold start

Cache empty (deploy, restart, evict). DB drowns. Fix: warm the cache before serving traffic. Replay last hour's queries; precompute critical entries.

Cache penetration

Bad clients query nonexistent keys. Each miss hits DB. Fix: cache negative results (with shorter TTL) or use a Bloom filter to short-circuit known-missing keys.

Cache inconsistency

Cache and DB diverge. Sources:

Race condition: read-through misses, fetches from DB, but a concurrent write happened — cache now has stale.
Network failures: invalidation message lost.

Mitigations:

Short TTLs as a backstop.
Versioned keys.
Avoid read-modify-write patterns; use atomic ops.

9.8 Cache key design

Prefix everything by service/feature: users:profile:123 not 123. Easier to inspect, shard, evict, debug.
Include version for safe deploys: v2:users:profile:123.
Stable serialization: same query produces the same key. Sort query params; canonicalize.
Hash long keys: SHA-1 or xxhash for keys > 256 chars.

9.9 Sizing

Capacity in cache should accommodate:

Working set (items accessed in time window T).
Plus headroom (say 20%) so eviction is light.

If your DB has 10 TB of data and your hot set is the last 24h (~100 GB), 100 GB cache covers ~99% of reads. The remaining cold reads always hit DB but are infrequent.

Hit ratio rule of thumb:

90%+ for read-heavy product surfaces.
99%+ for true edge-cacheable content (CDN).

If your hit rate is 70%, your cache is misconfigured (too small, wrong eviction, wrong key shape).

9.10 CDN caching (preview; chapter 24)

CDNs are the same patterns at the edge. Honor Cache-Control, ETag, Vary headers; cache by URL + selected headers; purge via API. The cardinal rule: don't put PII in CDN-cached responses.

9.11 Anti-patterns

Caching everything by default. Cache only what's hot and expensive to compute.
No TTL. Memory leak in slow motion.
Cache as the source of truth. The DB is. Cache must be reconstructable.
Synchronous cache writes on every request. Use async/lazy.
Caching across security boundaries (caching PII at CDN, caching one user's data for another). Vary by user when data is per-user.

Key takeaways

Caching is the biggest perf lever. Pick the layer closest to the user where it's safe.
Cache-aside is the default; layer write-through or refresh-ahead when needed.
Invalidation is hard — TTL + explicit + versioning are the working tools.
Redis = default; Memcached when pure KV throughput dominates.
Watch for thundering herd, hot keys, stampedes, penetration, inconsistency.