Caching Patterns and Invalidation – System Design Core Concept

What:

High-speed in-memory caches (Redis, Memcached) deployed to intercept slow database read patterns before they hit durable disk storage.

Primary purpose:

Minimizing read query latencies, boosting system throughput, and offloading transactional load from databases.

Usually used for:

User session storage, precomputed feeds, heavy analytics, and API gateway rate limiters.

How should I think about this inside system architectures?

🔋 The Transient Cache Rule

Caches are NOT the source of truth. They must remain ephemeral and safely rebuildable from the database at any moment.

🎯 Lazy Cache-Aside

Check the cache first. On miss, query the database, populate the cache, and return. Delete keys on write to prevent stale states.
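
The read and write paths just described can be sketched as follows. This is a minimal illustration, not a production client: `TTLCache`, `database`, `read_user`, and `write_user` are hypothetical names, and a plain dict stands in for both Redis and the database.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for Redis/Memcached (hypothetical helper)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily drop expired entries on read
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)


database = {"user:1": {"name": "Ada"}}  # stand-in for the durable store
cache = TTLCache()
db_reads = 0  # counts how often a read falls through to the database


def read_user(key):
    """Cache-aside read: cache first, database on miss, then populate."""
    global db_reads
    value = cache.get(key)
    if value is not None:
        return value
    db_reads += 1
    value = database.get(key)
    if value is not None:
        cache.set(key, value, ttl_seconds=60)
    return value


def write_user(key, value):
    """Cache-aside write: update the database, then delete the cached copy."""
    database[key] = value
    cache.delete(key)


first = read_user("user:1")   # miss: falls through to the database
second = read_user("user:1")  # hit: served from the cache
write_user("user:1", {"name": "Grace"})
third = read_user("user:1")   # miss again, because the write deleted the key
```

Deleting the key on write, rather than overwriting it, keeps the write path simple and lets the next reader repopulate the entry from the database.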

🔀 Decoupled Write-Back

Acknowledge writes instantly in-memory, queueing changes asynchronously to flush to PostgreSQL in background micro-batches.
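
A rough sketch of this write path, under stated assumptions: `WriteBehindCache` is a hypothetical class, a dict stands in for PostgreSQL, and `flush_once` is called by hand here, whereas a real system would run it from a background worker or timer.

```python
from collections import deque


class WriteBehindCache:
    def __init__(self, store, batch_size=3):
        self.store = store        # stand-in for the durable database
        self.memory = {}          # in-memory view that serves reads
        self.pending = deque()    # queued (key, value) writes awaiting flush
        self.batch_size = batch_size
        self.flush_count = 0

    def write(self, key, value):
        """Acknowledge after the in-memory update; persistence happens later."""
        self.memory[key] = value
        self.pending.append((key, value))

    def read(self, key):
        if key in self.memory:
            return self.memory[key]
        return self.store.get(key)

    def flush_once(self):
        """Drain queued writes into the store as one batch of up to batch_size keys."""
        batch = {}
        while self.pending and len(batch) < self.batch_size:
            key, value = self.pending.popleft()
            batch[key] = value    # a later write to the same key overwrites the earlier one
        if batch:
            self.store.update(batch)
            self.flush_count += 1
        return len(batch)


durable_store = {}
wb = WriteBehindCache(durable_store, batch_size=3)
for i in range(5):
    wb.write(f"counter:{i}", i)

visible_before_flush = wb.read("counter:4")   # already readable from memory
persisted_before_flush = len(durable_store)   # nothing is durable yet
wb.flush_once()   # persists three keys
wb.flush_once()   # persists the remaining two
```

The gap between `visible_before_flush` and `persisted_before_flush` is the data-loss window: anything still in `pending` disappears if the cache node dies before the next flush.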

Lazy vs Aggressive Loading: Cache-Aside loads on demand; Write-Through populates the cache on every write, avoiding later misses at the cost of write latency.

Caching Pattern Matrix: Evaluating read and write paths:

Pattern	Pros	Cons
Cache-Aside (Lazy)	Cache contains only active hot keys; simple to code; a cache node crash degrades performance but is not fatal	Cache miss penalty on first read; data can go stale if keys are not deleted on write
Write-Through	Cache is updated on every write, so reads rarely see stale data; suits read-heavy key sets	Write latency increases (each write goes synchronously to both cache and database)
Write-Behind (Back)	Ultra-low write latency; batched DB writes reduce transactional disk I/O	Risk of data loss if the in-memory cache nodes crash before pending writes are synced to disk
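
For contrast with the lazy pattern, here is a minimal write-through sketch. `WriteThroughCache` and `FlakyStore` are hypothetical helpers, and writing the database before the cache is one reasonable ordering rather than the only one: it means a failed database write leaves the cache untouched.

```python
class FlakyStore(dict):
    """Hypothetical database stand-in that rejects writes while `fail` is True."""

    fail = False

    def __setitem__(self, key, value):
        if self.fail:
            raise ConnectionError("database unavailable")
        super().__setitem__(key, value)


class WriteThroughCache:
    def __init__(self, store):
        self.store = store
        self.cache = {}

    def write(self, key, value):
        # Database first: if this raises, the cache is never updated.
        self.store[key] = value
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = FlakyStore()
wt = WriteThroughCache(store)
wt.write("price:42", 100)

store.fail = True
write_failed = False
try:
    wt.write("price:42", 120)   # the database rejects this write
except ConnectionError:
    write_failed = True

price_after_failure = wt.read("price:42")  # cache still agrees with the database
```

The synchronous double write is where the extra write latency in the table comes from.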

Eviction Policies: When memory hits its configured limit, the cache must choose which keys to drop to avoid out-of-memory failures:

Policy	Evicts	Primary Use Case
LRU (Least Recently)	Evicts items that have not been accessed for the longest time	Standard general-purpose caching (Redis `allkeys-lru` policy; note that the Redis default is `noeviction`)
LFU (Least Frequently)	Evicts items with the lowest access frequency counter	Workloads with stable popularity, such as frequently requested static assets
TTL (Time To Live)	Expires keys once their configured lifetime elapses, regardless of memory pressure	Ephemeral data: shopping carts, session tokens, or OTP entries
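
To make the LRU row concrete, here is a small exact-LRU sketch built on `collections.OrderedDict`. `LRUCache` is a hypothetical class for illustration; Redis itself approximates LRU by sampling keys rather than tracking an exact order.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest access first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used key


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")        # "a" is now more recent than "b"
lru.put("c", 3)     # over capacity: "b" is evicted
remaining_keys = list(lru.items)
```

After these calls, `remaining_keys` is `["a", "c"]`: reading `"a"` protected it, so `"b"` was the one dropped.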

Invalidation Strategies: How stale entries are removed or refreshed:

Invalidation Strategy	Action Path	Best Case Use
TTL-Based	Attach absolute expiration time window to cache keys	General session data
Event-Driven	Listen to CDC database change events → publish invalidate command to Redis	Near-real-time freshness (staleness bounded by event lag, not strong consistency)
Key Versioning	Append static version strings to route keys: `user:100:v2`	Safe zero-lock rollouts
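
The key-versioning row can be illustrated with a short sketch. The helper `versioned_key` and the cached values are hypothetical; the point is only that two releases with different cached shapes never read each other's entries.

```python
def versioned_key(entity, entity_id, version):
    """Build a cache key such as 'user:100:v2'."""
    return f"{entity}:{entity_id}:{version}"


cache = {}  # stand-in for a shared distributed cache

# The old release (v1) caches the user as a plain string.
cache[versioned_key("user", 100, "v1")] = "Ada Lovelace"

# The new release (v2) changes the cached shape and writes under its own key.
cache[versioned_key("user", 100, "v2")] = {"first": "Ada", "last": "Lovelace"}

old_view = cache.get(versioned_key("user", 100, "v1"))
new_view = cache.get(versioned_key("user", 100, "v2"))
```

Once the old release is fully retired, its `v1` keys are simply left to expire or be evicted; no coordinated flush or lock is needed.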

The XFetch Probabilistic Cache Expiration Algorithm

To mitigate cache stampedes (many readers recomputing the same key at the moment it expires) without blocking thread pools, high-scale systems deploy **XFetch**. Instead of waiting for a key to expire to trigger recomputation, readers evaluate a probabilistic test during normal gets:

PYTHON

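# Probabilistic early recompute check (XFetch)
import math
import random

def should_recompute(remaining_ttl, compute_time, beta=1.0):
    # beta > 1.0 makes early recomputation more aggressive.
    # 1.0 - random.random() lies in (0, 1], so math.log() never receives 0.
    # The log is <= 0, so the second term shrinks the remaining TTL by a random amount.
    return remaining_ttl + compute_time * beta * math.log(1.0 - random.random()) <= 0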

If the test evaluates to `True`, the reader returns the current cached value immediately, but asynchronously kicks off a background thread to refresh the cache key from the database.
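
To get a feel for how often the test fires, the sketch below estimates the refresh rate at different remaining TTLs. It restates the check in a self-contained form; `xfetch_should_refresh` and `refresh_rate` are hypothetical names, and the seed and trial count are arbitrary. Because the negated log of a uniform draw is exponentially distributed, the expected rate is roughly `exp(-remaining_ttl / (beta * compute_time))`.

```python
import math
import random


def xfetch_should_refresh(remaining_ttl, compute_time, beta=1.0, rng=random):
    # 1.0 - rng.random() lies in (0, 1], so log() never sees 0.
    # jitter is a non-negative random head start, scaled by compute_time * beta.
    jitter = -compute_time * beta * math.log(1.0 - rng.random())
    return remaining_ttl <= jitter


def refresh_rate(remaining_ttl, compute_time, trials=20_000, seed=7):
    """Fraction of simulated reads that would trigger an early refresh."""
    rng = random.Random(seed)
    hits = sum(
        xfetch_should_refresh(remaining_ttl, compute_time, rng=rng)
        for _ in range(trials)
    )
    return hits / trials


far = refresh_rate(remaining_ttl=10.0, compute_time=0.5)     # far from expiry
near = refresh_rate(remaining_ttl=0.5, compute_time=0.5)     # one compute_time left
expired = refresh_rate(remaining_ttl=0.0, compute_time=0.5)  # already at expiry
```

Far from expiry almost no reader refreshes, close to expiry a growing fraction does, and at expiry every reader does, which spreads recomputation out instead of concentrating it at a single instant.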

Cache Consistency Race Conditions

Under high concurrency, the Cache-Aside pattern (write: update DB, delete cache) has a rare but catastrophic race condition:

Cache is empty. Client A queries key $K$; misses; queries DB and gets value $V1$.
Client B updates key $K$ in DB to $V2$; deletes cache key (currently empty anyway).
Client A (delayed by CPU scheduling) writes stale value $V1$ back into the cache.
The cache now stores stale value $V1$ until the key's TTL expires or the next write deletes it; with no TTL, it can stay stale indefinitely.

Mitigation: Set short TTLs as a safety net so stale entries age out, or serialize cache recomputation per key with a lock (e.g., MySQL locks or Redis locks) so a write cannot slip in between a reader's database read and its cache write.
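
The interleaving and the locking mitigation can be replayed in a small sketch. Everything here is illustrative: dicts stand in for the database and the cache, a `threading.Lock` stands in for a per-key distributed lock, and the two `Event` objects exist only to force Client A to stall at the worst possible moment.

```python
import threading

# 1) The unsafe interleaving from the steps above, replayed sequentially.
unsafe_db, unsafe_cache = {"K": "V1"}, {}
a_value = unsafe_db["K"]        # Client A misses the cache and reads V1
unsafe_db["K"] = "V2"           # Client B updates the database...
unsafe_cache.pop("K", None)     # ...and deletes the (already empty) cache key
unsafe_cache["K"] = a_value     # Client A, delayed, writes stale V1

# 2) The same interleaving attempted with a per-key lock.
safe_db, safe_cache = {"K": "V1"}, {}
key_lock = threading.Lock()     # in-process stand-in for a distributed lock
a_has_read = threading.Event()
b_is_waiting = threading.Event()


def client_a():
    with key_lock:
        value = safe_db["K"]            # A reads V1 after a cache miss
        a_has_read.set()
        b_is_waiting.wait(timeout=1)    # A stalls here while B tries to write
        safe_cache["K"] = value         # A populates the cache, still holding the lock


def client_b():
    a_has_read.wait(timeout=1)
    b_is_waiting.set()
    with key_lock:                      # blocks until A releases the lock
        safe_db["K"] = "V2"
        safe_cache.pop("K", None)       # the delete now lands after A's write


threads = [threading.Thread(target=client_a), threading.Thread(target=client_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock the cache ends up holding `V1` while the database holds `V2`. With the lock, B's delete is forced to run after A's populate, so the cache ends up empty and the next reader loads `V2`. The cost is that readers on a miss and writers for the same key now wait on each other.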

Multi-Layer Cache Stack

Caching is not only "Redis in front of Postgres." Production systems stack layers — each with different TTL, invalidation cost, and hit ratio:

Client / browser cache: HTTP Cache-Control headers for static assets and idempotent GET responses — zero server load on repeat visits.
CDN edge: Geo-distributed cache for images, video segments, and public API GETs with short TTL — see CDN & Edge Delivery.
In-process (local) cache: Caffeine/Guava map inside each app pod — microsecond reads for config and hot keys; invalidated via pub/sub on change.
Distributed cache (Redis/Memcached): Shared across pods; survives pod restarts; handles hot keys with replication or key splitting.
Database buffer pool: OS/page cache and InnoDB buffer — last line before disk; not a substitute for application-level design.

Interview signal: walk the read path top-down — "CDN miss → Redis miss → read replica → primary" — and state where each layer adds value vs complexity.
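
A top-down read path can be sketched as a loop over layers. This is a simplified illustration with only two cache layers: `layered_read` is a hypothetical helper, and dicts stand in for the in-process cache, the distributed cache, and the database.

```python
def layered_read(key, layers, load_from_db):
    """Walk cache layers in order; on a hit, backfill the faster layers above it."""
    for depth, (name, layer) in enumerate(layers):
        if key in layer:
            value = layer[key]
            for _, upper in layers[:depth]:
                upper[key] = value      # warm the layers that missed
            return value, name
    value = load_from_db(key)           # every layer missed
    for _, layer in layers:
        layer[key] = value
    return value, "database"


local_cache, distributed_cache = {}, {}
layers = [("local", local_cache), ("distributed", distributed_cache)]
db = {"config:theme": "dark"}

first = layered_read("config:theme", layers, db.__getitem__)
local_cache.clear()  # e.g. the pod restarted and lost its in-process cache
second = layered_read("config:theme", layers, db.__getitem__)
third = layered_read("config:theme", layers, db.__getitem__)
```

The three reads are served by the database, the distributed layer, and the local layer in turn, which mirrors how a shared cache absorbs the cost of a pod restart.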

Hot Key Mitigation

A viral post or celebrity profile concentrates reads on one cache key. A single Redis node serving that key becomes a bottleneck. Mitigations: replicate the hot key across Redis nodes with random read selection, local in-process caching on app servers for ultra-hot keys, or pre-warm CDN/edge caches before traffic spikes.
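
The replication approach might look like the sketch below. It is an assumption-laden illustration: `REPLICA_COUNT`, the `#r<i>` suffix scheme, and the helper names are hypothetical, and a single dict stands in for a cluster in which the suffixed keys would hash to different nodes.

```python
import random

REPLICA_COUNT = 4  # hypothetical; tune to the number of cache nodes


def replica_keys(key, count=REPLICA_COUNT):
    """Derive one suffixed copy of the key per replica."""
    return [f"{key}#r{i}" for i in range(count)]


def write_hot_key(cache, key, value):
    for replica in replica_keys(key):   # fan the write out to every copy
        cache[replica] = value


def read_hot_key(cache, key, rng=random):
    return cache.get(rng.choice(replica_keys(key)))  # random read selection


cache = {}
write_hot_key(cache, "post:viral", {"likes": 1_000_000})

# Simulate 4000 reads and count how many land on each replica key.
rng = random.Random(42)
reads_per_replica = {}
for _ in range(4000):
    replica = rng.choice(replica_keys("post:viral"))
    reads_per_replica[replica] = reads_per_replica.get(replica, 0) + 1

sample = read_hot_key(cache, "post:viral")
```

Reads spread roughly evenly across the four copies, at the price of a write fan-out and a short window in which replicas can disagree during an update.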