What:
High-speed in-memory caches (Redis, Memcached) deployed to intercept slow database read patterns before they hit durable disk storage.
Primary purpose:
Minimizing read query latencies, boosting system throughput, and offloading transactional load from databases.
Usually used for:
User session storage, precomputed feeds, heavy analytics, and API gateway rate limiters.
How should I think about this inside system architectures?
🔋 The Transient Cache Rule
Caches are NOT the source of truth. They must remain ephemeral and safely rebuildable from the database at any moment.
🎯 Lazy Cache-Aside
Check the cache first. On miss, query the database, populate the cache, and return. Delete keys on write to prevent stale states.
🔀 Decoupled Write-Back
Acknowledge writes instantly in-memory, queueing changes asynchronously to flush to PostgreSQL in background micro-batches.
Needed When:
Read-to-write ratios are very high (e.g. 99:1), database disk I/O is the bottleneck, or sub-millisecond latency is a strict SLA.
Avoids:
Database CPU exhaustion, slow point-lookup sequential scans, expensive duplicate calculations, and high cloud database hosting costs.
Optimizes For:
Sub-millisecond query response speeds, scalable read concurrency, and backend resource preservation.
In production systems, caching is applied at every layer of the request path to keep response times low.
Core Caching Patterns Code Layout
Standard lazy-load Cache-Aside query and update paths executed in code:
Read:
def read(key):
    data = cache.get(key)
    if data is None:  # cache miss
        data = db.query(key)
        cache.set(key, data, ttl=300)  # populate with a 5-minute TTL
    return data
Write:
def write(key, data):
    db.update(key, data)
    cache.delete(key)  # invalidate immediately to prevent stale reads

- Lazy vs Aggressive Loading: Cache-Aside loads on demand; Write-Through preempts misses at the cost of write speed.
- Caching Pattern Matrix: Evaluating read and write paths:
| Pattern | Pros | Cons |
|---|---|---|
| Cache-Aside (Lazy) | Cache contains only active hot keys; simple to code; node crash is degraded, not fatal | Cache miss penalty on first read; data can fall stale if not proactively deleted |
| Write-Through | Zero cache staleness; highly consistent; ideal for read-heavy key sets | Write latency increases (must write synchronously to both cache and database) |
| Write-Behind (Back) | Ultra-low write latency; DB write batching offloads high transactional disk I/O | Immediate risk of data loss if the in-memory cache nodes crash before syncing to disk |
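The write paths in the matrix above can be contrasted with a minimal sketch. Everything here is an assumption for illustration: the class names are hypothetical, and a plain `dict` stands in for both the cache and the database.

```python
from collections import deque


class WriteThroughCache:
    """Writes go to the database and the cache synchronously."""

    def __init__(self, db):
        self.db = db          # dict standing in for the database (assumption)
        self.cache = {}

    def put(self, key, value):
        self.db[key] = value      # durable write first
        self.cache[key] = value   # then keep the cache in sync


class WriteBehindCache:
    """Writes are acknowledged from memory and flushed later in batches."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.pending = deque()    # keys waiting to be flushed

    def put(self, key, value):
        self.cache[key] = value
        self.pending.append(key)  # not yet durable: lost if the process dies here

    def flush(self, batch_size=100):
        """Flush up to batch_size queued writes; return how many were flushed."""
        flushed = 0
        while self.pending and flushed < batch_size:
            key = self.pending.popleft()
            self.db[key] = self.cache[key]
            flushed += 1
        return flushed
```

The gap between `put` and `flush` in the write-behind variant is exactly the data-loss window listed in the Cons column.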
- Eviction Policies: When memory hits limits, caches must select which keys to drop to prevent crash:
| Policy | Evicts | Primary Use Case |
|---|---|---|
| LRU (Least Recently Used) | Evicts items that have not been accessed for the longest time | Standard general-purpose caching (Redis `allkeys-lru` policy) |
| LFU (Least Frequently Used) | Evicts items with the lowest access frequency counter | Long-lived hot items, such as static assets whose popularity stays stable |
| TTL (Time To Live) | Expires keys once their time-to-live has elapsed | Ephemeral data: shopping carts, session tokens, or OTP entries |
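To make the LRU row concrete, here is a minimal exact-LRU sketch built on `collections.OrderedDict`. The class name `LRUCache` is hypothetical, and this is an illustration of the policy only, not a description of how any particular cache server implements it.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # oldest access first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used key
```

With a capacity of 2, setting `a` and `b`, reading `a`, then setting `c` evicts `b`, because `a` was touched more recently.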
Problem (cache stampede): A hot key (e.g. homepage_feed) expires. Hundreds of concurrent client threads all miss the cache at the exact same millisecond, crushing the primary database with duplicate queries.
Mitigation: Implement distributed locks (Mutex) around cache misses, or use Probabilistic Early Expiry (XFetch) to recompute values asynchronously before they decay.
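The lock-based mitigation can be sketched in a single process as follows. This is an assumption-heavy illustration: `threading.Lock` stands in for a distributed lock, a `dict` stands in for the cache, and `slow_db_query` is a hypothetical placeholder for the expensive query.

```python
import threading

cache = {}
locks = {}
locks_guard = threading.Lock()   # protects the locks dict itself
stats = {"db_calls": 0}


def slow_db_query(key):
    # Hypothetical stand-in for the expensive database query.
    stats["db_calls"] += 1
    return f"value-for-{key}"


def get_with_lock(key):
    value = cache.get(key)
    if value is not None:
        return value                     # fast path: cache hit
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        value = cache.get(key)           # re-check: another thread may have filled it
        if value is None:
            value = slow_db_query(key)   # only one thread per key gets here
            cache[key] = value
    return value
```

However many threads miss on the same key at once, only the first one to acquire the lock queries the database; the rest find the value on the re-check.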
Problem (cache penetration): Clients query non-existent keys (e.g. random UUIDs). The cache misses continuously, letting queries flow straight through to exhaust database capacity.
Mitigation: Deploy a **Bloom Filter** at the gateway to filter keys that definitely do not exist, or cache null values temporarily (`TTL = 30s`).
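A toy Bloom filter shows why this works: it can answer "definitely absent" without touching the database, at the cost of occasional false positives. The sizes, the SHA-256-based hashing scheme, and the `lookup` helper below are all assumptions chosen for readability, not tuned values.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def lookup(key, bloom, db, stats):
    if not bloom.might_contain(key):
        return None                # rejected without touching the database
    stats["db_queries"] += 1
    return db.get(key)
```

Every key written to the database must also be added to the filter, otherwise valid keys would be rejected.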
Problem (cache avalanche): Multiple cache keys are generated at the exact same moment (e.g. system boot). They all expire simultaneously, producing an immediate database query spike.
Mitigation: Inject randomized **jitter** to TTL allocations (e.g. `TTL = base_duration + random(0, 30s)`).
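The jitter formula translates almost directly into code. The helper name `ttl_with_jitter` is hypothetical; the 30-second ceiling matches the example above.

```python
import random


def ttl_with_jitter(base_seconds, max_jitter_seconds=30):
    """Return base_seconds plus a random whole number of seconds in [0, max_jitter_seconds]."""
    return base_seconds + random.randint(0, max_jitter_seconds)
```

Keys written in the same instant with `ttl_with_jitter(300)` then expire spread across a 30-second window instead of all at once.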
| Invalidation Strategy | Action Path | Best Case Use |
|---|---|---|
| TTL-Based | Attach absolute expiration time window to cache keys | General session data |
| Event-Driven | Listen to CDC db change event → publish invalidate command to Redis | Strong consistency |
| Key Versioning | Append static version strings to route keys: user:100:v2 | Safe zero-lock rollouts |
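Key versioning from the last row can be illustrated with a short sketch. The `schema_version` registry and `user_key` helper are hypothetical names, and a `dict` stands in for the cache.

```python
cache = {}
schema_version = {"user": 1}   # hypothetical per-entity version registry


def user_key(user_id):
    # The version is part of the key, so bumping it changes every key at once.
    return f"user:{user_id}:v{schema_version['user']}"


cache[user_key(100)] = {"name": "Ada"}   # stored under user:100:v1

schema_version["user"] = 2               # rollout: readers now ask for user:100:v2
missing_after_bump = cache.get(user_key(100)) is None
```

No delete or lock is needed: old `v1` entries are simply never read again and can age out through their TTL.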
- Your workload has a highly skewed access distribution (e.g., Pareto or Zipfian 80/20 read hotspots).
- You are designing heavy read platforms like social feeds, user profiles, or product inventories.
- You serve high-volume point lookups that must return in under 5 milliseconds.
- Bloom Filter: Space-efficient checks for non-existent cache keys.
- CDN & Edge Delivery: Caching static resources at the geographical network boundary.
- Redis Patterns: Specific in-memory data configurations and storage features.
The XFetch Probabilistic Cache Expiration Algorithm
To sharply reduce cache stampedes without blocking thread pools, high-scale systems deploy **XFetch**. Instead of waiting for a key to expire to trigger recomputation, readers evaluate a probabilistic test during normal gets:
# Probabilistic early recompute check
import math
import random

def should_recompute(remaining_ttl, compute_time, beta=1.0):
    # beta > 1.0 makes early recomputation more aggressive
    # 1.0 - random.random() lies in (0, 1], so log() never receives 0 and is always <= 0
    return remaining_ttl + compute_time * beta * math.log(1.0 - random.random()) <= 0

If the test evaluates to `True`, the reader returns the current cached value immediately, but asynchronously kicks off a background thread to refresh the cache key from the database.
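The following sketch wires the probabilistic check into a full read path. It is a simplified illustration under stated assumptions: the cache is a `dict`, the names `xfetch_should_recompute` and `xfetch_get` are hypothetical, and the refresh runs inline rather than on a background thread to keep the example short.

```python
import math
import random
import time


def xfetch_should_recompute(now, expiry, delta, beta=1.0):
    # delta is how long the last recomputation took, in seconds.
    # 1.0 - random.random() lies in (0, 1], so log() never receives 0.
    # The log term is <= 0, so subtracting it pushes "now" forward by a random amount.
    return now - delta * beta * math.log(1.0 - random.random()) >= expiry


def xfetch_get(cache, key, recompute, ttl, beta=1.0, now=None):
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is None or xfetch_should_recompute(now, entry["expiry"], entry["delta"], beta):
        start = time.perf_counter()
        value = recompute()                      # inline here; a worker in practice
        delta = time.perf_counter() - start      # remember how expensive this was
        cache[key] = {"value": value, "delta": delta, "expiry": now + ttl}
        return value
    return entry["value"]
```

The more expensive the recomputation (larger `delta`) and the closer the key is to expiry, the more likely a given reader is to refresh it early, so refreshes tend to be triggered by one reader rather than by all of them at expiry.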
Cache Consistency Race Conditions
Under high concurrency, the Cache-Aside pattern (write: update DB, delete cache) has a rare but catastrophic race condition:
- Cache is empty. Client A queries key $K$; misses; queries DB and gets value $V1$.
- Client B updates key $K$ in DB to $V2$; deletes cache key (currently empty anyway).
- Client A (delayed by CPU scheduling) writes stale value $V1$ back into the cache.
- The cache now holds stale value $V1$ until the TTL expires or the next write deletes the key.
Mitigation: Enforce short TTL values as a safety net, or use transactional locking layers (e.g., MySQL locks or Redis locks) during cache recomputation.
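The interleaving above can be replayed deterministically, followed by a lock-based mitigation sketch. This is illustrative only: a `dict` stands in for both stores, and a single in-process `threading.Lock` stands in for the per-key distributed lock a real deployment would need.

```python
import threading

db = {"K": "V1"}
cache = {}

# Replay of the interleaving, one step per line.
a_read = db["K"]          # Client A misses the cache and reads V1
db["K"] = "V2"            # Client B updates the database...
cache.pop("K", None)      # ...and deletes the (already empty) cache key
cache["K"] = a_read       # Client A, resumed late, caches stale V1
stale = cache["K"] != db["K"]   # cache says V1, database says V2

# Mitigation sketch: the same lock guards cache fill and invalidation.
key_lock = threading.Lock()


def read(key):
    value = cache.get(key)
    if value is None:
        with key_lock:            # a fill cannot interleave with a write
            value = db[key]
            cache[key] = value
    return value


def write(key, value):
    with key_lock:                # update and delete happen as one unit
        db[key] = value
        cache.pop(key, None)
```

Because Client A would hold the lock from its database read through its cache write, Client B's update and delete can only run before or after that whole unit, never in the middle.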
Multi-Layer Cache Stack
Caching is not only "Redis in front of Postgres." Production systems stack layers — each with different TTL, invalidation cost, and hit ratio:
- Client / browser cache: HTTP Cache-Control headers for static assets and idempotent GET responses — zero server load on repeat visits.
- CDN edge: Geo-distributed cache for images, video segments, and public API GETs with short TTL — see CDN & Edge Delivery.
- In-process (local) cache: Caffeine/Guava map inside each app pod — microsecond reads for config and hot keys; invalidated via pub/sub on change.
- Distributed cache (Redis/Memcached): Shared across pods; survives pod restarts; handles hot keys with replication or key splitting.
- Database buffer pool: OS/page cache and InnoDB buffer — last line before disk; not a substitute for application-level design.
Interview signal: walk the read path top-down — "CDN miss → Redis miss → read replica → primary" — and state where each layer adds value vs complexity.
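A top-down read path through stacked layers can be sketched generically. The function name `layered_get` is hypothetical, each layer is modeled as a `dict` ordered fastest first, and `load_from_db` is a placeholder for the final database read.

```python
def layered_get(key, layers, load_from_db):
    """Try each cache layer in order; on a hit, backfill the faster layers.

    Returns (value, index of the layer that served it); the index equals
    len(layers) when the value had to come from the database.
    """
    for i, layer in enumerate(layers):
        if key in layer:
            value = layer[key]
            for faster in layers[:i]:
                faster[key] = value      # promote into the layers that missed
            return value, i
    value = load_from_db(key)
    for layer in layers:
        layer[key] = value               # populate every layer on the way back
    return value, len(layers)
```

Passing `[local, shared]` models an in-process cache in front of a distributed one: a hit in `shared` is copied into `local`, so the next read for that key never leaves the process.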
Hot Key Mitigation
A viral post or celebrity profile concentrates reads on one cache key. A single Redis node serving that key becomes a bottleneck. Mitigations: replicate the hot key across Redis nodes with random read selection, local in-process caching on app servers for ultra-hot keys, or pre-warm CDN/edge caches before traffic spikes.
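Hot-key replication with random read selection might look like the sketch below. The `#r<i>` suffix scheme and the function names are assumptions for illustration, and a `dict` stands in for the cache cluster; the intent is that differently named copies land on different nodes.

```python
import random


def replica_keys(key, replicas):
    # Hypothetical naming scheme: post:42#r0, post:42#r1, ...
    return [f"{key}#r{i}" for i in range(replicas)]


def write_hot_key(cache, key, value, replicas=4):
    for k in replica_keys(key, replicas):
        cache[k] = value                 # every copy must be written or invalidated


def read_hot_key(cache, key, replicas=4):
    # Each reader picks one copy at random, spreading load across the copies.
    return cache.get(random.choice(replica_keys(key, replicas)))
```

The trade-off is on the write side: an update or invalidation now has to touch every copy, so this fits keys that are read far more often than they change.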