What:
A catalog of recurring solution shapes and a delivery rhythm for 45-minute system design interviews.
Primary purpose:
Recognize which pattern applies, name it confidently, and allocate time so the interviewer sees requirements, architecture, and depth — not just one area.
Usually used for:
Product systems (feeds, chat, commerce), infra-adjacent designs (queues, storage), and any problem where the same building blocks reappear with different nouns.
Most interview problems reduce to one or two of these axes:
📡 Push vs Pull
Does the client need live updates? If yes, plan fan-out and connection management. If no, cache + pagination on pull is enough.
⚡ Sync vs Async
Can the user wait? Short requests stay on the API path. Anything over ~200 ms of CPU or I/O belongs on a queue with status polling or webhooks.
📈 Read vs Write Hot
Identify the hot dimension first. Read-heavy → replicas and CDN. Write-heavy → shard keys, batch writes, or aggregate counters.
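The sync-vs-async split can be sketched as follows. This is a minimal illustration, assuming an in-process `queue.Queue` stands in for a durable queue and a dict stands in for a job-status store; the names `submit_job`, `job_status`, and `run_worker_once` are hypothetical.

```python
import queue
import uuid

# In-memory stand-ins (assumptions): a real design would use a durable
# queue and a persistent status store shared by API servers and workers.
jobs = queue.Queue()
status = {}

def submit_job(payload):
    """API path: enqueue the work and return a job ID immediately."""
    job_id = str(uuid.uuid4())
    status[job_id] = {"state": "queued", "result": None}
    jobs.put((job_id, payload))
    return job_id

def job_status(job_id):
    """Polling endpoint: report the current state of a job."""
    return status.get(job_id, {"state": "unknown", "result": None})

def run_worker_once(handler):
    """Worker path: take one job off the queue and process it."""
    try:
        job_id, payload = jobs.get_nowait()
    except queue.Empty:
        return False  # nothing to do
    status[job_id]["state"] = "running"
    try:
        status[job_id] = {"state": "done", "result": handler(payload)}
    except Exception as exc:
        status[job_id] = {"state": "failed", "result": str(exc)}
    return True
```

The API handler only calls `submit_job`, so its latency does not depend on how long `handler` takes; the client polls `job_status` (or receives a webhook) to learn the outcome.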
Needed When:
You have 45 minutes and must show structured thinking — not a laundry list of technologies.
Avoids:
Spending 20 minutes on APIs with no diagram, or jumping to Kafka before clarifying scale and consistency needs.
Optimizes For:
Signal density — interviewers score pattern recognition, trade-off articulation, and communication under time pressure.
A typical interview design layers sync APIs, async workers, read scaling, and orchestration. Not every box appears in every problem; omit what the requirements do not need.
Seven patterns recur. Map the problem statement to one or two of them before choosing databases:
| Pattern | Core Mechanic | Primary Role |
|---|---|---|
| Real-Time Updates | WebSocket, SSE, or long-poll with pub/sub fan-out to connected clients. | Push live state (chat, bids, dashboards) without polling every resource. |
| Long-Running Tasks | Enqueue work to a durable queue; stateless workers process asynchronously. | Keep API latency low for transcoding, email, reports, and batch jobs. |
| Contention Control | Distributed locks, compare-and-swap (CAS), or single-writer queues. | Prevent double-booking, overselling inventory, or duplicate payments. |
| Scaling Reads | Read replicas, layered cache, and CDN edge delivery. | Absorb read-heavy traffic without overloading primary databases. |
| Scaling Writes | Sharding, write batching, and pre-aggregation counters. | Spread write load and reduce hot-row pressure on a single node. |
| Large Blob Handling | Presigned multipart upload directly to object storage. | Offload multi-GB files from application servers and API gateways. |
| Multi-Step Processes | Saga compensations or workflow orchestrators (Temporal-style). | Coordinate checkout, booking, and onboarding across multiple services. |
| Benefit | Cost |
|---|---|
| Pattern reuse — once you recognize the shape (feed, chat, checkout), you spend less time inventing from scratch | Over-application — forcing WebSockets or sharding when a simple REST + cache design suffices loses credibility |
| Structured pacing — a time budget prevents drowning in API details before drawing architecture | Rigid scripts — interviewers may jump to deep dives early; adapt while keeping scope explicit |
Problem: A celebrity goes live and millions of clients open WebSocket connections to a single region. Connection memory and fan-out CPU saturate before application logic runs.
Mitigation: Regional connection gateways, connection limits per user, SSE for one-way feeds where bidirectional channels are unnecessary, and shard fan-out by topic or room ID.
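Sharding fan-out by room ID, as mentioned in the mitigation above, might look like the sketch below. It assumes in-memory subscription maps and simple modulo hashing; the class name `FanOutShards` and its methods are hypothetical.

```python
import hashlib
from collections import defaultdict

class FanOutShards:
    """Routes each room to one shard so a publish touches only that shard."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        # One map per shard: room ID -> set of connection IDs.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def shard_for(self, room_id):
        # Use a stable hash; Python's built-in hash() of str is
        # randomized per process and would differ across gateways.
        digest = hashlib.sha256(room_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def subscribe(self, room_id, conn_id):
        self.shards[self.shard_for(room_id)][room_id].add(conn_id)

    def unsubscribe(self, room_id, conn_id):
        self.shards[self.shard_for(room_id)][room_id].discard(conn_id)

    def publish(self, room_id, message):
        """Return the (conn_id, message) deliveries for one room only."""
        members = self.shards[self.shard_for(room_id)].get(room_id, set())
        return [(conn_id, message) for conn_id in sorted(members)]
```

Modulo hashing remaps most rooms when the shard count changes; consistent hashing is the usual alternative when shards are added or removed often. A single celebrity room still lands on one shard, so very hot rooms may need a further split by connection.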
Problem: A flash sale uses a global distributed lock on inventory rows. Lock wait queues grow; P99 latency spikes and checkout times out.
Mitigation: Pre-decrement counters in Redis with CAS, partition inventory by SKU shard, or serialize purchases per SKU via a single-partition queue instead of coarse global locks.
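A minimal sketch of the CAS-based decrement, assuming an in-memory counter stands in for the real store. `CasCounter` and `reserve` are hypothetical names; in Redis the equivalent check-and-set is typically done with `WATCH`/`MULTI`/`EXEC` or a Lua script.

```python
import threading

class CasCounter:
    """In-memory stand-in for a store that offers compare-and-swap."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # models the store's per-key atomicity

    def get(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Set the value to `new` only if it still equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

def reserve(counter, quantity, max_retries=10):
    """Decrement stock optimistically; never let it drop below zero."""
    for _ in range(max_retries):
        current = counter.get()
        if current < quantity:
            return False  # not enough stock left
        if counter.compare_and_swap(current, current - quantity):
            return True
        # Another buyer changed the counter first; re-read and retry.
    return False  # gave up under heavy contention
```

No buyer ever waits on a lock held by another buyer across a network round trip; a losing writer simply retries against the fresh value, which is why this scales better than a coarse global lock.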
Problem: Clients start multipart uploads but never complete them. The orphaned parts keep accruing storage cost until something aborts the upload.
Mitigation: Short-lived presigned URLs, lifecycle rules to abort incomplete uploads after 24 hours, and server-side finalize webhooks that validate checksum before marking the object visible.
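The part planning and the finalize check can be sketched as below. This assumes the client supplies a whole-object SHA-256 at finalize time; real object stores define their own multipart checksum formats, so `plan_parts` and `finalize` are hypothetical helpers, not a provider API.

```python
import hashlib

def plan_parts(total_bytes, part_bytes):
    """Split an upload into (part_number, offset, length) tuples."""
    if total_bytes <= 0 or part_bytes <= 0:
        raise ValueError("sizes must be positive")
    parts = []
    offset = 0
    number = 1
    while offset < total_bytes:
        length = min(part_bytes, total_bytes - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts

def finalize(parts_data, expected_sha256):
    """Server-side finalize: verify the checksum before exposing the object."""
    digest = hashlib.sha256()
    for chunk in parts_data:  # parts must be supplied in part-number order
        digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

For example, a 5 GiB file split into 100 MiB parts yields 52 parts, each of which would get its own short-lived presigned URL. The object is marked visible only when `finalize` returns `True`.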
| Interview Phase | Time Budget | What to Cover |
|---|---|---|
| Requirements & scope | ~5 min | Functional vs non-functional, scale assumptions, in/out of scope |
| Entities & relationships | ~2 min | Core nouns, ownership boundaries, read vs write paths |
| API surface | ~5 min | Key endpoints, idempotency keys, pagination, error contracts |
| High-level design | ~15 min | Boxes-and-arrows diagram, data flow, bottleneck callouts |
| Deep dives | ~10 min | Interviewer-chosen topics: sharding, fan-out, failure modes |
| Buffer / trade-offs | ~8 min | Explicit trade-offs, evolution path, monitoring hooks |
Treat the table as a default — senior interviewers often allocate more time to deep dives if your HLD is crisp. Always leave ~2 minutes to summarize trade-offs and next evolution steps.
- Real-time: "Users see updates within seconds" → WebSocket/SSE + pub/sub fan-out.
- Async work: "Video processing takes minutes" → queue + workers + job status API.
- Contention: "Only one seat left" or "exactly once charge" → CAS, idempotency keys, or saga.
- Read scale: "Millions of reads, few writes" → cache-aside + read replicas + CDN.
- Write scale: "Billions of events per day" → shard by user/time, batch inserts, counter aggregation.
- Large files: "Upload 5 GB video" → presigned multipart to object storage.
- Multi-step: "Reserve, pay, ship — any step can fail" → saga or workflow engine.
- Network Protocols (HTTP, gRPC, WebSocket, DNS) — choosing the right transport for sync vs push workloads.
- Message Queues Fundamentals — backbone for async workers and decoupled fan-out.
- Distributed Transactions: 2PC vs Saga — consistency model for multi-step checkout and booking flows.
- Caching Patterns & Invalidation — first lever for read scaling before adding replicas.
- CDN & Edge Delivery — static and media read path at global scale.
- API Contract & Integration Design — error contracts, webhooks, and async job APIs at the gateway boundary.
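As a concrete instance of the read-scale signal (cache-aside in front of the database), here is a minimal sketch. It assumes a plain dict as the cache and caller-supplied load/save functions; `CacheAside` is a hypothetical name, and TTLs and eviction are left out.

```python
class CacheAside:
    """Read through the cache; on writes, update the DB and invalidate."""

    def __init__(self, load_from_db, save_to_db):
        self.cache = {}
        self.load_from_db = load_from_db
        self.save_to_db = save_to_db
        self.db_reads = 0  # counts cache misses, for illustration only

    def get(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: no database call
        self.db_reads += 1
        value = self.load_from_db(key)
        self.cache[key] = value
        return value

    def put(self, key, value):
        self.save_to_db(key, value)
        # Invalidate instead of updating so the next read repopulates
        # the cache from the source of truth.
        self.cache.pop(key, None)
```

Invalidating on write keeps the logic simple at the cost of one extra miss per update; concurrent readers can still briefly repopulate a stale value, which is the usual motivation for adding a TTL.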
Pattern Composition: A Live Auction Example
Real interviews rarely isolate a single pattern. A live auction or flash sale typically composes four at once:
- Real-time bids arrive over WebSocket; a regional gateway publishes to a partitioned Kafka topic keyed by auction ID.
- Contention on the winning bid uses CAS in Redis: accept a new bid only if it exceeds the current high bid by at least the minimum tick.
- Read scaling serves auction catalog pages from CDN + edge cache; bid history reads come from read replicas lagging ~100 ms behind the leader.
- Settlement after auction close triggers a saga: lock the winner's funds → record the winner → notify losing bidders' wallets, with compensating releases of any locked funds if a step fails.
Walking through this composition in ~3 minutes demonstrates pattern fluency without drawing every box — a strong signal at senior level.
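The settlement saga can be sketched as a list of steps, each paired with a compensating action. This is a minimal in-process illustration with hypothetical names (`run_saga`, the step names); a workflow engine would also persist progress so the saga survives a crash.

```python
def run_saga(steps):
    """Run (name, action, compensate) steps; undo completed ones on failure."""
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"done:{name}")
            completed.append((name, compensate))
        except Exception:
            log.append(f"failed:{name}")
            # Compensate in reverse order so later effects are undone first.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
    return True, log
```

If the notification step raises after funds were locked and the winner recorded, the log reads `done:lock_funds`, `done:record_winner`, `failed:notify`, `compensated:record_winner`, `compensated:lock_funds`. Compensations are assumed to be idempotent and to succeed; handling a failing compensation is a natural deep-dive topic.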
Geographic Proximity Routing
Location-aware products — ride matching, food delivery, nearby friends, edge CDN selection — share a pattern distinct from generic read scaling:
- Index by geography: Store entities in geohash, S2, or H3 cells so "nearby drivers" becomes a bounded cell lookup, not a full table scan.
- Regional partitioning: Route users to the datacenter closest to their coordinates; cross-region queries only when the search radius spans boundaries.
- Freshness vs accuracy: Driver GPS updates arrive every 3–5 seconds via WebSocket; matchmaking reads a slightly stale position cache. Strong consistency on exact lat/lng is unnecessary, but cached positions must refresh within one update interval.
- Fallback hierarchy: Expand search radius or adjacent geohash cells when supply is thin — product logic layered on top of spatial indexing.
Name this pattern when problems mention Uber, Yelp, Tinder radius, or "find nearest X" — it combines real-time updates, spatial data structures, and regional sharding without over-sharding on day one.
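The cell lookup with radius expansion can be sketched with a plain latitude/longitude grid. The fixed-degree grid is an assumption standing in for geohash, S2, or H3 cells, and `GridIndex`, `cell_of`, and `CELL_DEG` are hypothetical names.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # assumed cell size in degrees (about 1.1 km of latitude)

def cell_of(lat, lng):
    """Map a coordinate to its integer grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))

class GridIndex:
    def __init__(self):
        self.cells = defaultdict(dict)  # cell -> {entity_id: (lat, lng)}
        self.where = {}                 # entity_id -> current cell

    def update(self, entity_id, lat, lng):
        """Record a new position, moving the entity between cells if needed."""
        old = self.where.get(entity_id)
        if old is not None:
            self.cells[old].pop(entity_id, None)
        cell = cell_of(lat, lng)
        self.cells[cell][entity_id] = (lat, lng)
        self.where[entity_id] = cell

    def nearby(self, lat, lng, min_results=1, max_rings=3):
        """Search the home cell, then widen ring by ring when supply is thin."""
        cx, cy = cell_of(lat, lng)
        found = []
        for ring in range(max_rings + 1):
            found = []
            for dx in range(-ring, ring + 1):
                for dy in range(-ring, ring + 1):
                    found.extend(self.cells.get((cx + dx, cy + dy), {}))
            if len(found) >= min_results:
                break
        return sorted(found)
```

Each query touches at most `(2 * max_rings + 1) ** 2` cells regardless of how many entities exist. A fixed-degree grid distorts with latitude because longitude cells narrow toward the poles, which is one reason production systems prefer S2 or H3.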