Interview Prompt
Design a flash sale system.
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Which of these is highest priority: Inventory pre-warming, Request queuing, Atomic decrement? | Forces scope negotiation — senior candidates trim before drawing boxes. |
| What scale should we design for — DAU, QPS, data volume? | Drives every capacity decision; shows structured thinking. |
| What are the read vs write patterns on the critical path? | Determines caching, DB choice, and replication topology. |
| What consistency and durability guarantees are required? | Separates strong-consistency paths from eventual ones — a senior differentiator. |
Scope
In scope
- Inventory pre-warming
- Request queuing
- Atomic decrement
- Fairness under extreme write contention
- Graceful degradation
- Capacity estimation with shown math
Out of scope (state explicitly)
- Recommendation engine (#48)
- Review/rating system (#70)
- Warehouse management (WMS) internals
Assumptions
- Clarify scale (DAU, QPS, data volume) for flash sale system in the first 5 minutes
- Standard reliability target 99.9%–99.99% unless problem implies higher (payments, booking)
- Managed cloud services (RDS, S3, Kafka, Redis) are acceptable building blocks
Functional Requirements
- Scheduled sales: Admin schedules flash sale with start/end time, SKUs, quantities, discounted prices
- Countdown timer: Show countdown to sale start; reveal items at exact start time
- Atomic purchase: Click Buy, atomically decrement stock, reserve for payment
- Virtual queue: Enqueue users with position if traffic exceeds capacity
- Purchase limits: Max 1-2 units per user per SKU (prevent scalpers)
- Real-time stock display: Show remaining stock count
- Fairness: First-come-first-served; no advantage from refreshing
- Anti-bot: Prevent automated bots from sniping all stock
Non-Functional Requirements
- Extreme Throughput: Handle 1M+ concurrent users at T=0
- Low Latency: Purchase decision in < 100 ms
- Strong Consistency: No overselling: if 1,000 units, exactly 1,000 orders max
- Availability: Graceful degradation under extreme load
- Fairness: FIFO ordering guaranteed
- Idempotent: Double-click must not create two orders
Capacity Estimation
| Metric | Calculation | Value |
|---|---|---|
| Concurrent users at sale start | Given | 1M+ |
| Purchase attempts / sec (T=0) | Assumption: roughly half of the 1M concurrent users click Buy within the first second | 500K |
| Items for sale | Given | 1,000 - 10,000 units |
| Time to sell out | Given | 5-30 seconds |
| Page load requests / sec | Assumption: ~2 refreshes/sec per user across 1M users just before the start | 2M (pre-sale refresh storm) |
| Bot traffic ratio | Given | 30-50% of requests |
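The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constants are copied from the table, and the function names are hypothetical helpers, not part of any real system.

```python
# Back-of-the-envelope checks using the figures from the capacity table.
CONCURRENT_USERS = 1_000_000
PEAK_PURCHASE_QPS = 500_000
UNITS_LOW, UNITS_HIGH = 1_000, 10_000
BOT_RATIO_LOW, BOT_RATIO_HIGH = 0.30, 0.50


def win_probability(units: int, users: int) -> float:
    """Upper bound on the fraction of users who can get an item (1 unit each)."""
    return units / users


def bot_qps(peak_qps: int, bot_ratio: float) -> int:
    """Requests per second attributable to bots at peak."""
    return round(peak_qps * bot_ratio)


if __name__ == "__main__":
    low = win_probability(UNITS_LOW, CONCURRENT_USERS)
    high = win_probability(UNITS_HIGH, CONCURRENT_USERS)
    print(f"share of users who can win: {low:.1%} to {high:.1%}")
    print(f"bot traffic at peak: {bot_qps(PEAK_PURCHASE_QPS, BOT_RATIO_LOW):,} "
          f"to {bot_qps(PEAK_PURCHASE_QPS, BOT_RATIO_HIGH):,} req/sec")
```

The takeaway is that at least 99% of attempts must end in a fast rejection, so the rejection path deserves as much design attention as the success path.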
The Critical Purchase Path: Redis Lua Script
The entire purchase decision must be atomic and < 1 ms:
```lua
-- KEYS[1] = flash_stock:{sale_id}:{sku_id}
-- KEYS[2] = user_limit:{sale_id}:{user_id}
-- ARGV[1] = user_id, ARGV[2] = quantity
local qty = tonumber(ARGV[2])

-- Step 1: Check per-user purchase limit (max 2 units, counting this request)
local user_purchased = tonumber(redis.call('GET', KEYS[2]) or '0')
if user_purchased + qty > 2 then
  return {0, 'LIMIT_EXCEEDED'}
end

-- Step 2: Atomic stock decrement
local remaining = redis.call('DECRBY', KEYS[1], qty)
if remaining < 0 then
  redis.call('INCRBY', KEYS[1], qty) -- undo
  return {0, 'SOLD_OUT'}
end

-- Step 3: Record user purchase count
redis.call('INCRBY', KEYS[2], qty)
redis.call('EXPIRE', KEYS[2], 86400)
return {1, remaining}
```

Why a Lua script and not separate commands? The entire script executes atomically in Redis, whereas a separate limit check followed by DECR leaves a race window between the commands. Redis runs scripts one at a time on a single thread, and that serialization is what orders concurrent buyers. Whether a single node can absorb the full 500K attempts/sec depends on hardware and should be load-tested; with the virtual queue in front, the script only sees the admitted rate (about 10K/sec).
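To make the script's invariants easy to check without a Redis instance, here is a minimal in-memory model in Python. It is a sketch under assumptions: `FlashSaleModel` is a hypothetical stand-in for the two Redis keys, it counts the requested quantity against the 2-unit cap, and it relies on being called from one thread at a time, which is the property Redis provides for scripts.

```python
from dataclasses import dataclass, field

MAX_UNITS_PER_USER = 2  # per-user cap from the purchase-limit requirement


@dataclass
class FlashSaleModel:
    """In-memory stand-in for the stock key and the per-user limit keys."""
    stock: int
    purchased: dict = field(default_factory=dict)  # user_id -> units bought

    def purchase(self, user_id: str, quantity: int):
        """Mirror the script: limit check, decrement, reject on oversell, record."""
        already = self.purchased.get(user_id, 0)
        if already + quantity > MAX_UNITS_PER_USER:
            return (0, "LIMIT_EXCEEDED")
        remaining = self.stock - quantity
        if remaining < 0:
            # Equivalent to DECRBY followed by the INCRBY undo: stock is unchanged.
            return (0, "SOLD_OUT")
        self.stock = remaining
        self.purchased[user_id] = already + quantity
        return (1, remaining)


if __name__ == "__main__":
    sale = FlashSaleModel(stock=1000)
    results = [sale.purchase(f"user-{i}", 1) for i in range(5000)]
    print(sum(ok for ok, _ in results), "orders accepted; stock left:", sale.stock)
```

However many attempts arrive, the number of accepted orders never exceeds the initial stock, which is the no-overselling requirement stated earlier.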
Virtual Queue: Handling 1M Concurrent Users
Problem: 1M users hit Buy at T=0. Even if Redis can handle it, API servers and load balancers collapse under 500K concurrent TCP connections.
Solution: Virtual Queue (controlled admission)
```
T-5 min: Users "Enter Queue" early
  Position assigned: INCR queue_position:{sale_id} --> position 347,231
  User shown: "Your position: 347,231. Estimated wait: ~5 minutes"

T=0: Sale starts. Queue processes users FIFO.
  Gate rate: 10,000 users admitted per second (tunable)

Admitted users:
  1. Receive short-lived JWT (valid 60 seconds)
  2. Token authorizes call to purchase API
  3. Purchase API validates token --> runs Lua script on Redis

Users not yet admitted:
  - See "Please wait..." with live position via WebSocket/SSE
  - Position updates every 5 seconds

Stock gone:
  - Broadcast SOLD_OUT to ALL remaining queue members immediately
  - Don't make users wait if nothing left to buy

Queue implementation:
  Redis sorted set: ZADD queue:{sale_id} {timestamp} {user_id}
  Processing: ZPOPMIN queue:{sale_id} 10000 (pop 10K per second)
```

Anti-Bot Measures
Layer 1: CDN/WAF (Cloudflare, AWS WAF)
- Rate limit per IP: max 10 req/sec
- Known bot signatures blocked
- JavaScript challenge (filters simple bots that cannot execute JS)
- TLS fingerprinting (JA3 hash) -- flag non-browser clients

Layer 2: Queue Entry Validation
- CAPTCHA at queue entry (invisible reCAPTCHA)
- Device fingerprint (canvas hash, WebGL, screen resolution)
- Account age check: accounts < 24 hours old --> blocked

Layer 3: Purchase Validation
- One purchase per user_id (Redis user_limit)
- One purchase per device_fingerprint
- One purchase per payment method

Layer 4: Post-Purchase Fraud Detection
- Multiple orders to same shipping address from different accounts --> cancel
- Reseller pattern detection --> flag
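The Layer 1 per-IP limit is normally enforced by the CDN/WAF itself, but the underlying idea can be illustrated with a token bucket. The sketch below is an application-level illustration, not how any particular WAF implements it; `TokenBucket` and `PerIpLimiter` are hypothetical names, and the clock is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float         # tokens added per second
    capacity: float     # maximum burst size
    tokens: float       # tokens currently available
    last_refill: float  # timestamp of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerIpLimiter:
    """One bucket per client IP: 10 req/sec, matching the Layer 1 limit."""

    def __init__(self, rate: float = 10.0, burst: float = 10.0):
        self.rate = rate
        self.burst = burst
        self.buckets = {}  # ip -> TokenBucket

    def allow(self, ip: str, now: float) -> bool:
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.burst, self.burst, now)
            self.buckets[ip] = bucket
        return bucket.allow(now)
```

A request for which `allow` returns `False` would typically be answered with 429 and a Retry-After header.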
Enter Queue
```
POST /api/v1/flash-sale/{sale_id}/enter-queue
{
  "captcha_token": "recaptcha-response-token",
  "device_fingerprint": "fp-hash-abc"
}

Response: 200 OK
{
  "queue_position": 12345,
  "estimated_wait_seconds": 120,
  "queue_token": "qt-uuid"
}
```

Purchase (After Admitted)
```
POST /api/v1/flash-sale/{sale_id}/purchase
Idempotency-Key: "purchase-user123-sale456"
Authorization: Bearer {admission_jwt}
{
  "sku_id": "SKU-FLASH-1",
  "quantity": 1
}

Response: 200 OK
{
  "status": "reserved",
  "reservation_token": "res-uuid",
  "payment_deadline": "2026-03-14T11:10:00Z",
  "remaining_stock": 423
}
OR { "status": "sold_out" }
OR { "status": "limit_exceeded" }
```

Get Sale Status
```
GET /api/v1/flash-sale/{sale_id}/status

Response: 200 OK
{
  "sale_id": "sale-456",
  "status": "active",
  "items": [
    {"sku_id": "SKU-FLASH-1", "name": "iPhone 16", "flash_price": 499.00,
     "original_price": 999.00, "total_stock": 1000, "remaining": 423}
  ]
}
```

Common Error Responses
- 400 Bad Request: invalid input, missing fields, or malformed JSON
- 401 Unauthorized: missing or invalid auth token or API key
- 402 Payment Required: insufficient funds
- 403 Forbidden: authenticated but insufficient permissions
- 404 Not Found: resource ID does not exist
- 409 Conflict: duplicate write or version conflict; retry with idempotency key
- 422 Unprocessable Entity: valid syntax but invalid business logic
- 429 Too Many Requests: rate limit exceeded; honor Retry-After header
- 500 Internal Error: unexpected server fault; retry with idempotency key
- 502 Bad Gateway: payment provider timeout; poll status endpoint
- 503 Service Unavailable: dependency down or overloaded; use exponential backoff
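Several of the error responses tell the client to retry with an idempotency key, so the server must return the same result for a repeated key instead of running the purchase twice. A minimal sketch of that behavior follows; `IdempotentPurchaseHandler` and `purchase_fn` are hypothetical names, and the in-process dict is an assumption made for illustration (a real deployment would need a store shared across API servers and safe under concurrent requests).

```python
class IdempotentPurchaseHandler:
    """Replays the stored response when an Idempotency-Key is seen again."""

    def __init__(self, purchase_fn):
        self._purchase_fn = purchase_fn  # performs the actual purchase once
        self._responses = {}             # idempotency key -> stored response

    def handle(self, idempotency_key: str, request: dict) -> dict:
        if idempotency_key in self._responses:
            # Retry or double-click: do not run the purchase a second time.
            return self._responses[idempotency_key]
        response = self._purchase_fn(request)
        self._responses[idempotency_key] = response
        return response
```

With this in place, a double-click that sends the same `Idempotency-Key` twice produces one reservation and two identical responses.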
Redis: Flash Sale State
```
flash_stock:{sale_id}:{sku_id} --> INT (atomic DECR)
user_limit:{sale_id}:{user_id} --> INT (max 2), TTL 86400
reservation:{token} --> JSON { user_id, sku_id, qty }, TTL 600
queue_position:{sale_id} --> INT (INCR for each entrant)
queue:{sale_id} --> Sorted Set { user_id: timestamp }
admission:{sale_id}:{user_id} --> "admitted", TTL 60
device:{sale_id}:{fingerprint} --> user_id, TTL 86400
```

PostgreSQL: Durable Records
```sql
-- PostgreSQL has no inline ENUM(...) column type, so the enums are declared
-- as named types first. The type names are illustrative.
CREATE TYPE flash_sale_status AS ENUM ('scheduled','active','ended','cancelled');
CREATE TYPE flash_order_status AS ENUM ('reserved','paid','cancelled','expired');

CREATE TABLE flash_sales (
  sale_id UUID PRIMARY KEY,
  name VARCHAR(255),
  start_time TIMESTAMPTZ NOT NULL,
  end_time TIMESTAMPTZ NOT NULL,
  status flash_sale_status,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE flash_sale_items (
  sale_id UUID NOT NULL,
  sku_id VARCHAR(50) NOT NULL,
  flash_price DECIMAL(10,2) NOT NULL,
  original_price DECIMAL(10,2) NOT NULL,
  total_stock INT NOT NULL,
  sold_count INT DEFAULT 0,
  PRIMARY KEY (sale_id, sku_id)
);

CREATE TABLE flash_sale_orders (
  order_id UUID PRIMARY KEY,
  sale_id UUID NOT NULL,
  user_id UUID NOT NULL,
  sku_id VARCHAR(50) NOT NULL,
  quantity INT NOT NULL,
  price DECIMAL(10,2),
  status flash_order_status,
  reservation_token VARCHAR(64),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- PostgreSQL creates secondary indexes with a separate statement.
CREATE INDEX idx_sale_user ON flash_sale_orders (sale_id, user_id);
```

| Concern | Solution |
|---|---|
| Redis crash mid-sale | Redis Cluster + WAIT 1 for replica sync; safety buffer (load 990/1000); post-sale reconciliation |
| Overselling | Lua script is atomic; remaining < 0 check with undo via INCRBY |
| Payment timeout | Reservation TTL 10 min; stock for expired reservations is INCR'd back (e.g., by a sweeper job, since key expiry alone does not restore stock) |
| Double purchase | Idempotency key + user_limit in Lua script |
| 1M page loads | CDN pre-rendered static pages; zero origin load until Buy click |
| Bot sniping | Multi-layer: CAPTCHA, device fingerprint, rate limit, account age |
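| Queue fairness | Sorted set scored by arrival timestamp gives FIFO ordering; scoring by the INCR'd queue position instead avoids ties between entries with the same timestamp |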
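The payment-timeout row depends on something actively returning stock when a reservation lapses. One way to do that is a periodic sweeper, sketched below in Python. This is an assumed mechanism for illustration: `Reservation` and `release_expired` are hypothetical names, and plain dicts stand in for the Redis reservation and stock keys.

```python
from dataclasses import dataclass

RESERVATION_TTL_SECONDS = 600  # 10-minute payment window


@dataclass
class Reservation:
    token: str
    sku_id: str
    quantity: int
    created_at: float  # seconds since epoch
    paid: bool = False


def release_expired(reservations: dict, stock: dict, now: float) -> list:
    """Return stock for unpaid reservations past the TTL; list released tokens."""
    released = []
    for token, res in list(reservations.items()):
        expired = now - res.created_at >= RESERVATION_TTL_SECONDS
        if expired and not res.paid:
            stock[res.sku_id] = stock.get(res.sku_id, 0) + res.quantity
            del reservations[token]
            released.append(token)
    return released
```

Paid reservations are never released, and stock is only returned once per reservation because the entry is deleted in the same pass.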
Specific: Redis Data Loss Mid-Sale
If primary crashes, replica may miss last 1-2 sec of DECRs: up to 10-20 items oversold.
Mitigations:
- WAIT command: issuing `WAIT 1 0` after the Lua script blocks until at least 1 replica acknowledges the write before the API responds (+1 ms latency). A timeout of 0 waits indefinitely, so a small non-zero timeout may be the safer choice.
- Safety buffer: For a 1000-unit sale, load only 990 into Redis. Hold 10 as a buffer for replication lag.
- Post-sale reconciliation: Count confirmed orders in PostgreSQL. If orders > total_stock, cancel the excess (last-in-first-cancelled). Notify affected users within minutes.
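The safety buffer and the last-in-first-cancelled rule can be expressed in a few lines. The sketch below is illustrative: the function names are hypothetical, and orders are represented as `(order_id, created_at, quantity)` tuples rather than rows read from PostgreSQL.

```python
def redis_initial_stock(total_stock: int, safety_buffer: int) -> int:
    """Units to load into Redis, holding back a buffer for replication lag."""
    return max(0, total_stock - safety_buffer)


def orders_to_cancel(orders: list, total_stock: int) -> list:
    """Pick order ids to cancel, newest first, until committed units fit stock.

    orders: list of (order_id, created_at, quantity) tuples.
    """
    committed = sum(quantity for _, _, quantity in orders)
    cancelled = []
    newest_first = sorted(orders, key=lambda order: order[1], reverse=True)
    for order_id, _, quantity in newest_first:
        if committed <= total_stock:
            break
        cancelled.append(order_id)
        committed -= quantity
    return cancelled


if __name__ == "__main__":
    print(redis_initial_stock(1000, 10))  # units loaded into Redis
    demo = [(f"order-{i}", float(i), 1) for i in range(1003)]
    print(orders_to_cancel(demo, 1000))   # the three newest orders
```

If the buffer absorbed the replication loss, `orders_to_cancel` returns an empty list and no customer is affected.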
Interview Walkthrough
- Frame the entire problem as a Contention pattern on finite inventory — the goal is protecting downstream services from a 1000× traffic spike at T=0.
- Lead with CDN and Edge Delivery: serve a static pre-rendered sale page so 1M page loads never touch origin.
- Gate purchase requests through a virtual queue that admission-controls at a calculated drain rate using Back-of-the-Envelope Estimation.
- Decrement inventory atomically in Redis via a Lua script — never hit PostgreSQL for stock checks during the sale window.
- Apply Circuit Breaker and Retries and Bulkheads on the checkout path so a slow payment provider does not cascade into total failure.
- Plan for oversell: replication lag can lose 1–2 seconds of decrements — use a safety buffer and post-sale reconciliation to cancel excess orders.
- Reject early clicks with server-side `start_time` validation regardless of client clock skew.
- Common pitfall: letting all 1M users hit the inventory API simultaneously without queue gating or edge caching.
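The circuit breaker mentioned for the checkout path can be sketched as a small state machine. This is a minimal illustration under assumptions: the class names, the threshold of 3 failures, and the 30-second cool-down are hypothetical choices, and the current time is passed in so the behavior is deterministic.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # time the breaker opened, or None when closed

    def call(self, fn, now: float):
        """Run fn() unless the breaker is open and still cooling down."""
        half_open = False
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("payment provider circuit is open")
            half_open = True  # cool-down elapsed: allow a single trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if half_open or self.failures >= self.failure_threshold:
                self.opened_at = now  # open (or re-open) and restart cool-down
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the breaker is open, checkout requests fail fast instead of holding connections to a slow payment provider, which keeps the rest of the purchase path responsive.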
Pre-Rendered Static Pages: Surviving the Traffic Spike
The CDN serves a static, pre-rendered sale page to 1M users with zero origin load. The countdown runs client-side in JavaScript, and at T=0 the script reveals the Buy button. Only purchase API calls reach the origin, and the virtual queue gates those to ~10K/sec. Without this, 1M simultaneous page requests would hit the origin and likely crash it.
Countdown Precision
Fetch the server time once on page load, compute the client's clock offset, and adjust the countdown accordingly. As the ultimate safeguard against clock skew, a server-side gate rejects any Buy request that arrives before sale.start_time, regardless of what the client clock says.
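One common way to estimate the clock offset is to take the midpoint of the request's round trip, as NTP-style protocols do. The sketch below assumes symmetric network delay, which is an approximation; all function names are hypothetical and times are in milliseconds.

```python
def clock_offset_ms(client_sent_ms: float, server_time_ms: float,
                    client_received_ms: float) -> float:
    """Estimate (server clock - client clock), assuming symmetric delay."""
    midpoint = (client_sent_ms + client_received_ms) / 2
    return server_time_ms - midpoint


def ms_until_start(sale_start_ms: float, client_now_ms: float,
                   offset_ms: float) -> float:
    """Countdown value to display, corrected by the estimated offset."""
    return max(0.0, sale_start_ms - (client_now_ms + offset_ms))


def server_accepts(sale_start_ms: float, server_now_ms: float) -> bool:
    """Server-side gate: only the server clock decides whether the sale is open."""
    return server_now_ms >= sale_start_ms
```

The client-side correction only improves the display; a tampered or badly skewed client still cannot buy early because `server_accepts` never consults the client clock.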
Queue vs No Queue
- No queue: 500K req/sec all hit the API and Redis directly. Redis handles it, but API servers, load balancers, and the network facing 500K concurrent connections are likely to crash. Result: errors and unfairness (network latency determines winners).
- Virtual queue (recommended): All users are enqueued and admitted at a controlled rate (10K/sec), which API servers handle comfortably. Result: fair (FIFO), stable (no crashes), users wait 30-60s in the queue.
- Trade-off: a slight wait in exchange for fairness and stability.
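The controlled-admission behavior can be modeled in memory to see how position and drain time relate to the gate rate. This is a sketch, not the Redis sorted-set implementation: `VirtualQueue` and `drain_seconds` are hypothetical names, and each `tick` stands for one second of admission.

```python
from collections import deque


class VirtualQueue:
    """FIFO queue that admits at most `admit_per_second` users per tick."""

    def __init__(self, admit_per_second: int):
        self.admit_per_second = admit_per_second
        self._waiting = deque()
        self._next_position = 0

    def enter(self, user_id: str) -> int:
        """Enqueue a user and return their 1-based position."""
        self._next_position += 1
        self._waiting.append(user_id)
        return self._next_position

    def tick(self) -> list:
        """Admit the next batch in arrival order (call once per second)."""
        batch = []
        while self._waiting and len(batch) < self.admit_per_second:
            batch.append(self._waiting.popleft())
        return batch


def drain_seconds(queued_users: int, admit_per_second: int) -> int:
    """Seconds needed to admit everyone at the given gate rate (rounded up)."""
    return -(-queued_users // admit_per_second)


if __name__ == "__main__":
    print(drain_seconds(1_000_000, 10_000), "seconds to admit 1M users")
    print(drain_seconds(347_231, 10_000), "seconds until position 347,231")
```

At 10K admissions per second, the full 1M-user queue takes 100 seconds to drain, so if stock sells out sooner, the remaining users should receive the SOLD_OUT broadcast instead of waiting out the queue.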
Why Not Just Auto-Scale?
Auto-scaling takes 2-5 minutes to spin up new instances, while a flash sale goes from 0 to 500K req/sec in under 1 second. By the time scaling kicks in, the sale is over (sold out in 5-30 sec). Solution: pre-provision servers 24h before the sale, use the CDN for page loads, gate the API with the queue, and keep stock in Redis (single node, no scaling needed).
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1: MVP (0 to 100K users)
Monolith or minimal services proving core flash sale system flows. Optimize for shipping speed and correctness over scale.
Key components: Single region · Primary DB + Redis cache · Synchronous core path · Basic monitoring
Move to next phase when: p99 latency exceeds SLO or DB CPU sustained above 70%
Phase 2: Growth (100K to 10M users)
Split read/write paths, introduce async processing for non-critical work, add caching layers and horizontal scaling.
Key components: Read replicas or CQRS · Message queue for async work · CDN / edge caching · Service-level SLOs
Move to next phase when: Hot keys, fan-out bottlenecks, or ops toil from manual scaling
Phase 3: Scale (10M+ users)
Shard data plane, multi-region active-active or active-passive, formal DR runbooks, cost optimization.
Key components: Database sharding / partitioning · Multi-region replication · Auto-scaling + chaos testing · Dedicated platform/SRE ownership
Move to next phase when: Regional failure domain risk, compliance data residency, or linear cost growth unsustainable
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Core user-facing availability | 99.95% | Budget for planned maintenance + unplanned failures without user-visible outage. |
| p99 latency (critical path) | Problem-specific — state target early and tie to capacity math | Interview credibility comes from connecting SLO to architecture choices. |
| Error rate (5xx) | < 0.1% | Distinguishes transient blips from systemic failure requiring rollback. |
| Data durability | 99.999999999% (11 nines) for committed writes | Define which operations require fsync/quorum vs async replication. |
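The availability and error-rate targets translate directly into budgets that can be stated in an interview. The sketch below shows the arithmetic; the 30-day window and the function names are assumptions made for illustration.

```python
def downtime_budget_minutes(availability_slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for a given SLO."""
    return (1.0 - availability_slo) * window_days * 24 * 60


def allowed_5xx(total_requests: int, max_error_rate: float = 0.001) -> int:
    """Number of 5xx responses that still meets the error-rate target."""
    return round(total_requests * max_error_rate)


if __name__ == "__main__":
    print(f"{downtime_budget_minutes(0.9995):.1f} min of downtime per 30 days")
    print(f"{allowed_5xx(10_000_000):,} failed requests allowed per 10M")
```

For a flash sale that sells out in 5-30 seconds, the practical point is that even a small slice of the monthly budget spent at T=0 covers the entire event, so availability during the sale window deserves its own, stricter target.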
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Primary database unavailable | Health check failures, connection pool exhaustion alerts, elevated 5xx | Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists |
| Traffic spike (10× normal) | RPS anomaly alert, autoscaling lag, latency SLO burn rate | Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations |
| Bad deploy causing elevated errors | Canary metric regression, error budget burn, deployment correlation | Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility |
Cost Drivers (Staff lens)
- Egress bandwidth and CDN (often dominates media/data-heavy systems)
- Database storage + IOPS at scale (plan compaction, TTL, tiering)
- Compute for async pipelines (right-size workers, spot instances for batch)
- Managed service premiums vs operational headcount trade-off
Multi-Region & DR
Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.