This problem appears in multiple sheets. Depth expectations increase as you progress:
| Track | What to demonstrate |
|---|---|
| Arch 75 | Staff level: multi-region, cost at scale, migration path, and production metrics. |
Interview Prompt
Design Live Streaming Platform like Twitch.
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Which of these is highest priority: Ingest → transcode → distribute pipeline, WebRTC vs RTMP, Low-latency HLS? | Forces scope negotiation — senior candidates trim before drawing boxes. |
| What scale should we design for — DAU, QPS, data volume? | Drives every capacity decision; shows structured thinking. |
| What are the read vs write patterns on the critical path? | Determines caching, DB choice, and replication topology. |
| What consistency and durability guarantees are required? | Separates strong-consistency paths from eventual ones — a senior differentiator. |
Scope
In scope
- Ingest → transcode → distribute pipeline
- WebRTC vs RTMP
- Low-latency HLS
- Chat overlay at scale
- DVR functionality
Out of scope (state explicitly)
- Recommendation / home feed ranking (#48, #65)
- Live chat product features and comments in depth (#36); only chat fan-out at scale is covered here
- DRM license server internals
Assumptions
- Clarify scale (DAU, QPS, data volume) for the live streaming platform in the first 5 minutes
- Standard reliability target 99.9%–99.99% unless the problem implies higher (payments, booking)
- Managed cloud services (RDS, S3, Kafka, Redis) are acceptable building blocks
Functional Requirements
- Go live: Streamers broadcast live video/audio from OBS, mobile app, or webcam
- Watch live: Viewers watch live streams with minimal delay (< 5 seconds glass-to-glass latency)
- Live chat: Real-time chat alongside the stream (thousands of messages/sec for popular streams)
- Stream discovery: Browse by category/game, recommended streams, search
- Follow/Subscribe: Follow streamers for notifications; paid subscriptions for perks
- VOD: Automatically save past broadcasts for on-demand viewing
- Clips: Viewers create short clips (30-60 sec) from live streams
- Emotes: Custom emoji/emotes per channel (subscriber-only emotes)
- Stream quality: Adaptive bitrate; viewers pick a quality or let the player auto-adjust
- Raids/Hosts: Streamer redirects their audience to another channel
- Moderation: Chat moderation tools (ban, timeout, slow mode, subscriber-only chat)
- Monetization: Subscriptions, bits/donations, ads
Non-Functional Requirements
- Low Latency: Glass-to-glass latency < 5 seconds (standard); < 2 seconds (low-latency mode)
- Scale: Support 100K+ concurrent streams; 50M+ concurrent viewers globally
- Reliability: Streams must not drop; even a momentary interruption loses viewers
- Chat Performance: Handle 100K+ messages/sec across popular channels
- Availability: 99.99%
- Global Distribution: Low-latency viewing from any country
- Cost Efficient: Video bandwidth is the #1 cost center; optimize CDN usage
- DVR: Viewers can rewind live streams up to 2 hours
Capacity Estimates
| Metric | Calculation | Value |
|---|---|---|
| Concurrent streamers | Given (peak load assumption) | 100K |
| Concurrent viewers | Given (peak load assumption) | 50M |
| Avg viewers per stream | 50M viewers ÷ 100K streams | 500 (power law: top 1% have 100K+) |
| Ingest bandwidth | 100K streams × 6 Mbps | 600 Gbps |
| Egress bandwidth | 50M viewers × 4 Mbps avg | 200 Tbps |
| Chat messages / sec | Given (peak load assumption) | 500K (global); top stream: 50K/sec |
| VOD storage / day | 100K streams × 4 hrs avg × 2 GB/hr | 800 TB |
| Latency target | Given (from the non-functional requirements) | < 5 seconds (standard mode, delivered via LL-HLS); < 2 seconds (low-latency mode) |
High-level architecture of the live streaming platform, from streamer ingest through CDN delivery to viewers, with the chat system running in parallel.
Ingest: Receiving the Live Stream
The streamer uses OBS Studio to broadcast via RTMP (Real-Time Messaging Protocol) to the ingest server. The stream key is unique per channel and acts as authentication. Typical settings: 1080p 60fps, 6 Mbps bitrate, x264 encoder, keyframe interval 2s.
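The RTMP Ingest Server accepts the connection, validates the stream key (Redis lookup), and checks bitrate limits and the keyframe interval. If the stream is valid, it demuxes it and forwards the still-encoded video and audio to the transcoding cluster, which does the decoding. Shipping raw 1080p60 frames between services instead would cost roughly 1.5 Gbps per stream.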
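The ingest checks above can be sketched as a single validation function. This is a minimal sketch. A plain dict stands in for the Redis `stream_key:{key} → channel_id` mapping. `PublishRequest`, `validate_publish`, and the 6000 kbps cap are hypothetical names and values chosen to match the recommended settings. The keyframe check matters because segments can only be cut on keyframes, so a 2 s segment needs a keyframe every 2 s.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-in for the Redis mapping stream_key:{key} -> channel_id.
STREAM_KEYS = {"live_sk_abc123def456": "ch-uuid"}

MAX_BITRATE_KBPS = 6000             # assumed cap, matching the 6 Mbps recommended setting
REQUIRED_KEYFRAME_INTERVAL_S = 2.0  # segments can only be cut on keyframes


@dataclass
class PublishRequest:
    stream_key: str
    bitrate_kbps: int
    keyframe_interval_s: float


def validate_publish(req: PublishRequest) -> Tuple[Optional[str], str]:
    """Return (channel_id, "ok") if the publish is accepted, else (None, reason)."""
    channel_id = STREAM_KEYS.get(req.stream_key)  # a Redis GET in production
    if channel_id is None:
        return None, "unknown stream key"
    if req.bitrate_kbps > MAX_BITRATE_KBPS:
        return None, "bitrate above limit"
    if abs(req.keyframe_interval_s - REQUIRED_KEYFRAME_INTERVAL_S) > 1e-6:
        return None, "keyframe interval must be 2s"
    return channel_id, "ok"


print(validate_publish(PublishRequest("live_sk_abc123def456", 6000, 2.0)))  # ('ch-uuid', 'ok')
```

Rejecting at connect time keeps bad streams out of the transcoding cluster entirely. That is cheaper than discovering the problem after GPUs have been allocated.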
RTMP vs SRT vs WebRTC for Ingest
| Protocol | Pros | Cons |
|---|---|---|
| RTMP | Universal, mature, every streaming software supports it | TCP-based (higher latency), no built-in encryption (RTMPS adds TLS), legacy Flash-era spec no longer developed by Adobe |
| SRT ⭐ | UDP-based (lower latency), built-in AES encryption, packet-loss recovery via ARQ retransmission (FEC optional) | Less universal than RTMP (growing adoption) |
| WebRTC | No software needed (browser), ultra-low latency (< 500ms) | Complex at scale (STUN/TURN), quality/bitrate constraints |
Twitch uses RTMP for ingest (compatibility) + internal SRT transport. Future: migrating to SRT or QUIC-based protocols.
Real-Time Transcoding: The Hardest Part
Unlike VOD transcoding, live transcoding must be real-time. Each second of video must be transcoded in < 1 second (otherwise latency accumulates).
Pipeline per stream
- Receive the encoded source stream from ingest and decode it (1080p 60fps = 60 frames/sec)
- Transcode to multiple qualities IN PARALLEL:
| Quality | Resolution | FPS | Bitrate | Encoder |
|---|---|---|---|---|
| Source | 1080p | 60 | 6 Mbps | passthrough |
| High | 720p | 60 | 3 Mbps | x264/NVENC |
| Medium | 480p | 30 | 1.5 Mbps | x264/NVENC |
| Low | 360p | 30 | 0.8 Mbps | x264 |
| Audio only | — | — | 128 Kbps | AAC |
Real-time constraint: 1 second of 1080p 60fps video means encoding 60 frames for each of 4 outputs. x264 (CPU) takes ~5ms per frame → 60 × 4 × 5ms = 1.2s, which is too slow if done sequentially. Solution: parallel encoding, with each quality on its own CPU core or GPU encoder session. GPU (NVENC): ~1ms per frame → 240ms total. At 100K concurrent streams, the GPU approach needs ~17K GPUs (≈6 streams per GPU) vs ~50K CPU servers (≈2 streams per server). That is roughly 3× fewer machines, which is why GPU encoding is the more cost-effective choice for live transcoding at this scale.
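The encode-budget arithmetic above can be checked with a few lines. This sketch assumes the per-frame costs quoted in the text. It also treats all 4 outputs as 60 fps, the worst case. The streams-per-unit figures (6 per GPU, 2 per CPU server) are inferred from the text's ~17K GPU and ~50K server totals; they are not measured values.

```python
import math

FPS = 60
RENDITIONS = 4            # worst case: every output encoded at 60 fps
CPU_MS_PER_FRAME = 5.0    # x264, as quoted above
GPU_MS_PER_FRAME = 1.0    # NVENC, as quoted above
REALTIME_BUDGET_MS = 1000  # one second of video must encode in under one second


def sequential_encode_ms(ms_per_frame: float) -> float:
    """Time to encode one second of video for all renditions on one worker."""
    return FPS * RENDITIONS * ms_per_frame


def units_needed(streams: int, streams_per_unit: int) -> int:
    """Machines (or GPUs) needed, rounding up."""
    return math.ceil(streams / streams_per_unit)


cpu_ms = sequential_encode_ms(CPU_MS_PER_FRAME)   # 1200 ms: misses the budget
gpu_ms = sequential_encode_ms(GPU_MS_PER_FRAME)   # 240 ms: fits with headroom
gpus = units_needed(100_000, 6)                   # inferred ~6 streams per GPU
cpu_servers = units_needed(100_000, 2)            # inferred ~2 streams per server
print(cpu_ms, gpu_ms, gpus, cpu_servers)          # 1200.0 240.0 16667 50000
```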
HLS Segment Generation
Every 2 seconds, the encoded stream is cut into a segment (segment_000001.ts). The live playlist is updated with a sliding window of the last 3-5 segments. Clients poll the playlist every 1-2 seconds, discover new segments, and download them.
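A minimal sketch of the sliding-window live playlist described above follows. It assumes 2 s MPEG-TS segments numbered from 1 and a window of 4 segments, which sits within the 3-5 range in the text. The function name `live_playlist` is hypothetical. The tags used (`#EXTM3U`, `#EXT-X-VERSION`, `#EXT-X-TARGETDURATION`, `#EXT-X-MEDIA-SEQUENCE`, `#EXTINF`) are standard HLS media-playlist tags.

```python
SEGMENT_SECONDS = 2.0
WINDOW = 4  # segments kept in the live playlist (text says 3-5)


def live_playlist(last_seq: int, window: int = WINDOW) -> str:
    """Render a live HLS media playlist whose newest segment is last_seq."""
    first_seq = max(1, last_seq - window + 1)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3+ allows decimal EXTINF durations
        f"#EXT-X-TARGETDURATION:{int(SEGMENT_SECONDS)}",
        # Tells the player which segment number the window starts at.
        f"#EXT-X-MEDIA-SEQUENCE:{first_seq}",
    ]
    for seq in range(first_seq, last_seq + 1):
        lines.append(f"#EXTINF:{SEGMENT_SECONDS:.3f},")
        lines.append(f"segment_{seq:06d}.ts")
    # No #EXT-X-ENDLIST tag: its absence tells the player the stream is live,
    # so it keeps reloading the playlist every 1-2 seconds.
    return "\n".join(lines) + "\n"


print(live_playlist(10))
```

Each reload slides the window forward by one segment. Older segments drop out of the playlist but can remain in storage for DVR and VOD.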
Low-Latency HLS (LL-HLS): Getting Below 3 Seconds
Standard HLS latency breakdown: Encoder buffer (2s) + Segment duration (6s) + Player buffer (18s) + CDN propagation (1s) = ~25-30 seconds.
How to reduce:
- Shorter segments (2s instead of 6): Reduces segment wait from 6s → 2s, but more HTTP requests and less CDN cache efficiency.
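- LL-HLS with Partial Segments ⭐: Publish 200ms "partial" segments instead of waiting for full 2-second segments. The client downloads partials as they are produced. Latency: encoder buffer (2s) + partial (0.2s) + CDN (0.5s) + player (0.5s) = ~3 seconds.
- Blocking playlist reload: The client asks for the next playlist update (`_HLS_msn` / `_HLS_part` query parameters), and the server holds the request until that partial exists, removing polling delay. Early LL-HLS drafts used HTTP/2 Server Push here; the final spec replaced push with blocking reloads and preload hints.
- Preload hints: The playlist advertises the next partial with `EXT-X-PRELOAD-HINT`, so the client can request it before it's ready and receive bytes as soon as they are produced.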
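The two latency breakdowns above (standard HLS and LL-HLS) can be expressed as additive budgets. Doing so also shows which stage to attack next. This is a sketch that only uses the component values quoted in the text. It assumes the stages add up serially, as the text does.

```python
# Latency components in seconds, taken from the two breakdowns above.
STANDARD_HLS = {
    "encoder buffer": 2.0,
    "segment duration": 6.0,
    "player buffer (3 segments)": 18.0,
    "CDN propagation": 1.0,
}
LL_HLS = {
    "encoder buffer": 2.0,
    "partial segment": 0.2,
    "CDN": 0.5,
    "player buffer": 0.5,
}


def total_latency(budget: dict) -> float:
    """Glass-to-glass estimate: stages add up serially."""
    return round(sum(budget.values()), 2)


def dominant_stage(budget: dict) -> str:
    """The largest component, i.e. the stage worth optimizing next."""
    return max(budget, key=budget.get)


for name, budget in [("standard HLS", STANDARD_HLS), ("LL-HLS", LL_HLS)]:
    print(f"{name}: {total_latency(budget)}s, dominated by {dominant_stage(budget)}")
```

Once partial segments remove the player-buffer problem, the encoder buffer becomes the largest term. Shrinking that buffer is the main lever in the ~2 s latency budget later in this design.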
| Method | Latency |
|---|---|
| Standard HLS | 25-30 seconds |
| Short segments | 6-10 seconds |
| LL-HLS ⭐ | 2-5 seconds |
| WebRTC | < 1 second (but hard to scale to millions of viewers) |
Chat System: 50K Messages/Second Per Channel
Architecture: Client → WebSocket → Chat Gateway → Kafka → Chat Processor (Flink) → Fan-out.
Chat Gateway (WebSocket servers): Each server handles 100K WebSocket connections. 50M viewers → 500 WS servers. Viewer sends message → WS server validates → publishes to Kafka topic chat-messages keyed by channel_id.
Chat Processor (Flink) handles: rate limiting (max 1 msg per 1.5s per user), spam filter, banned words, subscriber checks, and emote parsing.
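The 1-message-per-1.5-seconds rule can be sketched as a per-user, per-channel key with a 1.5 s expiry. This is an in-memory stand-in, assuming the Redis key `chat_rate:{user_id}:{channel_id}` from the data model. In Redis the same check is a single `SET key 1 NX PX 1500`, which succeeds only if no message was accepted in the last 1.5 s. That is a swapped-in alternative to the INCR counter listed in the data model; both enforce the same limit. `ChatRateLimiter` is a hypothetical name, and the clock is injectable so the sketch is testable.

```python
import time
from typing import Callable, Dict

WINDOW_MS = 1500  # at most 1 message per 1.5 s per user per channel


class ChatRateLimiter:
    """In-memory emulation of SET chat_rate:{user}:{channel} 1 NX PX 1500."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at_ms: Dict[str, float] = {}  # Redis would evict via TTL

    def allow(self, user_id: str, channel_id: str) -> bool:
        key = f"chat_rate:{user_id}:{channel_id}"
        now_ms = self._clock() * 1000
        if self._expires_at_ms.get(key, 0.0) > now_ms:
            return False  # key still alive: a message was accepted < 1.5 s ago
        self._expires_at_ms[key] = now_ms + WINDOW_MS  # the NX write succeeds
        return True


t = [0.0]
limiter = ChatRateLimiter(clock=lambda: t[0])
print(limiter.allow("viewer123", "ch-uuid"), limiter.allow("viewer123", "ch-uuid"))  # True False
```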
Fan-out: The Scaling Challenge: 200K viewers × 50K msgs/sec = 10B pushes/sec (impossible). Solutions:
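- Message batching: Batch each channel's messages into 100ms windows, so each viewer receives 10 pushes/sec regardless of message rate (200K × 10 = 2M pushes/sec), with more payload per push.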
- Message sampling: Show only 20 msgs/sec to each viewer (randomly sampled); highlighted messages always shown. Result: 200K × 20 = 4M pushes/sec (manageable).
- Redis Pub/Sub: Each WS server subscribes to Redis Pub/Sub for channels its clients watch. 1 Redis publish → 500 server deliveries → each delivers to ~400 local clients.
- Sharding hot channels: Top 10 channels get dedicated chat infrastructure.
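Batching and sampling combine naturally. With a 20 msgs/sec cap and 100 ms windows, each batch carries at most 2 regular messages plus every highlighted message. This is a minimal sketch assuming one sampled batch per window, which a WebSocket server could reuse for all its local viewers. `ChatMessage` and `build_batch` are hypothetical names, and a seeded `random.Random` keeps the sampling reproducible.

```python
import random
from dataclasses import dataclass
from typing import List

WINDOW_MS = 100
MAX_MSGS_PER_SEC = 20
PER_WINDOW = MAX_MSGS_PER_SEC * WINDOW_MS // 1000  # 2 regular messages per batch


@dataclass
class ChatMessage:
    msg_id: str
    content: str
    highlighted: bool = False


def build_batch(window_msgs: List[ChatMessage], rng: random.Random) -> List[ChatMessage]:
    """Turn one 100 ms window of channel messages into a single push."""
    highlighted = [m for m in window_msgs if m.highlighted]  # always shown
    regular = [m for m in window_msgs if not m.highlighted]
    sampled = rng.sample(regular, min(PER_WINDOW, len(regular)))
    keep = {m.msg_id for m in highlighted + sampled}
    # Preserve arrival order so the chat still reads chronologically.
    return [m for m in window_msgs if m.msg_id in keep]


rng = random.Random(42)
window = [ChatMessage(f"m{i}", "PogChamp") for i in range(5000)]  # 50K msgs/sec × 0.1 s
window.append(ChatMessage("h1", "Thanks for the sub!", highlighted=True))
print(len(build_batch(window, rng)))  # 3
```

The batch size stays bounded however hot the channel gets. Per-viewer cost therefore depends on the cap, not on the channel's message rate.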
VOD: Automatic Stream Archive
During live stream: HLS segments are copied to S3 "vod-archive" bucket. After stream ends: generate complete HLS manifest, DASH manifest, thumbnail, optionally split into chapters, run content moderation, and make available for on-demand playback.
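Storage: 800 TB/day, so ~48 PB at steady state. Retention: 60 days (Twitch's longest VOD window, used for Partners). Cost optimization: keep the first 30 days on S3 Standard, since S3 lifecycle rules require objects to spend 30 days there before transitioning to S3 Standard-IA. Keep days 30-60 on S3 Standard-IA, which is roughly 45-50% cheaper per GB but adds retrieval fees.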
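A sketch of the steady-state tier split follows. It is parameterized by how many days stay on S3 Standard, and it compares a 7-day split with the 30-day minimum that S3 lifecycle transitions to Standard-IA require. The model assumes Standard-IA costs ~50% of Standard per GB. It also ignores retrieval and request fees, so it expresses cost in relative units rather than dollars.

```python
DAILY_TB = 800          # VOD archive growth per day, from the capacity table
RETENTION_DAYS = 60
IA_RELATIVE_COST = 0.5  # assumed: Standard-IA at ~50% of Standard per GB


def tier_volumes(standard_days: int) -> tuple:
    """Steady-state TB held in (Standard, Standard-IA) once 60 days have filled."""
    standard_tb = DAILY_TB * standard_days
    ia_tb = DAILY_TB * (RETENTION_DAYS - standard_days)
    return standard_tb, ia_tb


def savings(standard_days: int) -> float:
    """Fractional storage saving vs keeping all 60 days on S3 Standard."""
    standard_tb, ia_tb = tier_volumes(standard_days)
    tiered = standard_tb + ia_tb * IA_RELATIVE_COST
    return 1 - tiered / (standard_tb + ia_tb)


for days in (7, 30):
    print(days, tier_volumes(days), round(savings(days), 3))
```

Under these assumptions, moving the cut-over from day 7 to day 30 reduces the saving from about 44% to 25%. At tens of petabytes, the choice of tiering rule is a real budget line.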
Clips: Viewer clicks "Clip" → capture last 60 seconds from HLS segments → concatenate into MP4 → transcode to clip format → store permanently.
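Clip capture starts by mapping the requested window to HLS segment numbers. This sketch assumes 2 s segments numbered from 1, where segment n covers [(n-1)·2, n·2) seconds since stream start. It takes `live_edge_s` to be the end of the newest complete segment. The offset and duration follow the Create Clip API (`offset_seconds: -30`, `duration_seconds: 30`), and `segments_for_clip` is a hypothetical helper. Exact trimming to sub-segment boundaries would happen later, in the MP4 concatenation step.

```python
import math
from typing import List

SEGMENT_SECONDS = 2


def segments_for_clip(live_edge_s: float, offset_s: float, duration_s: float) -> List[int]:
    """Segment numbers that fully cover [live_edge + offset, + duration)."""
    start = max(0.0, live_edge_s + offset_s)     # can't clip before the stream began
    end = min(live_edge_s, start + duration_s)   # can't clip past the live edge
    first = int(start // SEGMENT_SECONDS) + 1
    last = math.ceil(end / SEGMENT_SECONDS)
    return list(range(first, last + 1))


# Live edge at 100 s, clip the last 30 s: segments 36..50 (15 × 2 s).
print(segments_for_clip(100, -30, 30))
```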
Event Bus Design (Kafka)
- Topic: live_streaming_platform-events
- Partitions: 64 (scale consumers horizontally)
- Partition key: entity_id (e.g., channel_id or stream_id; preserves per-entity ordering)
- Retention: 7 days (compliance) or 24h (high-volume telemetry)
- Replication factor: 3, min.insync.replicas: 2
- Producer: idempotent producer enabled (enable.idempotence=true)
- Consumer: consumer group "live_streaming_platform-processors"
  - At-least-once delivery + idempotent handlers (dedup by event_id)
  - DLQ topic: live_streaming_platform-events-dlq (poison messages after 3 retries)
  - Lag alert: consumer lag > 60s → scale workers

Rule: async side effects must not block the synchronous API response.
- Sync path: validate → persist source of truth → publish event → return 201 Created
- Async path: consumers update caches, indexes, notifications, aggregates
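The consumer contract above (at-least-once delivery, dedup by `event_id`, DLQ after 3 retries) can be sketched without a Kafka client. This is a minimal sketch: an in-memory set stands in for a durable dedup store, and a list stands in for the `-dlq` topic. `Event` and `IdempotentConsumer` are hypothetical names. The event is marked processed only after the handler succeeds. A crash between those two steps therefore causes a redelivery, which is why the handler itself must also be idempotent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Event:
    event_id: str
    entity_id: str
    payload: dict


@dataclass
class IdempotentConsumer:
    """At-least-once consumer: dedup by event_id, park poison messages in a DLQ."""
    handler: Callable[[Event], None]
    max_retries: int = 3
    processed: Set[str] = field(default_factory=set)  # durable store in production
    dlq: List[Event] = field(default_factory=list)    # stand-in for the -dlq topic

    def consume(self, event: Event) -> str:
        if event.event_id in self.processed:
            return "duplicate"  # redelivered after a crash or rebalance: skip
        for _ in range(1 + self.max_retries):  # first attempt + 3 retries
            try:
                self.handler(event)
            except Exception:
                continue  # a real consumer would back off before retrying
            self.processed.add(event.event_id)
            return "ok"
        self.dlq.append(event)
        return "dlq"


consumer = IdempotentConsumer(handler=lambda e: None)
ev = Event("ev-1", "ch-uuid", {"type": "stream_started"})
print(consumer.consume(ev), consumer.consume(ev))  # ok duplicate
```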
Start Stream
POST /api/v1/streams/start
{
"channel_id": "ch-uuid",
"title": "Friday Night Gaming",
"category": "Fortnite",
"tags": ["English", "Competitive"],
"language": "en"
}
Response: 201 Created
{
"stream_id": "stream-uuid",
"ingest_url": "rtmp://ingest-us-east.example.com/live",
"stream_key": "live_sk_abc123def456",
"recommended_settings": {
"resolution": "1920x1080",
"fps": 60,
"bitrate": "6000 kbps",
"keyframe_interval": 2,
"encoder": "x264",
"rate_control": "CBR"
}
}
Get Live Stream (Viewer)
GET /api/v1/streams/{channel_id}/live
Response: 200 OK
{
"stream_id": "stream-uuid",
"channel": {"name": "Ninja", "avatar": "..."},
"title": "Friday Night Gaming",
"viewer_count": 145230,
"started_at": "2025-03-14T20:00:00Z",
"manifest_url": "https://cdn.example.com/live/stream-uuid/master.m3u8",
"chat_websocket": "wss://chat.example.com/ws/ch-uuid"
}
Send Chat Message
// WebSocket message
{
"type": "chat_message",
"channel_id": "ch-uuid",
"content": "PogChamp that was insane!"
}
// Server broadcast
{
"type": "chat_message",
"id": "msg-uuid",
"user": {"name": "viewer123", "color": "#FF0000"},
"content": "PogChamp that was insane!",
"timestamp": "2025-03-14T20:15:23Z"
}
Create Clip
POST /api/v1/clips
{
"stream_id": "stream-uuid",
"title": "Insane play!",
"duration_seconds": 30,
"offset_seconds": -30
}
Response: 201 Created
{
"clip_id": "clip-uuid",
"url": "https://cdn.example.com/clips/clip-uuid.mp4",
"status": "processing"
}
Browse Streams
GET /api/v1/streams?category=gaming&sort=viewers_desc&limit=20
Common Error Responses
400 Bad Request: invalid input, missing fields, or malformed JSON
401 Unauthorized: missing or invalid auth token or API key
403 Forbidden: authenticated but insufficient permissions
404 Not Found: resource ID does not exist
409 Conflict: duplicate write or version conflict; retry with idempotency key
422 Unprocessable Entity: valid syntax but invalid business logic
429 Too Many Requests: rate limit exceeded; honor Retry-After header
500 Internal Error: unexpected server fault; retry with idempotency key
503 Service Unavailable: dependency down or overloaded; use exponential backoff
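WebSocket session expired: the server closes the connection with an application-defined close code (4000-4999, reserved for private use by RFC 6455); the client reconnects and re-authenticates
202 Accepted: job queued; poll GET /jobs/{id} for status
200 OK with "status": "processing": returned while polling a job that has not finished (not 408, which means the server timed out waiting for the client's request)
MySQL: Stream & Channel Metadata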
CREATE TABLE channels (
  channel_id BINARY(16) PRIMARY KEY,  -- UUID as 16 bytes: MySQL has no native UUID column type
  user_id BINARY(16) NOT NULL UNIQUE,
  name VARCHAR(50) UNIQUE NOT NULL,
  display_name VARCHAR(50),
  description TEXT,
  avatar_url TEXT,
  banner_url TEXT,
  stream_key VARCHAR(64) NOT NULL,
  follower_count INT DEFAULT 0,
  subscriber_count INT DEFAULT 0,
  partner BOOLEAN DEFAULT FALSE,
  created_at TIMESTAMP
);
CREATE TABLE streams (
  stream_id BINARY(16) PRIMARY KEY,
  channel_id BINARY(16) NOT NULL,
  title VARCHAR(255),
  category_id INT,
  language CHAR(2),
  status ENUM('live', 'ended') DEFAULT 'live',
  viewer_count INT DEFAULT 0,
  peak_viewers INT DEFAULT 0,
  started_at TIMESTAMP,
  ended_at TIMESTAMP,
  vod_url TEXT,
  duration_seconds INT,
  -- DESC key parts are honored from MySQL 8.0 onward
  INDEX idx_channel (channel_id, started_at DESC),
  INDEX idx_category_live (category_id, status, viewer_count DESC),
  INDEX idx_live_viewers (status, viewer_count DESC)
);
Redis: Live State
stream:{channel_id} → Hash { stream_id, title, category, started_at, ingest_server, manifest_url, status }
viewers:{stream_id} → INT (written by the viewer-count aggregator every ~10 s from WS server reports, not INCR/DECR per connection)
stream_key:{key} → channel_id
chat_rate:{user_id}:{channel_id} → INT with a 1.5 s TTL (INCR, PEXPIRE 1500 on the first increment; reject if the value is > 1)
chat_banned:{channel_id} → SET of user_ids
live_streams:{category} → Sorted Set { channel_id: viewer_count }
Kafka Topics
- Topic: stream-events (stream started, ended, title changed)
- Topic: chat-messages (partitioned by channel_id for ordering)
- Topic: viewer-events (join, leave; for viewer count)
- Topic: clip-requests (async clip generation)
- Topic: moderation-events (ban, timeout, message delete)
S3: Video Storage
Bucket: live-segments (short retention, 48 hours)
/{stream_id}/720p/segment_000001.ts
/{stream_id}/1080p/segment_000001.ts
/{stream_id}/master.m3u8
Bucket: vod-archive (60-day retention)
/{stream_id}/vod/master.m3u8
/{stream_id}/vod/720p/segment_000001.ts
Bucket: clips (permanent)
/{clip_id}/clip.mp4
Fault Tolerance
| Concern | Solution |
|---|---|
| Ingest server crash | Streamer's software auto-reconnects to backup ingest server; 2-5 sec gap |
| Transcoder crash | Hot standby transcoder per stream; failover in < 1 second |
| CDN edge failure | CDN auto-routes to next nearest PoP; viewer sees brief rebuffer |
| Origin storage failure | S3 11 nines durability; origin shield with multiple backends |
| Chat server crash | WS reconnect → new server; message history replayed from Kafka |
| Viewer count drift | Periodic reconciliation: count actual WS connections per stream |
| Stream key leaked | Streamer can regenerate stream key instantly; invalidate old key |
Handling a Stream That Goes Viral (1K → 500K Viewers in Minutes)
CDN cache: Before viral, segments cached at 2-3 PoPs. After viral, viewers from 100+ PoPs. Solution: origin shield (absorbs 90% edge misses), CDN pre-push to all PoPs when viewer count > 50K, and multi-origin replication.
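An origin shield absorbs the miss storm largely through request collapsing, a technique also known as single-flight. When many edge PoPs miss on the same new segment at once, only one request goes to origin and the rest wait for its result. This is a minimal asyncio sketch of that technique, not any CDN vendor's implementation. `OriginShield` and the simulated `fetch` are hypothetical, and cache expiry is omitted.

```python
import asyncio
from typing import Dict


class OriginShield:
    """Collapse concurrent misses for the same segment into one origin fetch."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self._inflight: Dict[str, asyncio.Future] = {}
        self.origin_fetches = 0

    async def get(self, segment: str) -> bytes:
        if segment in self._cache:
            return self._cache[segment]
        if segment in self._inflight:
            return await self._inflight[segment]  # join the fetch already running
        fut = asyncio.get_running_loop().create_future()
        self._inflight[segment] = fut
        try:
            self.origin_fetches += 1
            data = await self._fetch(segment)
            self._cache[segment] = data
            fut.set_result(data)  # wake every waiter with the same bytes
            return data
        except Exception as exc:
            fut.set_exception(exc)
            raise
        finally:
            del self._inflight[segment]


async def demo() -> int:
    async def fetch(segment: str) -> bytes:
        await asyncio.sleep(0.01)  # simulated origin latency
        return b"segment-bytes"

    shield = OriginShield(fetch)
    # 1,000 edge misses for the same brand-new segment arrive together.
    results = await asyncio.gather(
        *(shield.get("stream-uuid/720p/segment_000042.ts") for _ in range(1000))
    )
    assert all(r == b"segment-bytes" for r in results)
    return shield.origin_fetches


print(asyncio.run(demo()))  # 1
```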
Transcoding: Migrate to dedicated transcoder (seamless, < 1s gap). Add more quality options for ABR.
Chat: Migrate channel to dedicated chat cluster, enable message sampling, add slow mode (1 msg per 5s per user).
Stream Latency Optimization Budget
End-to-end: ~2.2 seconds total, with each step optimized: encoder buffer (500ms) + ingest PoP (200ms) + GPU transcode (200ms) + LL-HLS partial segment (200ms) + CDN edge (500ms) + player buffer (500ms) + decode+render (100ms). Trade-off: lower latency means a smaller buffer and more rebuffering, so let the viewer choose.
Interview Walkthrough
- Lead with the cost reality: bandwidth dominates. 50M viewers at 4 Mbps is ~200 Tbps of egress, so CDN architecture is the centerpiece.
- Walk the live path: RTMP ingest → GPU transcode (ABR ladder) → LL-HLS segment packaging → CDN edge with origin shield.
- Explain transcoder failover: hot standby per stream so a GPU crash causes <1 s gap, not a dead broadcast.
- Cover viewer count via periodic WebSocket server sweeps — not a central Redis counter that breaks on connection drops.
- Separate chat onto its own Kafka-backed cluster with slow mode and message sampling when a stream goes viral.
- Mention VOD capture: same ingest feeds a parallel recording pipeline for replay/clips after the stream ends.
- Common pitfall: trying WebRTC for mass delivery — it has no CDN support and cannot scale to millions of concurrent viewers.
HLS vs DASH vs WebRTC for Live Delivery
| Protocol | Pros | Cons |
|---|---|---|
| HLS | Universal (iOS, Android, all browsers), works with CDN, LL-HLS brings latency to 2-5s | Higher latency than WebRTC |
| DASH | Open standard (no Apple dependency), LL-DASH similar to LL-HLS | No native playback in iOS Safari; packaging segments as CMAF lets one set of files serve both HLS and DASH |
| WebRTC | Ultra-low latency (< 500ms) | P2P doesn't scale to millions, no CDN support, no DRM |
Industry moving to CMAF (unified format for HLS and DASH). Twitch uses a custom CMAF + chunked transfer encoding protocol achieving ~2-3 seconds end-to-end.
Viewer Count: Why It's Harder Than You Think
Approach 1: Central counter (Redis INCR/DECR): 50M viewers connecting/disconnecting causes millions of Redis ops/sec. WebSocket crashes inflate count (no DECR). Not recommended.
Approach 2: Periodic sweep ⭐ (Twitch's approach): Each WS server reports per-stream count every 10 seconds. Aggregator sums reports. Self-correcting (if server crashes, it stops reporting). 10-second staleness is acceptable ("145K viewers" doesn't need to be exact).
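The periodic-sweep aggregator can be sketched in a few lines. Each WS server sends a full snapshot of its per-stream counts, and a server that stops reporting simply ages out. This sketch assumes a server is dropped after 3 missed reports (30 s), which is a hypothetical threshold. `ViewerCountAggregator` is a hypothetical name, and the clock is injectable for testing.

```python
import time
from typing import Callable, Dict, Tuple

REPORT_INTERVAL_S = 10
STALE_AFTER_S = 3 * REPORT_INTERVAL_S  # assumed: drop a server after 3 missed reports


class ViewerCountAggregator:
    """Sum per-server snapshots; crashed servers expire instead of leaking counts."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        # server_id -> (counts by stream_id, reported_at)
        self._reports: Dict[str, Tuple[Dict[str, int], float]] = {}

    def report(self, server_id: str, counts: Dict[str, int]) -> None:
        # A full snapshot replaces the server's previous report, so streams
        # that dropped to zero viewers disappear without an explicit DECR.
        self._reports[server_id] = (dict(counts), self._clock())

    def viewer_count(self, stream_id: str) -> int:
        now = self._clock()
        return sum(
            counts.get(stream_id, 0)
            for counts, reported_at in self._reports.values()
            if now - reported_at <= STALE_AFTER_S
        )
```

For example, if `ws-2` crashes, its last report keeps contributing for at most 30 s and then drops out. The count self-corrects with no cleanup job.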
Approach 3: HyperLogLog: For unique viewer count (not concurrent). 12 KB per stream regardless of viewer count, ±0.81% error.
Cost Analysis: Bandwidth Is Everything
50M concurrent viewers × 4 Mbps avg = 200 Tbps egress. 200 Tbps ÷ 8 = 25 TB/s × 3600 = 90M GB/hour. At $0.03/GB CDN cost, that is ~$2.7M per peak hour, which is why bandwidth, not compute, dominates Twitch's cost structure.
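The bandwidth cost arithmetic above can be turned into a function of the average delivered bitrate. That makes codec and ABR savings easy to price. This is a sketch using decimal GB and the text's $0.03/GB figure. The 40% saving for AV1 is an assumed midpoint of the 30-50% range quoted below, not a measurement.

```python
VIEWERS = 50_000_000
PRICE_PER_GB_USD = 0.03  # assumed blended CDN price, as in the text
SECONDS_PER_HOUR = 3600


def egress_gb_per_hour(avg_mbps: float) -> float:
    """Decimal GB delivered per hour at the given average bitrate."""
    bits_per_sec = VIEWERS * avg_mbps * 1e6
    return bits_per_sec / 8 * SECONDS_PER_HOUR / 1e9


def cdn_cost_per_hour(avg_mbps: float) -> float:
    """Peak-hour CDN bill in USD."""
    return egress_gb_per_hour(avg_mbps) * PRICE_PER_GB_USD


baseline = cdn_cost_per_hour(4.0)        # 90M GB/h -> ~2.7M USD/h
with_av1 = cdn_cost_per_hour(4.0 * 0.6)  # assumed 40% bitrate saving from AV1
print(f"${baseline:,.0f}/h baseline, ${with_av1:,.0f}/h with AV1")
```

Because cost is linear in average bitrate, every 10% shaved off the ABR ladder's average saves about $270K per peak hour at this price.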
Cost optimization: ABR pushes lower qualities (default to 720p), multi-CDN with volume discounts, own CDN infrastructure (Twitch/Amazon saves 50-70%), P2P CDN (experimental), and codec efficiency (HEVC/AV1 saves 30-50% bandwidth).
Transcoding: Dedicated Per-Stream vs Shared Pool
Dedicated: Isolated, predictable, but expensive and under-utilized.
Shared pool: Cost-efficient (6-8 streams per GPU), but noisy neighbor problem.
Hybrid ⭐ (Twitch): Tier 1 partners (> 1000 avg viewers) get dedicated. Tier 2 affiliates get shared pool (4 streams/instance). Tier 3 (< 10 viewers): source quality only (no transcoding).
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1: MVP (0 to 100K users)
Monolith or minimal services proving core live streaming platform flows. Optimize for shipping speed and correctness over scale.
Key components: Single region · Primary DB + Redis cache · Synchronous core path · Basic monitoring
Move to next phase when: p99 latency exceeds SLO or DB CPU sustained above 70%
Phase 2: Growth (100K to 10M users)
Split read/write paths, introduce async processing for non-critical work, add caching layers and horizontal scaling.
Key components: Read replicas or CQRS · Message queue for async work · CDN / edge caching · Service-level SLOs
Move to next phase when: Hot keys, fan-out bottlenecks, or ops toil from manual scaling
Phase 3: Scale (10M+ users)
Shard data plane, multi-region active-active or active-passive, formal DR runbooks, cost optimization.
Key components: Database sharding / partitioning · Multi-region replication · Auto-scaling + chaos testing · Dedicated platform/SRE ownership
Move to next phase when: Regional failure domain risk, compliance data residency, or linear cost growth unsustainable
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Core user-facing availability | 99.99% | Matches the availability NFR; budget for planned maintenance + unplanned failures without user-visible outage. |
| p99 latency (critical path) | Glass-to-glass < 5 s (standard), < 2 s (low-latency mode); state it early and tie it to capacity math | Interview credibility comes from connecting SLO to architecture choices. |
| Error rate (5xx) | < 0.1% | Distinguishes transient blips from systemic failure requiring rollback. |
| Data durability | 99.999999999% (11 nines) for committed writes | Define which operations require fsync/quorum vs async replication. |
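Error budgets are easiest to reason about in minutes. This sketch converts availability targets into allowed downtime per 30-day window, using the 99.99% availability requirement and 99.95% for comparison. It also computes a burn rate, which compares the observed error rate with the budgeted one. The 30-day window and the function names are assumptions for illustration.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200


def error_budget_minutes(slo: float, period_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Fully-failed minutes an availability SLO allows over the period."""
    return round((1 - slo) * period_minutes, 2)


def burn_rate(error_rate: float, slo: float) -> float:
    """Budget consumption speed: 1.0 spends exactly the budget over the period."""
    return error_rate / (1 - slo)


print(error_budget_minutes(0.9999))        # 4.32
print(error_budget_minutes(0.9995))        # 21.6
print(round(burn_rate(0.001, 0.9999), 1))  # 10.0
```

If the 5xx threshold and the availability SLO are measured on the same requests, a sustained 0.1% error rate burns a 99.99% budget 10× too fast. Paging on burn rate catches that well before the month's 4.32 minutes are gone.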
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Primary database unavailable | Health check failures, connection pool exhaustion alerts, elevated 5xx | Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists |
| Traffic spike (10× normal) | RPS anomaly alert, autoscaling lag, latency SLO burn rate | Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations |
| Bad deploy causing elevated errors | Canary metric regression, error budget burn, deployment correlation | Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility |
Cost Drivers (Staff lens)
- Egress bandwidth and CDN (often dominates media/data-heavy systems)
- Database storage + IOPS at scale (plan compaction, TTL, tiering)
- Compute for async pipelines (right-size workers, spot instances for batch)
- Managed service premiums vs operational headcount trade-off
Multi-Region & DR
Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.