Design a Live Streaming Platform like Twitch

This problem appears in multiple sheets. Depth expectations increase as you progress:

Track	What to demonstrate
Arch 75	Staff level: multi-region, cost at scale, migration path, and production metrics.

Interview Prompt

Design a live streaming platform like Twitch.

Clarifying Questions (ask before designing)

Question	Why it matters
Which of these is highest priority: Ingest → transcode → distribute pipeline, WebRTC vs RTMP, Low-latency HLS?	Forces scope negotiation — senior candidates trim before drawing boxes.
What scale should we design for — DAU, QPS, data volume?	Drives every capacity decision; shows structured thinking.
What are the read vs write patterns on the critical path?	Determines caching, DB choice, and replication topology.
What consistency and durability guarantees are required?	Separates strong-consistency paths from eventual ones — a senior differentiator.

Scope

In scope

Ingest → transcode → distribute pipeline
WebRTC vs RTMP
Low-latency HLS
Chat overlay at scale
DVR functionality

Out of scope (state explicitly)

Recommendation / home feed ranking (#48, #65)
General-purpose live chat and comments design (#36); only the stream chat overlay and its fan-out are covered here
DRM license server internals

Assumptions

Clarify scale (DAU, QPS, data volume) for the live streaming platform in the first 5 minutes
Standard reliability target 99.9%–99.99% unless problem implies higher (payments, booking)
Managed cloud services (RDS, S3, Kafka, Redis) are acceptable building blocks

Go live: Streamers broadcast live video/audio from OBS, mobile app, or webcam
Watch live: Viewers watch live streams with minimal delay (< 5 seconds glass-to-glass latency)
Live chat: Real-time chat alongside the stream (thousands of messages/sec for popular streams)
Stream discovery: Browse by category/game, recommended streams, search
Follow/Subscribe: Follow streamers for notifications; paid subscriptions for perks
VOD: Automatically save past broadcasts for on-demand viewing
Clips: Viewers create short clips (30-60 sec) from live streams
Emotes: Custom emoji/emotes per channel (subscriber-only emotes)
Stream quality: Adaptive bitrate; the viewer selects a quality or the player adjusts it automatically
Raids/Hosts: Streamer redirects their audience to another channel
Moderation: Chat moderation tools (ban, timeout, slow mode, subscriber-only chat)
Monetization: Subscriptions, bits/donations, ads

Metric	Calculation	Value
Concurrent streamers	Given (peak load assumption)	100K
Concurrent viewers	Given (peak load assumption)	50M
Avg viewers per stream	50M viewers ÷ 100K streams	500 (power law: a handful of top streams exceed 100K, most have far fewer)
Ingest bandwidth	100K streams × 6 Mbps	600 Gbps
Egress bandwidth	50M viewers × 4 Mbps avg	200 Tbps
Chat messages / sec	Given (peak load assumption)	500K (global); top stream: 50K/sec
VOD storage / day	100K streams × 4 hrs avg × 2 GB/hr	800 TB
Latency target	Given (assumption documented in value)	< 5 seconds (standard HLS); < 2 seconds (LL-HLS)
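
The derived rows in the table above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the constants are the table's own assumptions, and the variable names are illustrative only. Like the table, it treats the 100K concurrent streams as the daily stream count for VOD storage.

```python
# Back-of-envelope capacity math using the assumptions from the table above.
CONCURRENT_STREAMS = 100_000
CONCURRENT_VIEWERS = 50_000_000
INGEST_MBPS_PER_STREAM = 6
EGRESS_MBPS_PER_VIEWER = 4
AVG_STREAM_HOURS = 4
VOD_GB_PER_HOUR = 2

avg_viewers_per_stream = CONCURRENT_VIEWERS // CONCURRENT_STREAMS
ingest_gbps = CONCURRENT_STREAMS * INGEST_MBPS_PER_STREAM / 1_000          # Mbps -> Gbps
egress_tbps = CONCURRENT_VIEWERS * EGRESS_MBPS_PER_VIEWER / 1_000_000      # Mbps -> Tbps
vod_tb_per_day = CONCURRENT_STREAMS * AVG_STREAM_HOURS * VOD_GB_PER_HOUR / 1_000  # GB -> TB

print(f"avg viewers per stream: {avg_viewers_per_stream}")
print(f"ingest: {ingest_gbps:.0f} Gbps, egress: {egress_tbps:.0f} Tbps")
print(f"VOD storage per day: {vod_tb_per_day:.0f} TB")
```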

High-level architecture of the live streaming platform, from streamer ingest through CDN delivery to viewers, with the chat system running in parallel.

Ingest: Receiving the Live Stream

The streamer uses OBS Studio to broadcast via RTMP (Real-Time Messaging Protocol) to the ingest server. The stream key is unique per channel and acts as authentication. Typical settings: 1080p 60fps, 6 Mbps bitrate, x264 encoder, keyframe interval 2s.

The RTMP Ingest Server receives the RTMP stream, decodes it, and validates the stream key (Redis lookup), bitrate limits, and correct keyframe interval. If valid, it forwards raw frames to the transcoding cluster.
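
The validation step described above might look roughly like the following sketch. A plain dict stands in for the Redis lookup, and the `StreamInfo` type, the `validate_ingest` function, the 8,000 kbps cap, and the 0.1 s tolerance are all hypothetical; the text only gives 6 Mbps and a 2-second keyframe interval as typical settings.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limits chosen for illustration.
MAX_BITRATE_KBPS = 8_000
REQUIRED_KEYFRAME_INTERVAL_S = 2.0
KEYFRAME_TOLERANCE_S = 0.1


@dataclass
class StreamInfo:
    stream_key: str
    bitrate_kbps: int
    keyframe_interval_s: float


def validate_ingest(info: StreamInfo, key_store: dict) -> tuple:
    """Return (channel_id, reason); channel_id is None when the stream is rejected."""
    # key_store stands in for Redis: stream_key:{key} -> channel_id
    channel_id: Optional[str] = key_store.get(f"stream_key:{info.stream_key}")
    if channel_id is None:
        return None, "unknown stream key"
    if info.bitrate_kbps > MAX_BITRATE_KBPS:
        return None, "bitrate above limit"
    if abs(info.keyframe_interval_s - REQUIRED_KEYFRAME_INTERVAL_S) > KEYFRAME_TOLERANCE_S:
        return None, "keyframe interval must be 2s"
    return channel_id, "ok"


store = {"stream_key:live_sk_abc123def456": "ch-uuid"}
print(validate_ingest(StreamInfo("live_sk_abc123def456", 6_000, 2.0), store))
```

Rejecting a stream before it reaches the transcoding cluster keeps a misconfigured encoder from consuming transcoder capacity.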

RTMP vs SRT vs WebRTC for Ingest

Protocol	Pros	Cons
RTMP	Universal, mature, every streaming software supports it	TCP-based (higher latency), no built-in encryption, technically deprecated
SRT ⭐	UDP-based (lower latency), built-in AES encryption, forward error correction	Less universal than RTMP (growing adoption)
WebRTC	No software needed (browser), ultra-low latency (< 500ms)	Complex at scale (STUN/TURN), quality/bitrate constraints

Twitch uses RTMP for ingest (for compatibility) with SRT for internal transport. A possible future direction is migrating ingest to SRT or QUIC-based protocols.

Real-Time Transcoding: The Hardest Part

Unlike VOD transcoding, live transcoding must be real-time. Each second of video must be transcoded in < 1 second (otherwise latency accumulates).

Pipeline per stream

Receive raw frames from ingest (1080p 60fps = 60 frames/sec)
Transcode to multiple qualities IN PARALLEL:

Quality	Resolution	FPS	Bitrate	Encoder
Source	1080p	60	6 Mbps	passthrough
High	720p	60	3 Mbps	x264/NVENC
Medium	480p	30	1.5 Mbps	x264/NVENC
Low	360p	30	0.8 Mbps	x264
Audio only	—	—	128 Kbps	AAC

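Real-time constraint: each second of 1080p 60fps input means encoding 60 frames for each of 4 qualities. x264 on CPU takes ~5 ms per frame, so encoding sequentially costs 60 × 4 × 5 ms = 1.2 s per second of video, which is too slow. Solution: encode in parallel, with each quality on its own CPU core or GPU stream. On GPU (NVENC) a frame takes ~1 ms, so all four qualities fit in 60 × 4 × 1 ms = 240 ms.

At 100K concurrent streams, the GPU approach needs ~17K GPUs (about 6 streams per GPU) versus ~50K CPU servers (2 streams per server). That is roughly 3× fewer machines, which is the basis for calling GPU about 3× more cost-effective for live encoding.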
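
The real-time check above is simple enough to express as code. A small sketch, using the per-frame timings quoted in the text; the function names are illustrative.

```python
FPS = 60
QUALITIES = 4
CPU_MS_PER_FRAME = 5      # x264 figure quoted in the text
GPU_MS_PER_FRAME = 1      # NVENC figure quoted in the text
REAL_TIME_BUDGET_MS = 1_000  # one second of video must encode in under one second


def encode_ms_per_second_of_video(ms_per_frame: int, parallel: bool) -> int:
    """Wall-clock milliseconds needed to encode one second of source video."""
    per_quality = FPS * ms_per_frame
    # With one core or GPU stream per quality, the qualities overlap in time.
    return per_quality if parallel else per_quality * QUALITIES


def is_real_time(encode_ms: int) -> bool:
    return encode_ms < REAL_TIME_BUDGET_MS


cpu_sequential = encode_ms_per_second_of_video(CPU_MS_PER_FRAME, parallel=False)
cpu_parallel = encode_ms_per_second_of_video(CPU_MS_PER_FRAME, parallel=True)
gpu_sequential = encode_ms_per_second_of_video(GPU_MS_PER_FRAME, parallel=False)

for name, ms in [("CPU sequential", cpu_sequential),
                 ("CPU parallel", cpu_parallel),
                 ("GPU sequential", gpu_sequential)]:
    print(f"{name}: {ms} ms, real-time: {is_real_time(ms)}")
```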

HLS Segment Generation

Every 2 seconds, the encoded stream is cut into a segment (segment_000001.ts). The live playlist is updated with a sliding window of the last 3-5 segments. Clients poll the playlist every 1-2 seconds, discover new segments, and download them.
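
The sliding-window playlist described above can be sketched as follows. The `LivePlaylist` class is hypothetical; the tags it emits are standard HLS media playlist tags, and a live playlist deliberately omits `#EXT-X-ENDLIST` so players keep polling.

```python
from collections import deque

SEGMENT_SECONDS = 2
WINDOW_SIZE = 5  # upper end of the 3-5 segment window in the text


class LivePlaylist:
    """Keeps only the newest segments and renders a live HLS media playlist."""

    def __init__(self, window_size: int = WINDOW_SIZE) -> None:
        # deque(maxlen=...) drops the oldest segment when a new one arrives.
        self.segments = deque(maxlen=window_size)

    def add_segment(self, sequence: int) -> None:
        self.segments.append(sequence)

    def render(self) -> str:
        first = self.segments[0] if self.segments else 0
        lines = [
            "#EXTM3U",
            "#EXT-X-VERSION:3",
            f"#EXT-X-TARGETDURATION:{SEGMENT_SECONDS}",
            # Sequence number of the first segment still in the window.
            f"#EXT-X-MEDIA-SEQUENCE:{first}",
        ]
        for seq in self.segments:
            lines.append(f"#EXTINF:{SEGMENT_SECONDS:.3f},")
            lines.append(f"segment_{seq:06d}.ts")
        return "\n".join(lines) + "\n"


playlist = LivePlaylist(window_size=3)
for seq in range(1, 6):
    playlist.add_segment(seq)
print(playlist.render())
```

After five segments with a window of three, the playlist lists segments 3 through 5 and advances the media sequence number accordingly, which is how a polling client detects what is new.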

Low-Latency HLS (LL-HLS): Getting Below 3 Seconds

Standard HLS latency breakdown: Encoder buffer (2s) + Segment duration (6s) + Player buffer (18s) + CDN propagation (1s) = ~25-30 seconds.

How to reduce:

Shorter segments (2s instead of 6): Reduces segment wait from 6s → 2s, but more HTTP requests and less CDN cache efficiency.
LL-HLS with Partial Segments ⭐: Push 200ms "partial" segments instead of waiting for full 2-second segments. Client downloads partials as produced. Latency: encoder buffer (2s) + partial (0.2s) + CDN (0.5s) + player (0.5s) = ~3 seconds.
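HTTP/2 Server Push: Early LL-HLS drafts had the server push new partials without client polling; the final specification dropped push in favor of blocking playlist reloads and preload hints.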
Preload hints: Client pre-connects and waits for the next partial before it's ready.

Method	Latency
Standard HLS	25-30 seconds
Short segments	6-10 seconds
LL-HLS ⭐	2-5 seconds
WebRTC	< 1 second (but doesn't scale)

Chat System: 50K Messages/Second Per Channel

Architecture: Client → WebSocket → Chat Gateway → Kafka → Chat Processor (Flink) → Fan-out.

Chat Gateway (WebSocket servers): Each server handles 100K WebSocket connections. 50M viewers → 500 WS servers. Viewer sends message → WS server validates → publishes to Kafka topic chat-messages keyed by channel_id.

Chat Processor (Flink) handles: rate limiting (max 1 msg per 1.5s per user), spam filter, banned words, subscriber checks, and emote parsing.
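
The per-user rate limit (one message per 1.5 seconds) can be illustrated with an in-memory sketch. The `ChatRateLimiter` class is hypothetical; in production the same check is commonly done in Redis, for example with `SET key 1 NX PX 1500`, so that all processors share state.

```python
class ChatRateLimiter:
    """Allow at most one message per min_interval_s for each (user, channel) pair."""

    def __init__(self, min_interval_s: float = 1.5) -> None:
        self.min_interval_s = min_interval_s
        self._last_sent = {}  # (user_id, channel_id) -> timestamp of last accepted message

    def allow(self, user_id: str, channel_id: str, now: float) -> bool:
        key = (user_id, channel_id)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.min_interval_s:
            return False  # too soon; the message is dropped
        self._last_sent[key] = now
        return True


limiter = ChatRateLimiter()
print(limiter.allow("viewer123", "ch-uuid", now=0.0))   # first message is accepted
print(limiter.allow("viewer123", "ch-uuid", now=1.0))   # 1.0 s later: rejected
print(limiter.allow("viewer123", "ch-uuid", now=1.5))   # 1.5 s after the first: accepted
```

Passing `now` explicitly keeps the limiter deterministic and easy to test; a real processor would supply event time from the stream.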

Fan-out: The Scaling Challenge: 200K viewers × 50K msgs/sec = 10B pushes/sec (impossible). Solutions:

Message batching: Batch messages per 100ms window, so each viewer receives at most 10 pushes/sec (one batch per window) regardless of the message rate.
Message sampling: Show only 20 msgs/sec to each viewer (randomly sampled); highlighted messages always shown. Result: 200K × 20 = 4M pushes/sec (manageable).
Redis Pub/Sub: Each WS server subscribes to Redis Pub/Sub for channels its clients watch. 1 Redis publish → 500 server deliveries → each delivers to ~400 local clients.
Sharding hot channels: Top 10 channels get dedicated chat infrastructure.
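
Batching and sampling from the list above combine naturally: each 100 ms window produces one batch, and each batch carries a capped random sample plus every highlighted message. A minimal sketch; the `build_batch` function and the `highlighted` field are hypothetical, and the cap of 2 per batch follows from 20 messages/sec ÷ 10 batches/sec.

```python
import random

BATCHES_PER_SECOND = 10                 # one batch per 100 ms window
MSGS_SHOWN_PER_SECOND = 20              # sampling target from the text
MAX_MSGS_PER_BATCH = MSGS_SHOWN_PER_SECOND // BATCHES_PER_SECOND


def build_batch(window_msgs: list, rng: random.Random, limit: int = MAX_MSGS_PER_BATCH) -> list:
    """Pick what to push for one window: all highlighted messages, then a
    random sample of the rest up to the cap."""
    highlighted = [m for m in window_msgs if m.get("highlighted")]
    regular = [m for m in window_msgs if not m.get("highlighted")]
    room = max(0, limit - len(highlighted))
    sampled = rng.sample(regular, min(room, len(regular)))
    return highlighted + sampled


# A 50K msgs/sec channel produces about 5,000 messages per 100 ms window.
window = [{"id": i, "highlighted": i == 0} for i in range(5_000)]
batch = build_batch(window, random.Random(42))
print(len(batch), "messages pushed out of", len(window))

pushes_per_second = 200_000 * MSGS_SHOWN_PER_SECOND
print(f"{pushes_per_second:,} message deliveries/sec for 200K viewers")
```

Highlighted messages bypass the cap here, so a window with many of them can exceed the limit; whether to cap those too is a product decision.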

VOD: Automatic Stream Archive

During live stream: HLS segments are copied to S3 "vod-archive" bucket. After stream ends: generate complete HLS manifest, DASH manifest, thumbnail, optionally split into chapters, run content moderation, and make available for on-demand playback.

Storage: 800 TB/day. Retention: 60 days (the upper end of Twitch's retention tiers). Cost optimization: first 7 days on S3 Standard, days 8-60 on S3 Infrequent Access (roughly 45% cheaper per GB stored at list price, with per-GB retrieval fees).
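
The steady-state footprint of this retention policy follows directly from the numbers above. A short sketch, assuming a constant 800 TB/day; storage prices are deliberately left out because the text does not state them.

```python
VOD_TB_PER_DAY = 800
RETENTION_DAYS = 60
HOT_DAYS = 7  # days kept on the standard tier before moving to infrequent access

total_tb = VOD_TB_PER_DAY * RETENTION_DAYS   # everything retained at steady state
hot_tb = VOD_TB_PER_DAY * HOT_DAYS           # most recent week
cold_tb = total_tb - hot_tb                  # older than a week
cold_share = cold_tb / total_tb

print(f"total: {total_tb / 1_000:.1f} PB")
print(f"standard tier: {hot_tb / 1_000:.1f} PB, infrequent access: {cold_tb / 1_000:.1f} PB")
print(f"share on the cheaper tier: {cold_share:.0%}")
```

Because most of the retained data sits in the older tier, the tiering discount applies to the bulk of the storage bill.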

Clips: Viewer clicks "Clip" → capture last 60 seconds from HLS segments → concatenate into MP4 → transcode to clip format → store permanently.
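
The first step of clip creation, choosing which HLS segments to concatenate, can be sketched as below. It assumes 2-second segments numbered from 1 (matching segment_000001.ts); `segments_for_clip` is a hypothetical helper.

```python
import math

SEGMENT_SECONDS = 2


def segments_for_clip(latest_sequence: int, duration_s: int,
                      segment_s: int = SEGMENT_SECONDS) -> list:
    """Sequence numbers of the segments covering the last duration_s seconds."""
    count = math.ceil(duration_s / segment_s)
    # Clamp at 1 for streams that are younger than the requested clip length.
    first = max(1, latest_sequence - count + 1)
    return list(range(first, latest_sequence + 1))


clip_segments = segments_for_clip(latest_sequence=100, duration_s=60)
print(f"{len(clip_segments)} segments: {clip_segments[0]}..{clip_segments[-1]}")
```

Cutting on segment boundaries keeps clip creation cheap; trimming to an exact start and end time would need a re-encode of the first and last segments.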

Event Bus Design (Kafka)

Topic: live_streaming_platform-events
  Partitions: 64 (scale consumers horizontally)
  Partition key: entity_id (channel_id / stream_id; preserves per-entity ordering)
  Retention: 7 days (compliance) or 24h (high-volume telemetry)
  Replication factor: 3, min.insync.replicas: 2

Producer: idempotent producer enabled (enable.idempotence=true)
Consumer: consumer group "live_streaming_platform-processors"
  - At-least-once delivery + idempotent handlers (dedup by event_id)
  - DLQ topic: live_streaming_platform-events-dlq (poison messages after 3 retries)
  - Lag alert: consumer lag > 60s → scale workers

Rule: async side effects MUST NOT block the synchronous API response.
  Sync path: validate → persist source of truth → publish event → return 201
  Async path: consumers update caches, indexes, notifications, aggregates
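
The consumer-side guarantees above (at-least-once delivery, dedup by event_id, DLQ after 3 retries) can be sketched in memory. The `EventConsumer` class is hypothetical and uses no Kafka client; a real consumer would persist the seen-ID set and publish to the DLQ topic instead of appending to a list.

```python
from typing import Callable

MAX_RETRIES = 3  # after the initial attempt, per the DLQ rule above


class EventConsumer:
    def __init__(self, handler: Callable[[dict], None]) -> None:
        self.handler = handler
        self.seen_event_ids = set()  # dedup store; would be durable in production
        self.dlq = []                # stand-in for the -dlq topic

    def process(self, event: dict) -> str:
        event_id = event["event_id"]
        if event_id in self.seen_event_ids:
            return "duplicate"  # redelivery of an already-handled event
        for _attempt in range(1 + MAX_RETRIES):
            try:
                self.handler(event)
            except Exception:
                continue  # retry; a real consumer would back off here
            self.seen_event_ids.add(event_id)
            return "processed"
        self.dlq.append(event)  # poison message: park it and move on
        return "dead-lettered"


handled = []
consumer = EventConsumer(lambda e: handled.append(e["event_id"]))
print(consumer.process({"event_id": "e1", "type": "stream_started"}))
print(consumer.process({"event_id": "e1", "type": "stream_started"}))
```

Recording the event ID only after the handler succeeds means a crash mid-handler leads to a retry rather than a lost event, which is why the handler itself must be idempotent.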

Start Stream

HTTP

POST /api/v1/streams/start
{
  "channel_id": "ch-uuid",
  "title": "Friday Night Gaming",
  "category": "Fortnite",
  "tags": ["English", "Competitive"],
  "language": "en"
}
Response: 201 Created
{
  "stream_id": "stream-uuid",
  "ingest_url": "rtmp://ingest-us-east.example.com/live",
  "stream_key": "live_sk_abc123def456",
  "recommended_settings": {
    "resolution": "1920x1080",
    "fps": 60,
    "bitrate": "6000 kbps",
    "keyframe_interval": 2,
    "encoder": "x264",
    "rate_control": "CBR"
  }
}

Get Live Stream (Viewer)

HTTP

GET /api/v1/streams/{channel_id}/live
Response: 200 OK
{
  "stream_id": "stream-uuid",
  "channel": {"name": "Ninja", "avatar": "..."},
  "title": "Friday Night Gaming",
  "viewer_count": 145230,
  "started_at": "2025-03-14T20:00:00Z",
  "manifest_url": "https://cdn.example.com/live/stream-uuid/master.m3u8",
  "chat_websocket": "wss://chat.example.com/ws/ch-uuid"
}

Send Chat Message

JSON

// WebSocket message
{
  "type": "chat_message",
  "channel_id": "ch-uuid",
  "content": "PogChamp that was insane!"
}

// Server broadcast
{
  "type": "chat_message",
  "id": "msg-uuid",
  "user": {"name": "viewer123", "color": "#FF0000"},
  "content": "PogChamp that was insane!",
  "timestamp": "2025-03-14T20:15:23Z"
}

Create Clip

HTTP

POST /api/v1/clips
{
  "stream_id": "stream-uuid",
  "title": "Insane play!",
  "duration_seconds": 30,
  "offset_seconds": -30
}
Response: 201 Created
{
  "clip_id": "clip-uuid",
  "url": "https://cdn.example.com/clips/clip-uuid.mp4",
  "status": "processing"
}

Browse Streams

HTTP

GET /api/v1/streams?category=gaming&sort=viewers_desc&limit=20

Common Error Responses

400 Bad Request: invalid input, missing fields, or malformed JSON
401 Unauthorized: missing or invalid auth token or API key
403 Forbidden: authenticated but insufficient permissions
404 Not Found: resource ID does not exist
409 Conflict: duplicate write or version conflict; retry with idempotency key
422 Unprocessable Entity: valid syntax but invalid business logic
429 Too Many Requests: rate limit exceeded; honor Retry-After header
500 Internal Error: unexpected server fault; retry with idempotency key
503 Service Unavailable: dependency down or overloaded; use exponential backoff
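440 Login Timeout (non-standard, IIS-specific): avoid; signal an expired WebSocket session with an application close code in the 4000-4999 range and require the client to reconnect
202 Accepted (success, not an error): job queued; poll GET /jobs/{id} for status
408 Request Timeout: server timed out waiting for the client's request; safe to retry. A job that is still processing should return 200 with a pending status rather than 408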

MySQL: Stream & Channel Metadata

SQL

-- Targets MySQL 8.0. MySQL has no native UUID type, so IDs are stored as CHAR(36).
CREATE TABLE channels (
    channel_id       CHAR(36) PRIMARY KEY,
    user_id          CHAR(36) NOT NULL UNIQUE,
    name             VARCHAR(50) NOT NULL UNIQUE,
    display_name     VARCHAR(50),
    description      TEXT,
    avatar_url       TEXT,
    banner_url       TEXT,
    stream_key       VARCHAR(64) NOT NULL UNIQUE,  -- unique per channel; checked on ingest
    follower_count   INT DEFAULT 0,
    subscriber_count INT DEFAULT 0,
    partner          BOOLEAN DEFAULT FALSE,
    created_at       TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE streams (
    stream_id        CHAR(36) PRIMARY KEY,
    channel_id       CHAR(36) NOT NULL,
    title            VARCHAR(255),
    category_id      INT,
    language         CHAR(2),
    status           ENUM('live', 'ended') DEFAULT 'live',
    viewer_count     INT DEFAULT 0,
    peak_viewers     INT DEFAULT 0,
    started_at       TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    ended_at         TIMESTAMP NULL,               -- NULL while the stream is live
    vod_url          TEXT,
    duration_seconds INT,
    -- Past broadcasts for a channel, newest first
    INDEX idx_channel (channel_id, started_at DESC),
    -- Browse a category sorted by viewers
    INDEX idx_category_live (category_id, status, viewer_count DESC),
    -- Global "top live streams" listing
    INDEX idx_live_viewers (status, viewer_count DESC)
);

Redis: Live State

stream:{channel_id}    → Hash { stream_id, title, category, started_at, ingest_server, manifest_url, status }
viewers:{stream_id}    → INT (INCR/DECR on connect/disconnect)
stream_key:{key}       → channel_id
chat_rate:{user_id}:{channel_id}  → INT (INCR with a 1.5 sec TTL; reject the message when the value exceeds 1)
chat_banned:{channel_id}  → SET of user_ids
live_streams:{category}  → Sorted Set { channel_id: viewer_count }

Kafka Topics

Topic: stream-events         (stream started, ended, title changed)
Topic: chat-messages          (partitioned by channel_id for ordering)
Topic: viewer-events          (join, leave — for viewer count)
Topic: clip-requests          (async clip generation)
Topic: moderation-events      (ban, timeout, message delete)

S3: Video Storage

Bucket: live-segments (short retention, 48 hours)
  /{stream_id}/720p/segment_000001.ts
  /{stream_id}/1080p/segment_000001.ts
  /{stream_id}/master.m3u8

Bucket: vod-archive (60-day retention)
  /{stream_id}/vod/master.m3u8
  /{stream_id}/vod/720p/segment_000001.ts

Bucket: clips (permanent)
  /{clip_id}/clip.mp4

Concern	Solution
Ingest server crash	Streamer's software auto-reconnects to backup ingest server; 2-5 sec gap
Transcoder crash	Hot standby transcoder per stream; failover in < 1 second
CDN edge failure	CDN auto-routes to next nearest PoP; viewer sees brief rebuffer
Origin storage failure	S3 11 nines durability; origin shield with multiple backends
Chat server crash	WS reconnect → new server; message history replayed from Kafka
Viewer count drift	Periodic reconciliation: count actual WS connections per stream
Stream key leaked	Streamer can regenerate stream key instantly; invalidate old key

Handling a Stream That Goes Viral (1K → 500K Viewers in Minutes)

CDN cache: Before viral, segments cached at 2-3 PoPs. After viral, viewers from 100+ PoPs. Solution: origin shield (absorbs 90% edge misses), CDN pre-push to all PoPs when viewer count > 50K, and multi-origin replication.

Transcoding: Migrate to dedicated transcoder (seamless, < 1s gap). Add more quality options for ABR.

Chat: Migrate channel to dedicated chat cluster, enable message sampling, add slow mode (1 msg per 5s per user).

Stream Latency Optimization Budget

End-to-end: ~2.2 seconds total, with each step optimized: encoder buffer (500ms), ingest PoP (200ms), GPU transcode (200ms), LL-HLS partial segment (200ms), CDN edge (500ms), player buffer (500ms), decode+render (100ms). Trade-off: lower latency means a smaller buffer and therefore more rebuffering, so let the viewer choose.
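
Summing the components listed above is a quick way to check the budget and to see which stages dominate. A small sketch using the figures from the text:

```python
# Per-stage latency budget in milliseconds, as listed above.
LATENCY_BUDGET_MS = {
    "encoder buffer": 500,
    "ingest PoP": 200,
    "GPU transcode": 200,
    "LL-HLS partial segment": 200,
    "CDN edge": 500,
    "player buffer": 500,
    "decode + render": 100,
}

total_ms = sum(LATENCY_BUDGET_MS.values())
print(f"end-to-end: {total_ms} ms ({total_ms / 1000:.1f} s)")

# Largest contributors first: these are the stages worth tuning.
for stage, ms in sorted(LATENCY_BUDGET_MS.items(), key=lambda kv: -kv[1]):
    print(f"{stage:>24}: {ms:4d} ms ({ms / total_ms:.0%})")
```

The listed stages add up to 2,200 ms, and the three 500 ms buffers (encoder, CDN edge, player) account for most of it.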

HLS vs DASH vs WebRTC for Live Delivery

Protocol	Pros	Cons
HLS	Universal (iOS, Android, all browsers), works with CDN, LL-HLS brings latency to 2-5s	Higher latency than WebRTC
DASH	Open standard (no Apple dependency), LL-DASH similar to LL-HLS	No native playback in iOS Safari; reaching iOS usually means also serving HLS (CMAF lets both share segments)
WebRTC	Ultra-low latency (< 500ms)	P2P doesn't scale to millions, no CDN support, no DRM

Industry moving to CMAF (unified format for HLS and DASH). Twitch uses a custom CMAF + chunked transfer encoding protocol achieving ~2-3 seconds end-to-end.

Viewer Count: Why It's Harder Than You Think

Approach 1: Central counter (Redis INCR/DECR): 50M viewers connecting and disconnecting generate millions of Redis ops/sec, and a crashed WebSocket server never sends the DECR for its connections, so the count drifts upward. Not recommended.

Approach 2: Periodic sweep ⭐ (Twitch's approach): Each WS server reports per-stream count every 10 seconds. Aggregator sums reports. Self-correcting (if server crashes, it stops reporting). 10-second staleness is acceptable ("145K viewers" doesn't need to be exact).
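
The periodic sweep can be sketched as an aggregator that sums the latest report from each WebSocket server and ignores servers that have gone quiet. The `ViewerCountAggregator` class and the 30-second staleness cutoff (three missed 10-second reports) are assumptions for illustration.

```python
class ViewerCountAggregator:
    """Sums per-stream counts from the most recent report of each WS server."""

    def __init__(self, report_ttl_s: float = 30.0) -> None:
        self.report_ttl_s = report_ttl_s
        self._reports = {}  # server_id -> (reported_at, {stream_id: count})

    def report(self, server_id: str, counts: dict, now: float) -> None:
        # Each report replaces the previous one; there is nothing to decrement.
        self._reports[server_id] = (now, dict(counts))

    def totals(self, now: float) -> dict:
        totals = {}
        for reported_at, counts in self._reports.values():
            if now - reported_at > self.report_ttl_s:
                continue  # server stopped reporting; assume it crashed
            for stream_id, n in counts.items():
                totals[stream_id] = totals.get(stream_id, 0) + n
        return totals


agg = ViewerCountAggregator()
agg.report("ws-1", {"stream-a": 400}, now=0.0)
agg.report("ws-2", {"stream-a": 350, "stream-b": 10}, now=0.0)
print(agg.totals(now=5.0))

# ws-1 crashes and stops reporting; ws-2 keeps going.
agg.report("ws-2", {"stream-a": 360, "stream-b": 12}, now=30.0)
print(agg.totals(now=40.0))
```

Because each report is a full snapshot, a crashed server simply ages out, which is the self-correcting property described above.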

Approach 3: HyperLogLog: For unique viewer count (not concurrent). 12 KB per stream regardless of viewer count, ±0.81% error.

Cost Analysis: Bandwidth Is Everything

50M concurrent viewers × 4 Mbps avg = 200 Tbps egress. 200 Tbps ÷ 8 = 25 TB/s, and 25 TB/s × 3,600 s = 90,000 TB = 90M GB per hour. At $0.03/GB CDN cost, that is ~$2.7M per hour at peak. This is why bandwidth is the dominant cost for a platform like Twitch.
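
The egress cost chain above, written out step by step. This is a sketch with the text's own inputs; the $0.03/GB rate is the figure quoted here, not a price from any specific CDN, and decimal units (1 TB = 1,000 GB) are assumed.

```python
CONCURRENT_VIEWERS = 50_000_000
AVG_MBPS_PER_VIEWER = 4
CDN_USD_PER_GB = 0.03  # rate quoted in the text

egress_tbps = CONCURRENT_VIEWERS * AVG_MBPS_PER_VIEWER / 1_000_000  # terabits per second
tb_per_second = egress_tbps / 8                                      # bits -> bytes
gb_per_hour = tb_per_second * 1_000 * 3_600
cost_per_hour_usd = gb_per_hour * CDN_USD_PER_GB

print(f"egress: {egress_tbps:.0f} Tbps = {tb_per_second:.0f} TB/s")
print(f"volume: {gb_per_hour / 1e6:.0f}M GB/hour")
print(f"cost:   ${cost_per_hour_usd / 1e6:.1f}M/hour at peak")
```

This is a peak-hour figure; a daily or monthly estimate would need the viewer curve across the day rather than 24 times the peak.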

Cost optimization: ABR pushes lower qualities (default to 720p), multi-CDN with volume discounts, own CDN infrastructure (Twitch/Amazon saves 50-70%), P2P CDN (experimental), and codec efficiency (HEVC/AV1 saves 30-50% bandwidth).

Transcoding: Dedicated Per-Stream vs Shared Pool

Dedicated: Isolated, predictable, but expensive and under-utilized.

Shared pool: Cost-efficient (6-8 streams per GPU), but noisy neighbor problem.

Hybrid ⭐ (Twitch): Tier 1 partners (> 1000 avg viewers) get dedicated. Tier 2 affiliates get shared pool (4 streams/instance). Tier 3 (< 10 viewers): source quality only (no transcoding).
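
The hybrid policy reduces to a small routing decision per stream. The sketch below simplifies it to viewer-count thresholds only, ignoring partner and affiliate status, which the text also uses; `transcoding_tier` and the tier names are hypothetical.

```python
DEDICATED_MIN_AVG_VIEWERS = 1_000   # above this: dedicated transcoder
SOURCE_ONLY_MAX_VIEWERS = 10        # below this: no transcoding at all


def transcoding_tier(avg_viewers: int) -> str:
    """Pick a transcoding tier from average viewer count alone."""
    if avg_viewers > DEDICATED_MIN_AVG_VIEWERS:
        return "dedicated"
    if avg_viewers < SOURCE_ONLY_MAX_VIEWERS:
        return "source-only"
    return "shared-pool"


for viewers in (3, 250, 40_000):
    print(f"{viewers:>6} avg viewers -> {transcoding_tier(viewers)}")
```

Skipping transcoding for the long tail of tiny streams is where most of the savings come from, at the cost of those viewers having no lower-bitrate fallback.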

SLOs & Error Budgets

Metric	Target	Rationale
Core user-facing availability	99.95%	Budget for planned maintenance + unplanned failures without user-visible outage.
p99 latency (critical path)	Problem-specific — state target early and tie to capacity math	Interview credibility comes from connecting SLO to architecture choices.
Error rate (5xx)	< 0.1%	Distinguishes transient blips from systemic failure requiring rollback.
Data durability	99.999999999% (11 nines) for committed writes	Define which operations require fsync/quorum vs async replication.
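
An availability target is easier to reason about as minutes of allowed downtime. A minimal sketch, assuming a 30-day window; `error_budget_minutes` is an illustrative helper.

```python
def error_budget_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for a given target."""
    window_minutes = window_days * 24 * 60
    return (1 - availability_pct / 100) * window_minutes


for target in (99.9, 99.95, 99.99):
    print(f"{target}% over 30 days -> {error_budget_minutes(target):.1f} min of budget")
```

At the 99.95% target in the table, the budget is about 21.6 minutes per 30 days, which is roughly one slow rollback.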

Incident Scenarios (2am reality)

Scenario	How you detect	Mitigation
Primary database unavailable	Health check failures, connection pool exhaustion alerts, elevated 5xx	Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists
Traffic spike (10× normal)	RPS anomaly alert, autoscaling lag, latency SLO burn rate	Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations
Bad deploy causing elevated errors	Canary metric regression, error budget burn, deployment correlation	Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility

Cost Drivers (Staff lens)

Egress bandwidth and CDN (often dominates media/data-heavy systems)
Database storage + IOPS at scale (plan compaction, TTL, tiering)
Compute for async pipelines (right-size workers, spot instances for batch)
Managed service premiums vs operational headcount trade-off

Multi-Region & DR

Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.
