Interview Prompt
Design a Podcast Delivery Platform.
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Which of these is highest priority: RSS ingestion, Audio processing, CDN distribution? | Forces scope negotiation — senior candidates trim before drawing boxes. |
| What scale should we design for — DAU, QPS, data volume? | Drives every capacity decision; shows structured thinking. |
| What are the read vs write patterns on the critical path? | Determines caching, DB choice, and replication topology. |
| What consistency and durability guarantees are required? | Separates strong-consistency paths from eventual ones — a senior differentiator. |
Scope
In scope
- RSS ingestion
- Audio processing
- CDN distribution
- Subscription management
- Download tracking
- Capacity estimation with shown math
Out of scope (state explicitly)
- Recommendation / home feed ranking (#48, #65)
- Live chat and comments (#36)
- DRM license server internals
Assumptions
- Clarify scale (DAU, QPS, data volume) for the podcast delivery platform in the first 5 minutes
- Standard reliability target 99.9%–99.99% unless problem implies higher (payments, booking)
- Managed cloud services (RDS, S3, Kafka, Redis) are acceptable building blocks
These foundational concepts underpin the patterns used in this problem. Review them before deep-diving into component-level trade-offs.
- Upload podcasts: Creators upload audio episodes with metadata (title, description, show notes, chapters)
- Streaming playback: Stream episodes with seeking, speed control (0.5×–3×), skip silence
- Downloading: Offline download for listening without internet
- RSS feed: Generate and serve RSS/Atom feeds for distribution to Apple Podcasts, Spotify, etc.
- Show management: Create shows (series), manage episodes, schedule future releases
- Discovery: Browse by category, charts (top podcasts), search, recommendations
- Subscriptions: Users subscribe to shows; new episodes appear in their feed
- Playback state: Sync playback position across devices (resume where you left off)
- Analytics: Download counts, listener demographics, retention graphs per episode
- Monetization: Dynamic ad insertion (pre-roll, mid-roll, post-roll), premium subscriptions
- Playback Start: Audio begins playing within 2 seconds globally
- Availability: 99.99% (listeners can always play episodes)
- Durability: Audio files never lost
- Scalability: 5M+ podcast shows, 100M+ episodes, 100M+ active listeners/month
- Global: Low-latency playback worldwide via CDN
- Bandwidth Efficiency: Adaptive bitrate; compressed audio formats (Opus, AAC)
- Analytics Accuracy: Download/listen counts accurate within ±1%
- RSS Compliance: Valid RSS 2.0 with iTunes podcast extensions
| Metric | Calculation | Value |
|---|---|---|
| Total shows | Given (assumption documented in value) | 5M |
| Total episodes | Given (assumption documented in value) | 100M |
| New episodes / day | Given (assumption documented in value) | 100K |
| Avg episode duration | Given (typical workload assumption) | 45 minutes |
| Avg episode size | Given (typical workload assumption) | 50 MB (128 kbps MP3) |
| Upload storage / day | 100K × 50 MB | 5 TB |
| Total storage | 100M episodes × 50 MB | 5 PB |
| Daily active listeners | Given (assumption documented in value) | 30M |
| Concurrent streams | Given (peak load assumption) | 5M |
| Stream bandwidth | 5M × 128 kbps | 640 Gbps |
| Downloads / day | Given (assumption documented in value) | 500M (including RSS aggregators) |
| Download bandwidth | 500M × 50 MB | 25 PB / day |
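The arithmetic in the table can be reproduced with a few helper functions. This is a minimal sketch that only uses the table's own assumptions; the function names are illustrative. It also shows two numbers the table leaves implicit: a 45-minute episode at 128 kbps is about 43 MB (the table rounds up to 50 MB), and 25 PB/day averages out to roughly 2.3 Tbps of sustained egress.

```python
# Back-of-envelope capacity math using the assumptions from the table above.
SECONDS_PER_DAY = 86_400


def episode_size_bytes(duration_min: float, bitrate_kbps: float) -> float:
    """Constant-bitrate file size: seconds x bits per second / 8."""
    return duration_min * 60 * bitrate_kbps * 1000 / 8


def stream_bandwidth_gbps(concurrent_streams: int, bitrate_kbps: float) -> float:
    """Aggregate egress for simultaneous streams, in Gbps."""
    return concurrent_streams * bitrate_kbps * 1000 / 1e9


def bytes_per_day(count_per_day: int, avg_size_bytes: float) -> float:
    """Daily volume for uploads or downloads."""
    return count_per_day * avg_size_bytes


def sustained_gbps(daily_bytes: float) -> float:
    """Average bit rate needed to move daily_bytes evenly over 24 hours."""
    return daily_bytes * 8 / SECONDS_PER_DAY / 1e9


if __name__ == "__main__":
    avg_size = 50e6  # 50 MB, the table's rounded episode size
    print("episode size (MB):", episode_size_bytes(45, 128) / 1e6)
    print("upload TB/day:", bytes_per_day(100_000, avg_size) / 1e12)
    print("stream Gbps:", stream_bandwidth_gbps(5_000_000, 128))
    print("download PB/day:", bytes_per_day(500_000_000, avg_size) / 1e15)
    print("avg download Gbps:", sustained_gbps(bytes_per_day(500_000_000, avg_size)))
```

Stating the sustained rate alongside the daily total makes it easier to argue that audio egress, not storage, is the dominant cost.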
The system uses CloudFront as the CDN for RSS feeds and globally cached audio, PostgreSQL for the primary transactional tables, Redis for playback tracking and caching, and S3 for original audio, processed audio, and artwork.
1. Audio Processing Pipeline
Creators upload raw audio (WAV, FLAC, MP3). The pipeline normalizes, trims, and transcodes the audio, then generates waveforms, chapter markers, and optional transcripts.
Step 1: Validate + Probe (ffprobe: extract duration, format, codec; reject if >12 hrs or >2GB)
Step 2: Normalize Audio
- Normalize loudness: target -16 LUFS (loudness standard for podcasts)
ffmpeg -i input.wav -af "loudnorm=I=-16:TP=-1.5:LRA=11" normalized.wav
- Silence trimming: remove > 3 seconds of silence at start/end
Step 3: Transcode to Multiple Formats/Bitrates
- MP3 128 kbps: Universal compatibility (RSS feed reference)
- AAC 128 kbps: iOS/Android native high quality
- Opus 48 kbps: Strong speech compression for modern clients (about 60% smaller than 128 kbps MP3)
Step 4: Chapter Markers (Embed in ID3/M4A tags: { title, start_time, end_time })
Step 5: Generate Waveform (RMS amplitude per 100ms for custom seekbars, ~50KB JSON)
Step 6: Speech-to-Text Transcription (Whisper API, cost-optimized: only run for shows with >100 subs)
2. RSS Feed Service: The Core Distribution Mechanism
Podcasts are distributed via RSS. Every external aggregator (Spotify, Apple, Overcast) polls RSS feeds constantly.
Feed serving at scale:
5M shows × 10+ aggregators × 2-4 polls/hour (one poll every 15-30 minutes) = ~100M-200M feed requests/hour (~28K-55K req/sec)
Strategy:
1. Pre-generate RSS XML for each show → store in S3
2. Serve via CDN with 15-minute TTL
3. On new episode publish: regenerate feed XML → invalidate CDN cache
4. Conditional requests: ETag/If-None-Match (or Last-Modified/If-Modified-Since) → 304 Not Modified (saves 90% bandwidth)
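The conditional-request step can be sketched as a pure function that decides between 200 and 304. This is a minimal illustration, not the platform's actual handler: `make_etag` and `serve_feed` are hypothetical names, the ETag is assumed to be a truncated SHA-256 of the pre-generated XML, and weak validators (`W/"..."`) are not handled.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(feed_xml: bytes) -> str:
    """Strong ETag derived from the feed bytes (quoted, per HTTP syntax)."""
    return '"' + hashlib.sha256(feed_xml).hexdigest()[:16] + '"'


def serve_feed(
    feed_xml: bytes, if_none_match: Optional[str] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for an RSS feed request."""
    etag = make_etag(feed_xml)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=900"}  # 15-minute TTL
    if if_none_match is not None:
        # If-None-Match may carry several comma-separated ETags, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates or "*" in candidates:
            return 304, headers, b""  # unchanged: no body is sent
    headers["Content-Type"] = "application/rss+xml; charset=utf-8"
    return 200, headers, feed_xml


if __name__ == "__main__":
    xml = b"<rss version='2.0'><channel><title>Demo</title></channel></rss>"
    status, headers, _ = serve_feed(xml)
    print(status, headers["ETag"])
    print(serve_feed(xml, if_none_match=headers["ETag"])[0])
```

Because the ETag depends only on the stored XML, it can be computed once at feed-generation time and saved next to the object rather than recomputed per request.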
Stable RSS URL format: https://feeds.example.com/shows/{show_id}/rss
3. Playback Sync: Resume Across Devices
Syncs playback position dynamically, allowing a seamless transition from phone commute to desktop browser.
Sync mechanism:
Client reports position every 30 seconds:
POST /api/v1/playback/progress { episode_id, position_seconds, speed }
On opening:
GET /api/v1/playback/progress/{episode_id} → resumes from stored point
Storage: Redis
Key: playback:{user_id}:{episode_id}
Value: Hash { position, speed, duration, updated_at }
TTL: 90 days (auto-cleanup old progress)
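The key layout and TTL behaviour can be illustrated with an in-memory stand-in for the Redis hash. This is only a sketch: `ProgressStore` is a hypothetical class that mimics the key format and 90-day expiry with a plain dict. A real deployment would issue the equivalent hash-write and expire commands against Redis instead.

```python
import time
from typing import Dict, Optional

TTL_SECONDS = 90 * 24 * 3600  # 90 days, as in the storage spec above


def progress_key(user_id: str, episode_id: str) -> str:
    """Build the key in the playback:{user_id}:{episode_id} format."""
    return f"playback:{user_id}:{episode_id}"


class ProgressStore:
    """Dict-backed stand-in for the Redis hash plus TTL (illustrative only)."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def save(self, user_id: str, episode_id: str, position: int,
             speed: float, duration: int, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._data[progress_key(user_id, episode_id)] = {
            "position": position,
            "speed": speed,
            "duration": duration,
            "updated_at": now,
            "expires_at": now + TTL_SECONDS,  # every write refreshes the TTL
        }

    def get(self, user_id: str, episode_id: str,
            now: Optional[float] = None) -> Optional[dict]:
        now = time.time() if now is None else now
        key = progress_key(user_id, episode_id)
        entry = self._data.get(key)
        if entry is None:
            return None
        if now >= entry["expires_at"]:
            del self._data[key]  # emulate Redis key expiry
            return None
        return {k: v for k, v in entry.items() if k != "expires_at"}
```

A phone would call `save` on each progress report, and a desktop client would call `get` when the episode is opened to resume from the stored position.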
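Scale: 5M concurrent streams ÷ 30 sec per update = ~170K writes/sec at peak (30M DAU would reach 1M writes/sec only if every daily listener played at the same moment); comfortably handled by about 10 Redis Cluster shards
4. Dynamic Ad Insertion (DAI): The Revenue Engine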
Rather than "baked-in" static ads, Server-Side Ad Insertion (SSAI) stitches targeted ads into streams at request time.
SSAI Splicing Flow:
1. Creator marks ad breaks: { "breaks": [{"position": 0, "type": "pre_roll"}, {"position": 1200, "type": "mid_roll"}] }
2. On request, the Ad Decision Service evaluates demographics and frequency caps, then selects targeted ads.
3. Audio Stitching Service:
- Splices HLS segments dynamically at the edge.
- Segmented playlist for the breaks above: [ad_pre, content_0_1200, ad_mid, content_1200_end]
- Content and ad segments are pre-encoded and cached on the CDN separately, so no CPU-heavy real-time re-encoding is needed.
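The stitching step can be sketched as playlist assembly over pre-encoded segments. This is a simplified illustration under stated assumptions: segment URIs and durations are made up, each break is inserted at the first segment boundary at or after its position, and an `#EXT-X-DISCONTINUITY` tag is emitted whenever the playlist switches between content and ad segments, since the two may differ in encoding.

```python
import math
from typing import Dict, List, Tuple

Segment = Tuple[str, float]      # (uri, duration_seconds)
Entry = Tuple[str, float, bool]  # (uri, duration_seconds, is_ad)


def stitch(content: List[Segment],
           ads_by_position: Dict[int, List[Segment]]) -> List[Entry]:
    """Interleave ad segments into content at the marked break positions."""
    breaks = sorted(ads_by_position.items())
    entries: List[Entry] = []
    elapsed = 0.0
    next_break = 0
    for uri, duration in content:
        # Insert every break whose position has been reached at this boundary.
        while next_break < len(breaks) and breaks[next_break][0] <= elapsed:
            entries.extend((u, d, True) for u, d in breaks[next_break][1])
            next_break += 1
        entries.append((uri, duration, False))
        elapsed += duration
    for _, ads in breaks[next_break:]:  # remaining breaks play as post-roll
        entries.extend((u, d, True) for u, d in ads)
    return entries


def render_m3u8(entries: List[Entry]) -> str:
    """Render a VOD media playlist from the stitched entries."""
    target = math.ceil(max(d for _, d, _ in entries)) if entries else 0
    lines = ["#EXTM3U", "#EXT-X-VERSION:3", f"#EXT-X-TARGETDURATION:{target}",
             "#EXT-X-MEDIA-SEQUENCE:0", "#EXT-X-PLAYLIST-TYPE:VOD"]
    previous_is_ad = None
    for uri, duration, is_ad in entries:
        if previous_is_ad is not None and is_ad != previous_is_ad:
            lines.append("#EXT-X-DISCONTINUITY")  # content/ad boundary
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
        previous_is_ad = is_ad
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```

Only the small playlist file is generated per request; every segment it references stays cacheable on the CDN, which is what keeps the per-listener cost low.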
Ad Impression Tracking:
Client fires event when passing ad bounds: { ad_id, event: "impression|start|50%|complete" } → Kafka → ClickHouse
Event Bus Design (Kafka)
Topic: podcast_delivery_platform-events
Partitions: 64 (scale consumers horizontally)
Partition key: entity_id (user_id / episode_id; preserves per-entity ordering)
Retention: 7 days (compliance) or 24h (high-volume telemetry)
Replication factor: 3, min.insync.replicas: 2
Producer: idempotent producer enabled (enable.idempotence=true)
Consumer: consumer group "podcast_delivery_platform-processors"
- At-least-once delivery + idempotent handlers (dedup by event_id)
- DLQ topic: podcast_delivery_platform-events-dlq (poison messages after 3 retries)
- Lag alert: consumer lag > 60s → scale workers
Async side effects MUST NOT block the synchronous API response.
Sync path: validate → persist source of truth → publish event → return 201
Async path: consumers update caches, indexes, notifications, aggregates
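The consumer-side guarantees (at-least-once delivery, dedup by event_id, dead-lettering after 3 attempts) can be sketched without a Kafka client. The class below is a hypothetical in-memory model: the seen-set stands in for a durable dedup store and the list stands in for the DLQ topic.

```python
from typing import Callable, List, Set


class IdempotentProcessor:
    """Models an at-least-once consumer with idempotent handling and a DLQ."""

    def __init__(self, handler: Callable[[dict], None], max_attempts: int = 3) -> None:
        self.handler = handler
        self.max_attempts = max_attempts
        self.seen: Set[str] = set()          # stand-in for a durable dedup store
        self.dead_letters: List[dict] = []   # stand-in for the -dlq topic

    def process(self, event: dict) -> str:
        event_id = event["event_id"]
        if event_id in self.seen:
            return "duplicate"  # redelivery of an already-handled event
        for _ in range(self.max_attempts):
            try:
                self.handler(event)
            except Exception:
                continue  # retry; a real worker would back off between attempts
            self.seen.add(event_id)  # mark done only after success
            return "processed"
        self.dead_letters.append(event)  # poison message: park it for inspection
        return "dead_lettered"


if __name__ == "__main__":
    processor = IdempotentProcessor(handler=lambda e: print("handled", e["event_id"]))
    print(processor.process({"event_id": "e1", "type": "episode-published"}))
    print(processor.process({"event_id": "e1", "type": "episode-published"}))
```

Marking an event as seen only after the handler succeeds means a crash mid-handler leads to a retry rather than a lost event, which is why the handler itself still has to be safe to run twice.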
Upload Episode
POST /api/v1/shows/{show_id}/episodes
{
"title": "Episode 42: System Design",
"description": "In this episode...",
"audio_file_key": "uploads/ep-42-raw.wav",
"publish_at": "2025-03-14T08:00:00Z",
"season": 3,
"episode_number": 42,
"explicit": false,
"chapters": [
{"title": "Introduction", "start": 0},
{"title": "Main Topic", "start": 180},
{"title": "Interview", "start": 1200}
],
"ad_breaks": [
{"position": 0, "type": "pre_roll", "max_duration": 30},
{"position": 1200, "type": "mid_roll", "max_duration": 60}
]
}
Stream Episode
GET /api/v1/episodes/{episode_id}/stream?format=aac&quality=128k
Response: 302 Redirect
Location: https://cdn.example.com/audio/ep-uuid/aac_128k.m4a
Or with ad insertion:
Location: https://cdn.example.com/dai/ep-uuid/playlist.m3u8
Get User's Subscription Feed
GET /api/v1/feed?limit=20&cursor={last}
Response: 200 OK
{
"episodes": [
{
"episode_id": "ep-uuid",
"show": {"id": "show-uuid", "title": "Tech Talk", "art": "..."},
"title": "Episode 42: System Design",
"duration_seconds": 2700,
"published_at": "2025-03-14T08:00:00Z",
"progress": {"position": 1847, "percent": 68},
"stream_url": "https://cdn.example.com/audio/ep-uuid/aac_128k.m4a",
"download_url": "https://cdn.example.com/audio/ep-uuid/mp3_128k.mp3",
"file_size_bytes": 32400000
}
]
}
Update Playback Progress
POST /api/v1/playback/progress
{
"episode_id": "ep-uuid",
"position_seconds": 1847,
"speed": 1.5,
"duration_seconds": 2700
}
Get RSS XML (Aggregator facing)
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:podcast="https://podcastindex.org/namespace/1.0">
<channel>
<title>My Podcast Show</title>
<link>https://example.com/shows/my-podcast</link>
<itunes:author>John Doe</itunes:author>
<itunes:category text="Technology"/>
<itunes:image href="https://cdn.example.com/art/show-123.jpg"/>
<item>
<title>Episode 42: System Design</title>
<enclosure url="https://cdn.example.com/audio/ep-42.mp3" length="57000000" type="audio/mpeg"/>
<pubDate>Fri, 14 Mar 2025 08:00:00 GMT</pubDate>
<itunes:duration>3600</itunes:duration>
<description>In this episode we discuss...</description>
<podcast:chapters url="https://cdn.example.com/chapters/ep-42.json"/>
</item>
</channel>
</rss>
Common Error Responses
400 Bad Request: invalid input, missing fields, or malformed JSON
401 Unauthorized: missing or invalid auth token or API key
403 Forbidden: authenticated but insufficient permissions
404 Not Found: resource ID does not exist
409 Conflict: duplicate write or version conflict; retry with idempotency key
422 Unprocessable Entity: valid syntax but invalid business logic
429 Too Many Requests: rate limit exceeded; honor Retry-After header
500 Internal Error: unexpected server fault; retry with idempotency key
503 Service Unavailable: dependency down or overloaded; use exponential backoff
202 Accepted: job queued; poll GET /jobs/{id} for status
408 Request Timeout: server timed out waiting for the request; retry. A job that is still processing is reported by GET /jobs/{id}; continue polling
PostgreSQL: Core Relational Data
-- PostgreSQL has no inline ENUM(...) or INDEX clauses: enums are named types
-- and indexes are created with separate statements.
CREATE TYPE show_status AS ENUM ('active', 'paused', 'archived');
CREATE TABLE shows (
  show_id UUID PRIMARY KEY,
  creator_id UUID NOT NULL,
  title VARCHAR(255) NOT NULL,
  description TEXT,
  category VARCHAR(50),
  subcategory VARCHAR(50),
  language CHAR(5),
  artwork_url TEXT,
  website_url TEXT,
  rss_feed_url TEXT NOT NULL, -- public feed URL (stable, permanent)
  explicit BOOLEAN DEFAULT FALSE,
  subscriber_count INT DEFAULT 0,
  total_episodes INT DEFAULT 0,
  status show_status DEFAULT 'active',
  created_at TIMESTAMPTZ,
  updated_at TIMESTAMPTZ
);
CREATE INDEX idx_shows_category ON shows (category, subscriber_count DESC);
CREATE INDEX idx_shows_creator ON shows (creator_id);
CREATE TYPE episode_status AS ENUM ('draft', 'processing', 'scheduled', 'published', 'archived');
CREATE TABLE episodes (
  episode_id UUID PRIMARY KEY,
  show_id UUID NOT NULL,
  title VARCHAR(255) NOT NULL,
  description TEXT,
  show_notes TEXT,
  season SMALLINT,
  episode_number SMALLINT,
  duration_seconds INT,
  audio_url_mp3 TEXT, -- CDN URL for MP3
  audio_url_aac TEXT, -- CDN URL for AAC
  audio_url_opus TEXT, -- CDN URL for Opus
  original_s3_key TEXT,
  file_size_bytes BIGINT, -- INT tops out just under the 2 GB upload limit
  chapters JSONB,
  ad_breaks JSONB,
  transcript_url TEXT,
  waveform_url TEXT,
  explicit BOOLEAN DEFAULT FALSE,
  status episode_status,
  published_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ
);
CREATE INDEX idx_episodes_show ON episodes (show_id, published_at DESC);
CREATE INDEX idx_episodes_published ON episodes (status, published_at DESC);
CREATE TABLE subscriptions (
  user_id UUID NOT NULL,
  show_id UUID NOT NULL,
  subscribed_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
  notifications BOOLEAN DEFAULT TRUE,
  PRIMARY KEY (user_id, show_id)
);
CREATE INDEX idx_subscriptions_show ON subscriptions (show_id); -- "who subscribes to this show"
Redis Key Schemas
# Playback progress
playback:{user_id}:{episode_id} → Hash { position, speed, updated_at } (TTL: 90 days)
# User's episode queue
queue:{user_id} → List of episode_ids (ordered)
# RSS feed cache
rss:{show_id} → String (RSS XML blob) (TTL: 15 minutes)
# Podcast charts (Sorted Sets)
charts:top:{category} → Sorted Set { show_id: score }
charts:trending → Sorted Set { show_id: growth_score }
# Episode download counter (incremented in Redis, flushed hourly to ClickHouse)
downloads:{episode_id}:{date} → INT (INCR) (TTL: 2 days)
S3 Storage Layout
Bucket: podcast-originals (cross-region replicated, permanent)
/{show_id}/{episode_id}/original.wav
Bucket: podcast-processed (CDN-served)
/{show_id}/{episode_id}/mp3_128k.mp3
/{show_id}/{episode_id}/aac_128k.m4a
/{show_id}/{episode_id}/opus_48k.ogg
/{show_id}/{episode_id}/waveform.json
/{show_id}/{episode_id}/transcript.json
Bucket: podcast-artwork
/{show_id}/artwork_3000x3000.jpg
/{show_id}/artwork_600x600.jpg
Kafka Message Bus Topics
Topic: episode-published (triggers RSS regeneration + push notifications)
Topic: playback-events (play, seek, complete; feeds ClickHouse analytics)
Topic: download-events (download started; used for IAB compliance filters)
Topic: ad-events (ad impressions; monetization statistics)
ClickHouse: Analytics DB
CREATE TABLE episode_plays (
episode_id UUID,
show_id UUID,
user_id UUID,
event_type Enum8('play'=0,'pause'=1,'seek'=2,'complete'=3,'download'=4),
position_seconds UInt32,
duration_seconds UInt32,
speed Float32,
platform Enum8('ios'=0,'android'=1,'web'=2,'rss'=3),
country FixedString(2),
city String,
event_date Date MATERIALIZED toDate(timestamp),
timestamp DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (show_id, episode_id, timestamp);
| Concern | Solution |
|---|---|
| Audio file corruption | Checksum verification after upload; re-upload from creator if corrupt |
| CDN failure | Multi-CDN (CloudFront + Akamai); DNS failover in < 30 seconds |
| RSS feed stale | Max TTL 15 min; explicit CDN cache purge on publish; ETag for conditional requests |
| Playback sync loss | Client buffers progress locally → retry sync when online |
| Ad insertion failure | Serve episode WITHOUT ads (degrade gracefully; better than no audio) |
| Processing pipeline failure | Retry 3× from Kafka; DLQ for persistent failures; alert creator |
| Download counter loss | Redis AOF + batch flush to ClickHouse every hour; ClickHouse is source of truth |
Specific: RSS Polling Storm (Thundering Herd)
Aggregators tend to sync feeds simultaneously on the hour, producing a thundering-herd spike of roughly 55K req/sec.
- CDN Edge Caching: Feeds are cached globally on CDN edges with a 15-minute TTL. Only 5% of requests hit the origin.
- Conditional Requests: Aggregators send If-None-Match with the stored ETag. 90% of requests return 304 Not Modified, which carries no body and saves most of the bandwidth.
- WebSub (formerly PubSubHubbub) Push: Pushes new-episode notifications to aggregators' webhook endpoints in real time instead of waiting for regular polling, eliminating 99% of polling requests.
Specific: Download Counting Accuracy (IAB Standard)
Advertisers pay per 1000 downloads (CPM), making overcounting (fraud) or undercounting (lost revenue) highly sensitive issues.
IAB Podcast Measurement Guidelines:
1. Deduplication: In Flink stream, generate key = SHA256(ip + user_agent + episode_id).
Window of 24 hours. Ignore matches within this window.
2. Bot filtering: Filter out automated crawlers matching the IAB bot list.
3. Byte-range filtering:
- Ignore byte 0-1000 requests (metadata fetching only).
- Ignore downloads where total bytes served < 50% of episode size.
Interview Walkthrough
- Lead with delivery model: RSS feeds for aggregator compatibility plus direct streaming for native apps — two paths, one origin.
- Explain CDN edge caching with 15-minute TTL and ETag/If-None-Match so 90% of aggregator polls return 304 without hitting origin.
- Cover WebSub push to eliminate the hourly RSS polling thundering herd that spikes origin to 55K req/sec.
- Walk through multi-codec storage: MP3 in RSS for universal support, Opus/AAC for in-app playback to save 50% bandwidth.
- Describe IAB-compliant download counting: dedup window, bot filtering, and byte-range thresholds in a Flink stream.
- Mention silence-skip as client-side precomputed bounds JSON — avoids re-encoding and preserves chapter markers.
- Common pitfall: counting every byte-range request as a download — metadata fetches inflate CPM numbers and anger advertisers.
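The counting rules from the IAB section (24-hour dedup window, bot filtering, byte-range thresholds) can be sketched as a small batch filter. This is an illustration only: the source runs this logic in a Flink stream, whereas the sketch uses an in-memory dict, and `BOT_MARKERS` is a hypothetical stand-in for the real IAB bot list.

```python
import hashlib
from typing import Dict, Iterable

WINDOW_SECONDS = 24 * 3600
METADATA_PROBE_BYTES = 1000   # requests this small only fetch metadata
MIN_FRACTION = 0.5            # must serve at least half the episode
BOT_MARKERS = ("bot", "crawler", "spider")  # placeholder for the IAB bot list


def dedup_key(ip: str, user_agent: str, episode_id: str) -> str:
    """SHA-256 over ip + user agent + episode, with separators to avoid collisions."""
    raw = f"{ip}|{user_agent}|{episode_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def is_countable(req: dict) -> bool:
    """Apply the bot and byte-range filters to one request."""
    user_agent = req["user_agent"].lower()
    if any(marker in user_agent for marker in BOT_MARKERS):
        return False
    if req["bytes_served"] <= METADATA_PROBE_BYTES:
        return False
    return req["bytes_served"] >= MIN_FRACTION * req["episode_size"]


def count_downloads(requests: Iterable[dict]) -> Dict[str, int]:
    """Count valid downloads per episode, one per key per 24-hour window."""
    last_counted: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for req in sorted(requests, key=lambda r: r["ts"]):
        if not is_countable(req):
            continue
        key = dedup_key(req["ip"], req["user_agent"], req["episode_id"])
        previous = last_counted.get(key)
        if previous is not None and req["ts"] - previous < WINDOW_SECONDS:
            continue  # same listener and episode inside the window
        last_counted[key] = req["ts"]
        counts[req["episode_id"]] = counts.get(req["episode_id"], 0) + 1
    return counts
```

In a streaming job the `last_counted` map would become keyed state with a 24-hour expiry, so memory stays bounded by the number of active listener and episode pairs.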
1. Audio Codec Choice: MP3 vs AAC vs Opus
- MP3 128k: Universal compatibility across older car players and web platforms. Mandatory for RSS enclosure feeds. Poorer compression than modern codecs.
- AAC 128k: Native support on modern mobile platforms, 30% more efficient than MP3.
- Opus 48k: Industry-leading speech compression. 48kbps Opus equals the quality of 128kbps MP3 while saving 50%+ on CDN bandwidth costs. Not yet accepted by RSS aggregators.
Strategy: Store all three. Reference the MP3 in the public RSS XML. In native apps, serve the best codec the client supports, falling back in the order Opus → AAC → MP3.
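The fallback order can be expressed as a short selection function. This is a sketch with hypothetical names; it assumes the client reports its supported codecs (for example in a request parameter) and that MP3 is always stored, so it serves as the last resort.

```python
from typing import Iterable, Sequence

CODEC_PREFERENCE: Sequence[str] = ("opus", "aac", "mp3")  # most to least efficient


def pick_codec(client_supported: Iterable[str],
               available: Iterable[str] = CODEC_PREFERENCE) -> str:
    """Return the most efficient codec both sides support, defaulting to MP3."""
    supported = {codec.lower() for codec in client_supported}
    stored = {codec.lower() for codec in available}
    for codec in CODEC_PREFERENCE:
        if codec in supported and codec in stored:
            return codec
    return "mp3"  # universal fallback, also what the RSS enclosure references


if __name__ == "__main__":
    print(pick_codec(["AAC", "MP3"]))
    print(pick_codec(["opus", "aac", "mp3"]))
```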
2. Silence Detection and Skip
Automatically skipping silent gaps to keep podcasts fast and engaging.
Detection: Analyze RMS amplitude in 50ms windows. If RMS < threshold for > 500ms, mark as silence.
Adaptive threshold: Measure noise floor from first 2 seconds. Set threshold = noise_floor + 6 dB.
1. Client-Side Skip (Chosen):
- Precompute silence bounds JSON: [(start1, end1), (start2, end2)].
- Client player seeks past bounds.
- Extremely flexible, zero extra storage or re-encoding costs.
2. Server-Side Skip:
- Trim silences and save a condensed MP3.
- Saves CDN bandwidth, but breaks chapter markers and dynamic ad boundaries.
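The detection rule can be sketched in plain Python over a list of samples in the range [-1, 1]. Two details are assumptions because the text does not specify them: the noise floor is taken as the quietest 50 ms window within the first 2 seconds, and levels are clamped at -100 dBFS so digital silence does not produce negative infinity.

```python
import math
from typing import List, Sequence, Tuple


def window_levels_db(samples: Sequence[float], sample_rate: int,
                     window_ms: int = 50, floor_db: float = -100.0):
    """RMS level in dBFS for each non-overlapping window, plus window size."""
    size = max(1, sample_rate * window_ms // 1000)
    levels: List[float] = []
    for start in range(0, len(samples) - size + 1, size):
        chunk = samples[start:start + size]
        rms = math.sqrt(sum(s * s for s in chunk) / size)
        levels.append(max(floor_db, 20 * math.log10(rms)) if rms > 0 else floor_db)
    return levels, size


def detect_silence(samples: Sequence[float], sample_rate: int,
                   min_silence_ms: int = 500, margin_db: float = 6.0,
                   calibration_s: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) bounds of runs quieter than noise floor + margin."""
    levels, size = window_levels_db(samples, sample_rate)
    if not levels:
        return []
    calibration_windows = max(1, int(calibration_s * sample_rate) // size)
    threshold = min(levels[:calibration_windows]) + margin_db
    bounds: List[Tuple[float, float]] = []
    run_start = None
    # The infinite sentinel closes a silent run that reaches the end of the audio.
    for index, level in enumerate(levels + [float("inf")]):
        if level < threshold:
            if run_start is None:
                run_start = index
        elif run_start is not None:
            duration_ms = (index - run_start) * size * 1000 / sample_rate
            if duration_ms > min_silence_ms:
                bounds.append((run_start * size / sample_rate,
                               index * size / sample_rate))
            run_start = None
    return bounds
```

The returned bounds are what would be serialized as the silence-bounds JSON for the client player. A production pipeline would likely use an audio library or ffmpeg's silence detection filter instead of per-sample Python loops.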
3. Podcast Discovery: Charts and Recommendations
Charts Ranking Formula:
score = w1 × new_subscribers_7d + w2 × downloads_7d + w3 × listener_retention + w4 × growth_velocity
- High velocity weighting allows rising trending podcasts to break into charts.
- Computed daily via Spark and cached in Redis Sorted Sets: charts:top:{category}.
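The ranking formula can be sketched directly. The weights below are hypothetical placeholders, and the sketch assumes each input has already been normalized to a 0 to 1 range so that raw download counts do not swamp the other terms. The sorted result mirrors what would be written into the Redis sorted set.

```python
from typing import Dict, List, Tuple

# Hypothetical weights (w1..w4); growth velocity is weighted heavily so that
# rising shows can break into the charts.
DEFAULT_WEIGHTS: Tuple[float, float, float, float] = (0.3, 0.2, 0.2, 0.3)


def chart_score(new_subscribers_7d: float, downloads_7d: float,
                listener_retention: float, growth_velocity: float,
                weights: Tuple[float, float, float, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum from the charts formula; inputs assumed normalized to [0, 1]."""
    w1, w2, w3, w4 = weights
    return (w1 * new_subscribers_7d + w2 * downloads_7d
            + w3 * listener_retention + w4 * growth_velocity)


def top_shows(metrics: Dict[str, Tuple[float, float, float, float]],
              n: int) -> List[Tuple[str, float]]:
    """Rank shows by score, highest first, and keep the top n."""
    scored = [(show_id, chart_score(*values)) for show_id, values in metrics.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    sample = {
        "established": (0.9, 0.9, 0.5, 0.1),
        "rising": (0.2, 0.3, 0.6, 0.9),
        "quiet": (0.1, 0.1, 0.1, 0.1),
    }
    for show_id, score in top_shows(sample, 2):
        print(show_id, round(score, 2))
```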
Recommendations:
- Collaborative Filtering: User-show subscription matrices factored for matching shows.
- Content-based: Generate semantic embeddings from title, description, and STT transcripts.
- Serve recommendations via cosine similarity rankings, cached in Redis.
Staff interviews expect you to articulate how the system evolves under real growth, rather than jumping straight to the final architecture.
Phase 1: MVP (0 to 100K users)
Monolith or minimal services proving core podcast delivery platform flows. Optimize for shipping speed and correctness over scale.
Key components: Single region · Primary DB + Redis cache · Synchronous core path · Basic monitoring
Move to next phase when: p99 latency exceeds SLO or DB CPU sustained above 70%
Phase 2: Growth (100K to 10M users)
Split read/write paths, introduce async processing for non-critical work, add caching layers and horizontal scaling.
Key components: Read replicas or CQRS · Message queue for async work · CDN / edge caching · Service-level SLOs
Move to next phase when: Hot keys, fan-out bottlenecks, or ops toil from manual scaling
Phase 3: Scale (10M+ users)
Shard data plane, multi-region active-active or active-passive, formal DR runbooks, cost optimization.
Key components: Database sharding / partitioning · Multi-region replication · Auto-scaling + chaos testing · Dedicated platform/SRE ownership
Move to next phase when: Regional failure domain risk, compliance data residency, or linear cost growth unsustainable
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Core user-facing availability | 99.95% | Budget for planned maintenance + unplanned failures without user-visible outage. |
| p99 latency (critical path) | Problem-specific — state target early and tie to capacity math | Interview credibility comes from connecting SLO to architecture choices. |
| Error rate (5xx) | < 0.1% | Distinguishes transient blips from systemic failure requiring rollback. |
| Data durability | 99.999999999% (11 nines) for committed writes | Define which operations require fsync/quorum vs async replication. |
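An availability target becomes easier to reason about once it is converted into an error budget. The helper below is a small sketch of that conversion, assuming a 30-day window. It shows, for example, that 99.95% allows about 21.6 minutes of downtime per window and 99.99% allows about 4.3 minutes.

```python
def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Error budget: the share of the window that may be unavailable, in minutes."""
    window_minutes = window_days * 24 * 60
    return (100.0 - slo_percent) / 100.0 * window_minutes


def budget_remaining(slo_percent: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = allowed_downtime_minutes(slo_percent, window_days)
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        print(target, round(allowed_downtime_minutes(target), 2))
```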
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Primary database unavailable | Health check failures, connection pool exhaustion alerts, elevated 5xx | Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists |
| Traffic spike (10× normal) | RPS anomaly alert, autoscaling lag, latency SLO burn rate | Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations |
| Bad deploy causing elevated errors | Canary metric regression, error budget burn, deployment correlation | Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility |
Cost Drivers (Staff lens)
- Egress bandwidth and CDN (often dominates media/data-heavy systems)
- Database storage + IOPS at scale (plan compaction, TTL, tiering)
- Compute for async pipelines (right-size workers, spot instances for batch)
- Managed service premiums vs operational headcount trade-off
Multi-Region & DR
Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.