Design a Content Delivery Network (CDN) – System Design Walkthrough

This problem appears in multiple sheets. Depth expectations increase as you progress:

Track	What to demonstrate
Arch 25	Pure infrastructure problem. Nail PoP hierarchy, cache hit ratio math, invalidation strategies, origin shield with consistent hashing, and geo-routing latency trade-offs.
Arch 50	Add dynamic content caching (edge compute), TLS termination at edge, and cache poisoning / request smuggling defenses.
Arch 75	Staff: design a CDN control plane (tenant isolation, config propagation to 200+ PoPs), and cost model for building vs buying at 10 Tbps peak.

Interview Prompt

Design a Content Delivery Network (CDN) that caches static and cacheable dynamic content at edge locations worldwide. Origin servers sit in one or a few regions; users worldwide should get low-latency responses with a high cache hit ratio.

Clarifying Questions (ask before designing)

Question	Why it matters
Static only, or dynamic/cacheable API responses too?	Static (images, JS, video segments) = long TTL, immutable URLs. Dynamic = short TTL or edge-side includes / cache keyed by Vary headers.
What's peak bandwidth and request rate?	10 Tbps peak drives PoP count, edge server sizing, and origin shield necessity. 1M RPS drives connection handling architecture.
Single tenant or multi-tenant CDN-as-a-service?	Multi-tenant needs config isolation, per-tenant cache namespaces, and fair queuing on origin fetch.
Invalidation latency requirement?	Immediate purge (< 30 sec globally) vs eventual (TTL expiry) — fundamentally different control plane design.

Scope

In scope

PoP architecture and cache hierarchy
Geo-routing and DNS/load balancing to nearest edge
Origin shield and consistent hashing
Cache invalidation (purge API, TTL, surrogate keys)
Hit ratio optimization and capacity math

Out of scope (state explicitly)

Full DNS provider design (assume GeoDNS exists)
DDoS mitigation internals (WAF/scrubbing center as black box)
Video-specific ABR (see #15)
Origin application design

Assumptions

200+ PoPs globally, 10 Tbps peak aggregate egress
Origin in us-east-1; 95% target cache hit ratio
Mixed content: 70% static assets, 30% cacheable API
Multi-tenant SaaS CDN serving 10K customers

Functional Requirements

Cache and serve static content (images, videos, CSS, JS, fonts) from edge servers closest to the user
Route user requests to the optimal edge server (lowest latency)
If content not cached at edge, fetch from origin server ("cache miss")
Support cache invalidation / purge on demand
Support SSL/TLS termination at the edge
Provide analytics: bandwidth usage, cache hit ratio, latency by region
Support custom cache rules (TTL, cache key customization, query string handling)

Metric	Calculation	Value
Total PoPs	Given (assumption)	300
Servers per PoP	Given (assumption)	50-500 (varies by PoP size)
Total edge servers	300 PoPs × ~167 servers on average	50,000
Peak bandwidth	Given (peak load assumption)	200 Tbps
Requests / sec	Given (global peak assumption)	50M (globally)
Cache storage per PoP	Given (assumption)	100 TB
Total cached content	300 × 100 TB	30 PB
Origin requests (10% miss rate)	50M × 10%	5M/sec
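The derived rows can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures from the table; the variable names are illustrative.

```python
# Figures taken from the capacity table; names are illustrative.
pops = 300
cache_per_pop_tb = 100
global_rps = 50_000_000
miss_rate = 0.10

total_cached_pb = pops * cache_per_pop_tb / 1000   # TB -> PB (decimal units)
origin_rps = global_rps * miss_rate                # requests that reach origin

print(f"Total cached content: {total_cached_pb:.0f} PB")
print(f"Origin requests: {origin_rps / 1e6:.0f}M/sec")
```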


DNS-Based Routing (GeoDNS)

User's DNS resolver sends query for cdn.example.com
CDN's authoritative DNS server looks up the resolver's IP location
Returns the IP address of the closest/fastest PoP
Pros: Simple, widely supported
Cons: DNS TTL caching slows failover; the resolver's IP may not reflect the user's location (EDNS Client Subnet helps where resolvers send it)
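The "closest PoP" step can be sketched as a great-circle distance comparison. This assumes the resolver's IP has already been mapped to coordinates by a geo-IP lookup (not shown); the PoP coordinates and function names are hypothetical, and a production system would more likely rank PoPs by measured latency than by distance.

```python
import math

# Hypothetical PoP coordinates (lat, lon); a real deployment would load these from config.
POPS = {
    "NYC": (40.71, -74.01),
    "LDN": (51.51, -0.13),
    "MUM": (19.08, 72.88),
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def nearest_pop(resolver_location):
    """Return the PoP code closest to the resolver's estimated location."""
    return min(POPS, key=lambda code: haversine_km(resolver_location, POPS[code]))

print(nearest_pop((48.86, 2.35)))   # resolver located in Paris
```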

Anycast Routing (Alternative/Complementary)

Multiple PoPs announce the same IP address via BGP
Network routing naturally sends packets to the closest PoP
Pros: Fast failover (bounded by BGP convergence rather than DNS TTLs)
Cons: No per-user control, relies on ISP routing tables
Real-world: Cloudflare uses Anycast; AWS CloudFront uses GeoDNS

Edge Server (Cache Node)

Reverse proxy (NGINX / custom): Handles TLS, request routing, caching
Cache storage: Memory (RAM) for hot content (64 GB), SSD for warm content (2-10 TB), HDD for cold content (50+ TB)
Cache lookup: Hash(cache_key) → check memory → check SSD → check HDD → cache miss
LRU eviction: Least Recently Used items evicted when cache is full
Consistent hashing: Within a PoP, requests are routed to specific servers based on content hash → avoids duplicate caching
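The consistent-hashing step within a PoP can be sketched as a hash ring with virtual nodes. This is one common construction, not necessarily what any particular CDN uses; the class name, server names, and vnode count are assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        # Each server appears at `vnodes` points on the ring to even out load.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def server_for(self, cache_key):
        """Walk clockwise from the key's hash to the next virtual node."""
        idx = bisect.bisect(self._points, self._hash(cache_key)) % len(self._points)
        return self._ring[idx][1]

ring = HashRing(["edge-1", "edge-2", "edge-3"])
print(ring.server_for("https://cdn.example.com/images/logo.png"))
```

When a server leaves the ring, only the keys it owned move to other servers; every other key keeps its assignment, so the rest of the PoP's cache stays warm.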

Origin Shield (Mid-Tier Cache)

Why: Without it, 300 PoPs each cache-miss independently → origin gets hammered with 300 requests for the same content
How: Intermediate cache between PoPs and origin. All PoPs in a region route cache misses through the shield
Benefit: Origin sees 3-5 cache-miss requests instead of 300
Placement: 3-5 regional shields (US-East, US-West, Europe, Asia)
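The fan-in effect can be illustrated with a toy count. The sketch assumes every PoP misses on the same object once and that each shield coalesces those misses into a single origin fetch; the PoP names and the even spread over four regions are hypothetical.

```python
def origin_fetches(pop_regions, use_shield):
    """Count origin fetches when every PoP misses on the same object once.

    pop_regions maps each PoP to the region whose shield it routes through.
    """
    if not use_shield:
        return len(pop_regions)             # each PoP goes to origin itself
    return len(set(pop_regions.values()))   # one fetch per regional shield

# Hypothetical layout: 300 PoPs spread evenly over 4 shield regions.
regions = ["us-east", "us-west", "europe", "asia"]
pop_regions = {f"pop-{i}": regions[i % len(regions)] for i in range(300)}

print(origin_fetches(pop_regions, use_shield=False))  # 300
print(origin_fetches(pop_regions, use_shield=True))   # 4
```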

Cache Key

Default cache key:  scheme + host + path + query_string
  https://cdn.example.com/images/logo.png?v=2

Customizable:
  - Ignore query string (for static assets)
  - Include cookies (for personalized content)
  - Include headers (Accept-Encoding, Accept-Language)
  - Include device type (mobile vs desktop)
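A cache-key builder covering the default key and two of the customizations might look like the sketch below. The function name and parameters are hypothetical, and sorting query parameters is an added normalization (not stated above) that lets reordered query strings share one entry.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

def build_cache_key(url, headers=None, ignore_query=False, vary_headers=()):
    """Build a cache key from a URL plus selected request headers."""
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    parts = urlsplit(url)
    key = f"{parts.scheme}://{parts.netloc.lower()}{parts.path}"
    if not ignore_query and parts.query:
        # Sort parameters so ?a=1&b=2 and ?b=2&a=1 map to the same entry.
        key += "?" + urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    for name in vary_headers:
        # Append each selected header so variants get separate entries.
        key += f"|{name.lower()}={headers.get(name.lower(), '')}"
    return key

print(build_cache_key("https://cdn.example.com/images/logo.png?v=2"))
print(build_cache_key("https://cdn.example.com/images/logo.png?v=2", ignore_query=True))
```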

Cache Control Headers

HTTP

Cache-Control: public, max-age=86400, s-maxage=604800
  public:    CDN can cache
  max-age:   Browser cache TTL (1 day)
  s-maxage:  CDN/shared cache TTL (7 days)

Cache-Control: private, no-store
  Don't cache at CDN (personalized content)

Vary: Accept-Encoding
  Cache different versions for gzip vs brotli
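How an edge might turn these directives into a TTL can be sketched as follows. This is a simplified reading that handles only `private`, `no-store`, `s-maxage`, and `max-age`; it ignores `no-cache`, `Expires`, `Age`, and heuristic freshness, and the function name is hypothetical.

```python
def shared_cache_ttl(cache_control):
    """Return the TTL in seconds a shared cache may use, or None if it should not store."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    if "private" in directives or "no-store" in directives:
        return None
    for name in ("s-maxage", "max-age"):   # s-maxage takes precedence for shared caches
        if name in directives:
            return int(directives[name])
    return None   # no explicit lifetime; heuristic caching is out of scope here

print(shared_cache_ttl("public, max-age=86400, s-maxage=604800"))  # 604800
print(shared_cache_ttl("private, no-store"))                       # None
```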

Event Bus Design (Kafka)

Topic: cdn-events
  Partitions: 64 (scale consumers horizontally)
  Partition key: entity_id (e.g. tenant_id or cache key; preserves per-entity ordering)
  Retention: 7 days (compliance) or 24h (high-volume telemetry)
  Replication factor: 3, min.insync.replicas: 2

Producer: idempotent producer enabled (enable.idempotence=true)
Consumer: consumer group "cdn-processors"
  - At-least-once delivery + idempotent handlers (dedup by event_id)
  - DLQ topic: cdn-events-dlq (poison messages after 3 retries)
  - Lag alert: consumer lag > 60s → scale workers

Async side effects must not block the synchronous API response.
  Sync path: validate → persist source of truth → publish event → return 202 Accepted
  Async path: consumers update caches, indexes, notifications, aggregates

Cache Purge

HTTP

POST /api/v1/purge
{
  "urls": ["https://cdn.example.com/images/logo.png"],
  "pattern": "https://cdn.example.com/css/*"
}
Response: 202 Accepted
{
  "purge_id": "purge-uuid",
  "status": "propagating",
  "estimated_completion": "30 seconds"
}
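On each edge server, applying a purge request comes down to selecting the matching cache keys. The sketch below uses Python's `fnmatch` for the wildcard, which is a simplification: `*` also matches `/`, and `?` and `[` are treated as wildcards too, so a real purge matcher would need stricter rules. The function name is hypothetical.

```python
from fnmatch import fnmatchcase

def keys_to_purge(cached_keys, urls=(), pattern=None):
    """Return cached keys that match an exact URL or the wildcard pattern."""
    exact = set(urls)
    return [
        key for key in cached_keys
        if key in exact or (pattern is not None and fnmatchcase(key, pattern))
    ]

cached = [
    "https://cdn.example.com/images/logo.png",
    "https://cdn.example.com/css/site.css",
    "https://cdn.example.com/js/app.js",
]
print(keys_to_purge(
    cached,
    urls=["https://cdn.example.com/images/logo.png"],
    pattern="https://cdn.example.com/css/*",
))
```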

Cache Warm

HTTP

POST /api/v1/warm
{
  "urls": ["https://cdn.example.com/videos/new-release.mp4"],
  "regions": ["us-east", "eu-west", "ap-south"]
}

Get Analytics

HTTP

GET /api/v1/analytics?domain=cdn.example.com&period=24h
Response: 200 OK
{
  "total_requests": 50000000,
  "cache_hit_ratio": 0.93,
  "bandwidth_gb": 15000,
  "latency_p50_ms": 12,
  "latency_p99_ms": 45,
  "by_region": [...]
}

Common Error Responses

400 Bad Request: invalid input, missing fields, or malformed JSON
401 Unauthorized: missing or invalid auth token or API key
403 Forbidden: authenticated but insufficient permissions
404 Not Found: resource ID does not exist
409 Conflict: duplicate write or version conflict; retry with idempotency key
422 Unprocessable Entity: valid syntax but invalid business logic
429 Too Many Requests: rate limit exceeded; honor Retry-After header
500 Internal Error: unexpected server fault; retry with idempotency key
503 Service Unavailable: dependency down or overloaded; use exponential backoff

Edge Server Cache Entry

Cache Key:  "https://cdn.example.com/images/logo.png"
Metadata:
  content_type:   "image/png"
  content_length: 45678
  etag:           "abc123"
  last_modified:  "2026-03-13T00:00:00Z"
  cache_control:  "public, max-age=86400"
  expires_at:     "2026-03-14T00:00:00Z"
  created_at:     "2026-03-13T00:00:00Z"
  hit_count:      1523
  last_accessed:  "2026-03-13T10:30:00Z"
Body:
  [binary content stored in SSD/memory]

DNS Routing Table

Region    PoP           IP Addresses         Health
US-East   NYC           [203.0.113.1, ...]   healthy
US-East   IAD           [203.0.113.5, ...]   healthy
EU-West   LDN           [198.51.100.1, ...]  healthy
AP-South  MUM           [192.0.2.1, ...]     degraded
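A health-aware lookup over this table might look like the sketch below. The rows mirror the table, but only the first listed IP per PoP is included because the remaining addresses are elided; the fallback-region policy and function name are assumptions.

```python
# Rows mirror the routing table; only the first listed IP per PoP is shown
# because the remaining addresses are elided in the source.
ROUTING_TABLE = [
    {"region": "US-East", "pop": "NYC", "ips": ["203.0.113.1"], "health": "healthy"},
    {"region": "US-East", "pop": "IAD", "ips": ["203.0.113.5"], "health": "healthy"},
    {"region": "EU-West", "pop": "LDN", "ips": ["198.51.100.1"], "health": "healthy"},
    {"region": "AP-South", "pop": "MUM", "ips": ["192.0.2.1"], "health": "degraded"},
]

def dns_answer(region, fallback_region):
    """Return IPs of healthy PoPs in `region`, or in `fallback_region` if none are healthy."""
    for candidate in (region, fallback_region):
        ips = [
            ip
            for row in ROUTING_TABLE
            if row["region"] == candidate and row["health"] == "healthy"
            for ip in row["ips"]
        ]
        if ips:
            return ips
    return []

print(dns_answer("AP-South", fallback_region="EU-West"))  # MUM is degraded, so LDN answers
```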

Purge Propagation

JSON

Kafka Topic: cache-purge
{
  "purge_id": "uuid",
  "pattern": "https://cdn.example.com/css/*",
  "initiated_at": "2026-03-13T10:00:00Z",
  "target_pops": ["all"]
}

Concern	Solution
PoP failure	DNS/Anycast routes to next closest PoP. Health checks every 10s
Edge server failure	Load balancer within PoP routes to healthy servers; consistent hashing rebalances
Origin failure	Serve stale content from cache (stale-while-revalidate, stale-if-error directives)
Cache stampede	Request coalescing: only one request to origin; all other waiters served from the same response
DDoS at edge	Rate limiting, WAF rules, TCP SYN cookies, challenge pages (CAPTCHA) at edge
Cable cut (region offline)	Anycast reroutes globally; regional failover to adjacent PoPs

Specific: Cache Stampede / Thundering Herd

When a popular cached item expires, hundreds of concurrent requests trigger simultaneous origin fetches:

Request coalescing: First request triggers origin fetch; subsequent requests wait for the result
Stale-while-revalidate: Serve stale content while fetching fresh content in background
Jittered TTL: Add random ±10% jitter to TTL → different PoPs expire at different times
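Request coalescing and jittered TTLs can be sketched in a few lines. This is an in-process, thread-based illustration under assumed names (`Coalescer`, `jittered_ttl`); a real edge proxy would do the same per cache key inside its event loop, with timeouts for waiters.

```python
import random
import threading

class Coalescer:
    """Collapse concurrent fetches of the same key into one origin call."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._lock = threading.Lock()
        self._inflight = {}   # key -> (done event, shared result holder)

    def get(self, key):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, result = entry
        if leader:
            try:
                result["value"] = self._fetch(key)   # the only origin call
            except Exception as exc:
                result["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()                           # wake all waiters
        else:
            done.wait()
        if "error" in result:
            raise result["error"]
        return result["value"]

def jittered_ttl(base_seconds, spread=0.10):
    """Return the base TTL scaled by a random factor within ±spread."""
    return base_seconds * random.uniform(1 - spread, 1 + spread)
```

Followers share the leader's result, including its error, so a failing origin is not retried once per waiting request.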

Push vs Pull CDN

Pull CDN: The edge fetches from origin on the first request (a cache miss). Best for dynamic sites and user-generated content.

Push CDN: The origin proactively pushes content to edge servers. Best for known popular content (video, software updates).

Multi-CDN Strategy

Use multiple CDN providers (Akamai + CloudFront + Fastly)
Benefits: Redundancy, cost optimization, best performance per region
Real-time switching: Traffic management layer routes to the best-performing CDN per user

Edge Computing

Run application logic at the edge (Cloudflare Workers, AWS Lambda@Edge)
Use cases: A/B testing, header manipulation, authentication, image resizing, personalization
Reduces round trips to origin

TLS at Edge

SSL/TLS terminated at edge server (not origin)
Certificate management: Automatic certificate provisioning (Let's Encrypt) or customer certificate
Edge-to-origin connection: Can be HTTP (internal network) or HTTPS (security)

Image Optimization at Edge

Automatic format conversion (WebP/AVIF for supported browsers)
Responsive sizing (resize based on client hints such as DPR/Width, or a query param)
Quality adjustment based on network speed
Reduces bandwidth by 30-50%

Monitoring

Real-time dashboards: requests/sec, bandwidth, cache hit ratio, error rate by PoP
Alert on: cache hit ratio < 80%, origin error rate > 5%, latency p99 > 100ms
Origin health monitoring: synthetic probes from each region
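The alert thresholds listed above can be expressed as a small rule table. This is only a sketch; the metric names and the `firing_alerts` helper are illustrative and not tied to any monitoring product.

```python
# Thresholds copied from the monitoring list; metric names are illustrative.
THRESHOLDS = {
    "cache_hit_ratio": ("below", 0.80),
    "origin_error_rate": ("above", 0.05),
    "latency_p99_ms": ("above", 100),
}

def firing_alerts(metrics):
    """Return the names of metrics that breach their threshold."""
    fired = []
    for name, (direction, limit) in THRESHOLDS.items():
        value = metrics[name]
        if (direction == "below" and value < limit) or (direction == "above" and value > limit):
            fired.append(name)
    return fired

print(firing_alerts({"cache_hit_ratio": 0.93, "origin_error_rate": 0.08, "latency_p99_ms": 45}))
```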

Interview Walkthrough

Start with the read-heavy traffic profile and why origin offload is the primary goal — use Interview Patterns for cache hierarchy.
Explain DNS-based vs Anycast routing to the nearest PoP and how DNS TTL affects failover during PoP outages.
Walk through cache key design (URL + Vary headers + query string policy) and negative caching for 404s.
Cover cache invalidation strategies: TTL-only, active purge API, and versioned asset URLs for immutable content.
Discuss origin shield (regional mid-tier cache) to collapse thundering herd on cache miss.
Common pitfall: caching personalized HTML at the edge — without Vary or edge-side includes, users see each other's data.

SLOs & Error Budgets

Metric	Target	Rationale
Edge response p99 latency	< 50ms	Cache HIT path — no origin round-trip
Global cache hit ratio	> 95%	Origin cost and resilience
Purge propagation p99	< 30 sec	Compliance and content freshness
PoP availability	99.99%	Anycast failover should mask single-PoP failure
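To make the availability target concrete, the error budget can be converted into minutes of allowed downtime. A minimal sketch; the function name is illustrative.

```python
def downtime_budget_minutes(availability, days):
    """Minutes of downtime allowed over `days` at the given availability target."""
    return (1 - availability) * days * 24 * 60

print(round(downtime_budget_minutes(0.9999, 30), 2))    # 4.32 minutes per 30 days
print(round(downtime_budget_minutes(0.9999, 365), 2))   # 52.56 minutes per year
```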

Incident Scenarios (2am reality)

Scenario	How you detect	Mitigation
Origin overload during coordinated TTL expiry	Origin 5xx rate spikes at top of every hour; shield miss ratio jumps to 40%	Enable stale-while-revalidate; jitter TTL per object (+/- 10%); increase shield singleflight timeout; emergency TTL extension via control plane
Bad config push clears all edge caches	Global hit ratio drops to 0%; origin bandwidth instant 100× spike	Config rollback via versioned deployments; canary PoP before global push; origin rate limiting + auto-scale; shield request coalescing
Cache poisoning via Host header manipulation	Users report seeing another tenant's content; security scan flags cross-tenant cache key collision	Include tenant_id in cache key namespace; validate Host header against tenant config; emergency flush of affected PoP partition

Cost Drivers (Staff lens)

Egress bandwidth: 10 Tbps peak × $0.01–0.05/GB depending on region; roughly $0.4–2B/year if peak were sustained around the clock, less at real average utilization
PoP infrastructure: 200 PoPs × 50 servers × SSD cache — CapEx + colo fees dominate fixed cost
Origin offload value: each 1% hit ratio improvement saves ~100 Gbps origin egress
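A back-of-the-envelope check of the egress and offload figures gives the order of magnitude. It assumes traffic is sustained at the 10 Tbps peak all year, which is an upper bound since average utilization is lower, and it uses decimal units.

```python
PEAK_TBPS = 10
SECONDS_PER_YEAR = 365 * 24 * 3600

# 10 Tbps = 10e12 bits/s; divide by 8 for bytes and by 1e9 for decimal GB.
gb_per_second = PEAK_TBPS * 1e12 / 8 / 1e9
gb_per_year_at_peak = gb_per_second * SECONDS_PER_YEAR

for price_per_gb in (0.01, 0.05):
    cost = gb_per_year_at_peak * price_per_gb
    print(f"${price_per_gb:.2f}/GB -> ${cost / 1e9:.2f}B per year at sustained peak")

# One percentage point of hit ratio, expressed as origin egress at peak.
origin_saving_gbps = PEAK_TBPS * 1000 * 0.01
print(f"1% hit-ratio gain ~ {origin_saving_gbps:.0f} Gbps less origin egress")
```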

Multi-Region & DR

PoPs are inherently multi-region. Origins may be multi-region with geo-aware shield routing (EU edges → EU origin). Cross-region cache fill via private backbone avoids public internet for inter-PoP transfer. Control plane active-active with CRDT-based config merge for partition tolerance.