This problem appears in multiple sheets. Depth expectations increase as you progress:
| Track | What to demonstrate |
|---|---|
| Arch 75 | Staff level: multi-region, cost at scale, migration path, and production metrics. |
Interview Prompt
Design Instagram (Photo Sharing + Social Features).
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Fan-out on write vs read vs hybrid? Any celebrity users? | The defining trade-off for social feed architecture. |
| What scale should we design for — DAU, QPS, data volume? | Drives every capacity decision; shows structured thinking. |
| What are the read vs write patterns on the critical path? | Determines caching, DB choice, and replication topology. |
| What consistency and durability guarantees are required? | Separates strong-consistency paths from eventual ones — a senior differentiator. |
Scope
In scope
- Photo upload pipeline
- CDN + S3 storage
- Explore/discovery feed
- Story rendering
- Hashtag search
- Capacity estimation with shown math
Out of scope (state explicitly)
- Full ML ranking model training pipeline
- Direct messaging / chat (#07)
- Ad insertion and monetization
Assumptions
- Millions of DAU with heavy fan-out — clarify celebrity/hot-key cases early
- Eventual consistency acceptable for non-critical side effects (counts, notifications)
- WebSocket or push infrastructure available at the edge
These foundational concepts underpin the patterns used in this problem. Review them before deep-diving into component-level trade-offs.
- Upload photos/videos with captions, tags, location, filters
- News feed: Personalized feed of posts from followed users
- Stories: Ephemeral content that disappears after 24 hours
- Follow / Unfollow users
- Like, Comment, Save posts
- Explore page: Discover new content based on interests
- Direct Messaging (DMs): part of the product, but out of scope for this design (see Scope)
- User profiles with grid of posts, follower/following counts
- Hashtags and location-based discovery
- Notifications for likes, comments, follows, mentions
- High Availability: 99.99%
- Low Latency: Feed loads in < 200 ms, image renders in < 500 ms
- Scalability: 2B+ registered users, 500M+ DAU
- Durability: Uploaded photos/videos must never be lost
- Read-Heavy: 100:1 read to write ratio
- Eventual Consistency: OK for feed, likes count, follower count
- Global: CDN for media delivery worldwide
| Metric | Calculation | Value |
|---|---|---|
| DAU | Given (product assumption) | 500M |
| Photo uploads / day | 500M DAU × 0.2 | 100M |
| Avg photo size (original) | Given (typical workload assumption) | 3 MB |
| Photo storage / day | 100M × 3 MB | 300 TB |
| Photo storage / year | 300 TB × 365 | ~110 PB |
| Resized versions per photo | Given (assumption documented in value) | 4 (thumbnail, small, medium, large) |
| Total storage with resizes | 300 TB × 2 (assumes the 4 resized versions together add roughly the size of the original) | 600 TB/day |
| Feed reads / day | 500M DAU × 10 | 5B |
| Stories uploads / day | 500M DAU × 1 | 500M |
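The arithmetic in the table can be reproduced in a few lines. This is a minimal sketch that assumes decimal units (1 TB = 10^6 MB, 1 PB = 1,000 TB) and uses the table's own inputs as defaults; the function name and dictionary keys are illustrative, not part of the original design.

```python
def estimate_capacity(dau: int = 500_000_000,
                      uploads_per_user: float = 0.2,
                      avg_photo_mb: float = 3.0,
                      resize_multiplier: float = 2.0,
                      feed_reads_per_user: int = 10) -> dict:
    """Back-of-the-envelope capacity numbers (decimal units)."""
    uploads_per_day = dau * uploads_per_user
    original_tb_per_day = uploads_per_day * avg_photo_mb / 1_000_000  # MB -> TB
    feed_reads_per_day = dau * feed_reads_per_user
    return {
        "uploads_per_day": uploads_per_day,
        "original_tb_per_day": original_tb_per_day,
        "original_pb_per_year": original_tb_per_day * 365 / 1_000,  # TB -> PB
        "total_tb_per_day": original_tb_per_day * resize_multiplier,
        "feed_reads_per_day": feed_reads_per_day,
        "avg_feed_reads_per_sec": feed_reads_per_day / 86_400,  # seconds per day
    }


if __name__ == "__main__":
    for name, value in estimate_capacity().items():
        print(f"{name}: {value:,.1f}")
```

The per-second figure is a daily average; peak traffic would be some multiple of it, which the table does not specify.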
Post Service
- Handles photo/video upload and post creation
- Upload Flow:
- Client requests a pre-signed S3 URL from Post Service
- Client uploads media directly to S3 (avoids routing through our servers)
- Client sends post metadata (caption, tags, location) to Post Service
- Post Service writes to Posts DB
- Publishes post-created event to Kafka
Media Processing Pipeline
- Triggered by: S3 event notification or Kafka event
- Processing steps:
- Resize: Generate 4 versions (150×150 thumbnail, 320×320, 640×640, 1080×1080)
- Apply filters: If user selected a filter, apply server-side (or client-side pre-upload)
- Generate blurhash: Low-res placeholder for progressive loading
- Extract EXIF: GPS coordinates, camera info (strip sensitive data)
- Content moderation: NSFW detection, violence detection (ML model)
- Video processing: Transcode to multiple resolutions (360p, 720p, 1080p), generate thumbnails
- Technology: AWS Lambda for image processing, FFmpeg for video, GPU instances for ML
- Output: Multiple resized images stored back in S3, CDN URLs generated
Feed Service (Hybrid Fan-Out)
- Same hybrid approach as Twitter/Facebook (see News Feed design)
- Normal users: fan-out on write to Redis
- Celebrities: fan-out on read
- Feed ranking: ML model considers:
- Relationship strength (how often you interact with the poster)
- Post recency
- Engagement velocity (how fast it's getting likes)
- Content type (photos vs. videos vs. carousels)
- User's historical preferences
Story Service
- 24-hour ephemeral content
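- Storage: S3 with a lifecycle expiration rule as cleanup (S3 lifecycle rules are specified in whole days and run asynchronously, so the exact 24-hour cutoff is enforced by the Cassandra TTL and expires_at, not by S3)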
- Serving: Stories for users you follow are pre-fetched
- Data model: Cassandra with TTL = 86400 seconds
- Viewing: Stories are displayed in a horizontal carousel, ordered by recency and relationship strength
- Story tray: Pre-computed list of users who have active stories → stored in Redis
Explore Service
- Purpose: Surface interesting content from users you don't follow
- How:
- Collaborative filtering: "Users similar to you liked these posts"
- Content-based: Posts with hashtags/locations matching your interests
- Engagement signals: High like velocity posts
- Implementation:
- Offline: Spark job computes candidate posts per interest cluster
- Online: ML ranking model scores candidates for each user
- Cached in Redis with TTL
Social Service (Follow, Like, Comment)
- Follow: Write to MySQL (source of truth for social graph) + update Redis follower sets + publish follow-event to Kafka
- Like: Atomic check-and-set in Redis (SADD liked:{post_id} {user_id}) + increment counter (INCR like_count:{post_id}) + Kafka event for async DB write + notification
- Comment: Write to Cassandra (partition key = post_id, clustering = comment_id for time ordering) + Kafka event → Notification Service
- Dedup: Redis SET for liked:{post_id} prevents double-likes; idempotent on retry
- Anti-spam: Rate limit comments (max 20/min per user), ML-based spam classifier on comment text
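The like dedup described above can be sketched with an in-memory stand-in for Redis. The class name and methods are hypothetical. With real Redis, SADD reports whether the member was newly added, and the INCR would be applied only in that case (ideally inside a transaction or script so the two steps stay atomic).

```python
from collections import defaultdict


class InMemoryLikeStore:
    """Stand-in for Redis: liked:{post_id} sets and like_count:{post_id} counters."""

    def __init__(self) -> None:
        self._liked = defaultdict(set)
        self._like_count = defaultdict(int)

    def like(self, post_id: int, user_id: str) -> bool:
        """Record a like; return True only if it was new (mirrors SADD returning 1)."""
        if user_id in self._liked[post_id]:
            return False  # duplicate or retried request: no double count
        self._liked[post_id].add(user_id)
        self._like_count[post_id] += 1  # mirrors INCR, applied only for new likes
        return True

    def count(self, post_id: int) -> int:
        return self._like_count[post_id]


if __name__ == "__main__":
    store = InMemoryLikeStore()
    store.like(1, "alice")
    store.like(1, "alice")  # client retry, ignored
    print(store.count(1))
```

Because a repeated like is a no-op, a client can safely retry after a timeout without inflating the counter.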
Notification Service
- Consumes events from Kafka (like-events, follow-events, comment-events, mention-events)
- Channels: APNs (iOS push), FCM (Android push), in-app notification feed
- Batching: Multiple likes on the same post → collapse into "Alice and 42 others liked your photo" (not 43 separate pushes)
- Notification feed: Stored in Cassandra (partition key = user_id, ordered by timestamp): the "Activity" tab
- Rate limiting: Max 1 push per post per 5-minute window for the same event type
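The batching and rate-limiting rules above can be illustrated with a small sketch. The function and class names are hypothetical, the liker list is assumed to be deduplicated already, and timestamps are passed in explicitly so the logic stays deterministic.

```python
def collapse_likes(liker_names: list[str]) -> str:
    """Collapse many like events on one post into a single notification text."""
    if not liker_names:
        raise ValueError("at least one liker is required")
    first, others = liker_names[0], len(liker_names) - 1
    if others == 0:
        return f"{first} liked your photo"
    noun = "other" if others == 1 else "others"
    return f"{first} and {others} {noun} liked your photo"


class PushThrottle:
    """Allow at most one push per (post, event type) per time window."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window = window_seconds
        self._last_sent: dict[tuple[int, str], float] = {}

    def allow(self, post_id: int, event_type: str, now: float) -> bool:
        key = (post_id, event_type)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # still inside the window: fold into the next batch
        self._last_sent[key] = now
        return True
```

Events rejected by the throttle are not lost in this scheme; they would be accumulated and folded into the next collapsed notification.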
Search Service (Elasticsearch)
- Indexed entities: Users (username, full name, bio), Hashtags (#sunset → post count, trending score), Locations (place name, coordinates)
- Features: Prefix autocomplete on usernames, fuzzy matching, trending hashtags
- Ranking: For user search: verified badge boost + follower count + mutual followers. For hashtag search: post count + trending velocity
- Sync: User profile changes → Kafka CDC → Elasticsearch consumer updates index
- Note: Post content (captions) is NOT full-text searchable (Instagram design choice: discovery is via hashtags and Explore, not text search)
Analytics Pipeline
- Consumes all Kafka event streams → Flink aggregation → ClickHouse
- Metrics: Post engagement rates, story completion rates, follower growth, creator analytics dashboard
- Used by: Explore ranking model training, content moderation signal enrichment, ad targeting
Upload Photo
POST /api/v1/posts
{
"media_ids": ["media-uuid-1"],
"caption": "Beautiful sunset! #nature",
"location": {"lat": 37.7749, "lng": -122.4194, "name": "San Francisco"},
"tagged_users": ["user-456"],
"filter": "clarendon"
}
Response: 201 Created
Get Pre-signed Upload URL
GET /api/v1/media/upload-url?type=image&content_type=image/jpeg
Response: 200 OK
{
"upload_url": "https://s3.amazonaws.com/instagram-media/...",
"media_id": "media-uuid-1"
}
Get Feed
GET /api/v1/feed?cursor={post_id}&limit=10
Get Stories
GET /api/v1/stories/feed
Response: 200 OK
{
"story_trays": [
{
"user": {"user_id": "...", "username": "...", "avatar_url": "..."},
"stories": [
{"story_id": "...", "media_url": "...", "created_at": "...", "expires_at": "..."}
]
}
]
}
Post Story
POST /api/v1/stories
{
"media_id": "media-uuid",
"stickers": [...],
"music_id": "..."
}
Like / Comment
POST /api/v1/posts/{post_id}/like
POST /api/v1/posts/{post_id}/comments
{ "text": "Amazing photo!" }
Common Error Responses
- 400 Bad Request: invalid input, missing fields, or malformed JSON
- 401 Unauthorized: missing or invalid auth token or API key
- 403 Forbidden: authenticated but insufficient permissions
- 404 Not Found: resource ID does not exist
- 409 Conflict: duplicate write or version conflict; retry with idempotency key
- 422 Unprocessable Entity: valid syntax but invalid business logic
- 429 Too Many Requests: rate limit exceeded; honor Retry-After header
- 500 Internal Error: unexpected server fault; retry with idempotency key
- 503 Service Unavailable: dependency down or overloaded; use exponential backoff
Cassandra: Posts
CREATE TABLE posts (
post_id BIGINT, -- Snowflake ID
user_id UUID,
caption TEXT,
media_urls LIST<TEXT>, -- CDN URLs for different sizes
location TEXT,
hashtags SET<TEXT>,
tagged_users SET<UUID>,
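created_at TIMESTAMP,
PRIMARY KEY (post_id)
);
-- Cassandra does not allow COUNTER columns in a table that also has
-- regular (non-key) columns, so the counters live in a dedicated table
CREATE TABLE post_counters (
post_id BIGINT PRIMARY KEY,
like_count COUNTER,
comment_count COUNTER
);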
-- User's own posts (profile grid)
CREATE TABLE user_posts (
user_id UUID,
post_id BIGINT,
media_thumb TEXT,
created_at TIMESTAMP,
PRIMARY KEY (user_id, post_id)
) WITH CLUSTERING ORDER BY (post_id DESC);
Cassandra: Stories (with TTL)
CREATE TABLE stories (
user_id UUID,
story_id BIGINT,
media_url TEXT,
media_type VARCHAR, -- image, video
created_at TIMESTAMP,
PRIMARY KEY (user_id, story_id)
) WITH CLUSTERING ORDER BY (story_id DESC)
AND default_time_to_live = 86400;
Redis: Feed Cache
Key: feed:{user_id}
Type: Sorted Set
Members: post_id
Scores: ranking_score (not just timestamp — ML-ranked)
Max: 500 entries
Redis: Story Tray
Key: story_tray:{user_id}
Type: Sorted Set
Members: poster_user_id
Scores: latest_story_timestamp
TTL: 3600 (refresh hourly)
MySQL: Users & Social Graph
CREATE TABLE users (
user_id UUID PRIMARY KEY,
username VARCHAR(30) UNIQUE,
display_name VARCHAR(64),
bio TEXT,
avatar_url TEXT,
post_count INT DEFAULT 0,
follower_count INT DEFAULT 0,
following_count INT DEFAULT 0,
is_private BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP
);
CREATE TABLE follows (
follower_id UUID,
followee_id UUID,
status ENUM('active', 'pending'),
created_at TIMESTAMP,
PRIMARY KEY (follower_id, followee_id),
INDEX idx_followee (followee_id)
);
S3: Media Storage Structure
Bucket: instagram-media
Path: /{user_id}/{year}/{month}/{media_id}/
Files: original.jpg
thumb_150.jpg
small_320.jpg
medium_640.jpg
large_1080.jpg
blurhash.txt
General
| Concern | Solution |
|---|---|
| Media loss | S3 with 11 nines durability, cross-region replication |
| Upload failure | Client retries with idempotent media_id; S3 multipart upload for large files |
| Feed cache loss | Rebuild from DB; degrade to reverse-chronological (no ranking) |
| Story expiration accuracy | Cassandra TTL handles deletion; client also checks expires_at |
| CDN cache invalidation | On post delete, purge CDN cache by URL pattern |
| Media processing failure | DLQ for failed processing jobs; retry with different worker |
Specific: Handling Image Upload Failures
- Client uploads to S3 using multipart upload (resumable)
- If upload fails mid-way, client retries from last successful part
- If post metadata is saved but media processing fails → post is in "processing" state
- Background retry processes failed media
- After 3 failures → notify user to re-upload
Progressive Image Loading
- First: Show blurhash placeholder (tiny hash → instant blur preview)
- Then: Load thumbnail (150px)
- Then: Load appropriate resolution based on device/viewport
- Uses srcset and sizes attributes for responsive images
Content Moderation Pipeline
Upload → NSFW Detection (ML) → Hate Speech (NLP) → Score
- Score > threshold → Auto-reject + notify user
- Score borderline → Queue for human review
- Score OK → Publish
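The three moderation outcomes can be expressed as a small routing function. This is a sketch only: the source does not give threshold values, so `reject_above` and `review_above` are hypothetical defaults, and taking the worse of the two classifier scores is an assumption about how the signals are combined.

```python
from enum import Enum


class Decision(Enum):
    REJECT = "auto_reject"
    REVIEW = "human_review"
    PUBLISH = "publish"


def route(nsfw_score: float, hate_score: float,
          reject_above: float = 0.9, review_above: float = 0.6) -> Decision:
    """Route a post using the worse of the two classifier scores (0.0 to 1.0)."""
    worst = max(nsfw_score, hate_score)
    if worst > reject_above:
        return Decision.REJECT   # auto-reject and notify the user
    if worst > review_above:
        return Decision.REVIEW   # borderline: queue for human review
    return Decision.PUBLISH


if __name__ == "__main__":
    print(route(0.95, 0.10).value)
    print(route(0.20, 0.70).value)
    print(route(0.05, 0.02).value)
```

In practice the two thresholds would be tuned against the reviewers' capacity, since a wider borderline band sends more posts to the human queue.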
Hashtag and Location Pages
- Hashtag page: Elasticsearch query for posts with specific hashtag
- Location page: Geospatial query (PostGIS or Elasticsearch geo_point)
- Both sorted by recency or "top" (engagement-based)
Private Accounts
- Follow requires approval (status = 'pending')
- Posts only visible to approved followers
- Feed fan-out only to approved followers
- Explore page excludes private account posts
Instagram Reels (Video Feed)
- Separate vertical video feed (like TikTok)
- Video transcoded to HLS/DASH adaptive streaming
- Recommendation engine: engagement-based + content understanding (video embeddings)
- Pre-fetch next 3 reels for smooth scrolling experience
Feed Ranking Model: ML Deep Dive
Instagram's feed is NOT chronological. It's ranked by predicted engagement.
Features fed to the ranking model:
User-author affinity:
- interaction_score: how often user likes/comments author's posts (decay over time)
- profile_visit_count: how often user visits author's profile
- DM frequency: users who DM each other see each other's posts first
Post features:
- age_minutes: exponential decay (post from 1 hr ago >> post from 24 hrs ago)
- content_type: photo / video / carousel (video gets ~1.3x implicit boost)
- engagement_velocity: likes_in_first_30_min / impressions_in_first_30_min
- caption_length, hashtag_count, has_location
Context features:
- time_of_day (user's local time), day_of_week
- session_number_today: 1st open = best content; 5th open = deeper inventory
- network_quality: on slow network, deprioritize video
Model: Multi-task learning — predict P(like), P(comment), P(save), P(share), P(dwell_time > 3s)
Final score = w1*P(like) + w2*P(comment) + w3*P(save) + w4*P(share) + w5*P(dwell)
Weights: save and share are weighted highest (stronger engagement signals than likes)
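The weighted sum above can be sketched as follows. The weight values are illustrative only (the source states only that save and share are weighted highest), and the predicted probabilities would come from the multi-task model, which is not shown here.

```python
# Hypothetical weights: save and share highest, as described in the text.
DEFAULT_WEIGHTS = {"like": 1.0, "comment": 2.0, "save": 4.0, "share": 4.0, "dwell": 0.5}


def final_score(probs: dict[str, float],
                weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of predicted engagement probabilities; missing heads count as 0."""
    return sum(w * probs.get(name, 0.0) for name, w in weights.items())


def rank(candidates: dict[int, dict[str, float]], top_k: int = 50) -> list[int]:
    """Return post ids ordered by descending final score, truncated to top_k."""
    ordered = sorted(candidates, key=lambda pid: final_score(candidates[pid]),
                     reverse=True)
    return ordered[:top_k]


if __name__ == "__main__":
    posts = {
        1: {"like": 0.5},
        2: {"save": 0.2},
        3: {"like": 0.1, "comment": 0.1},
    }
    print(rank(posts))
```

With these example weights a post with a modest save probability can outrank one with a higher like probability, which is the behavior the weighting is meant to produce.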
Serving: candidate generation (500 posts from fan-out cache) → ML ranking → top 50 served
Latency budget: < 100 ms for scoring 500 candidates (batched inference, ONNX on CPU)
Diversity injection: after ranking, ensure no 3 consecutive posts from same author
Race Condition: Post Visible Before Media Processed
Problem: User creates post → metadata saved to DB → fan-out starts → followers see post.
But media is still processing (resize, filter, moderation), so followers see a broken image.
Solution: Two-phase publishing
Phase 1: Upload media + process (resize, moderate). Post status = "processing"
Phase 2: Only after ALL media variants are ready → set status = "published" → start fan-out
Client shows spinner until post is "published" (typically 3-8 seconds).
Fan-out never triggers for "processing" posts → followers never see broken images.
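Two-phase publishing can be sketched as a small state machine. The class and method names are hypothetical, the step names are examples, and `fan_out` stands in for whatever enqueues the real fan-out job.

```python
from typing import Callable, Iterable


class TwoPhasePublisher:
    """Hold a post in "processing" until every required step has finished."""

    def __init__(self, fan_out: Callable[[int], None]) -> None:
        self._fan_out = fan_out
        self._pending: dict[int, set[str]] = {}
        self.status: dict[int, str] = {}

    def create_post(self, post_id: int, required_steps: Iterable[str]) -> None:
        # Phase 1: metadata is saved, but the post is not yet visible to followers.
        self._pending[post_id] = set(required_steps)
        self.status[post_id] = "processing"

    def step_done(self, post_id: int, step: str) -> None:
        if self.status[post_id] != "processing":
            return  # already published; late or duplicate callbacks are ignored
        self._pending[post_id].discard(step)
        if not self._pending[post_id]:
            # Phase 2: all variants ready, so publish and trigger fan-out exactly once.
            self.status[post_id] = "published"
            self._fan_out(post_id)


if __name__ == "__main__":
    sent: list[int] = []
    pub = TwoPhasePublisher(sent.append)
    pub.create_post(1, ["thumb_150", "large_1080", "moderation"])
    for step in ("thumb_150", "large_1080", "moderation"):
        pub.step_done(1, step)
    print(pub.status[1], sent)
```

Treating moderation as one of the required steps means a post also cannot reach followers before it has been scored.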
Interview Walkthrough
- Separate media upload (async, multi-resolution transcoding) from feed fan-out — never fan-out until all variants are ready.
- Use object storage (S3) for originals and CDN for delivery; discuss thumbnail, feed, and full-resolution pipelines.
- Apply the same hybrid fan-out strategy as Twitter: push for normal accounts, pull-merge for celebrity/influencer posts.
- Store the social graph in a graph DB or adjacency lists sharded by user_id; feed generation reads follower lists from cache.
- Cover Stories separately: ephemeral TTL (Cassandra rows with a 24-hour TTL, story tray cached in Redis), separate CDN paths, no fan-out to main feed.
- Quantify storage with Back-of-the-Envelope Estimation: 100M photos/day × 3 MB average = 300 TB/day ingest before compression tiers.
- Common pitfall: triggering fan-out before transcoding completes, showing broken or single-resolution images to millions of followers.
Fan-Out on Write vs Fan-Out on Read for Feed Generation
Instagram's core design decision: how to deliver posts to followers' feeds?
Fan-Out on Write (push model):
When Alice posts → immediately write Alice's post_id to every follower's feed cache
✓ Feed reads are O(1): just read pre-populated feed list
✓ Consistent feed across devices (pre-computed, same order)
✗ Write amplification: Alice has 100M followers → 100M Redis writes per post
✗ Celebrities cause thundering herd on write path
✗ Wasted work: most followers won't see the post in their next session
Fan-Out on Read (pull model):
When Bob opens his feed → fetch latest posts from all 500 users he follows
✓ No write amplification
✓ Always fresh (no stale cached feed)
✗ Read fan-out: 500 followees × 1 DB query each = 500 queries per feed load
✗ Latency: N queries serialized or N parallel with fan-out overhead
Instagram's Hybrid Approach ⭐:
Regular users (< 50K followers): Fan-out on WRITE
Post → async worker → write post_id to Redis feed:{follower_id} for each follower
Each follower's feed is a sorted set of post_ids
Cost: manageable for typical users (200-1000 followers)
Celebrities (> 50K followers): Fan-out on READ
Don't pre-compute their posts in follower feeds
On feed request: fetch the latest 50 posts from each followed celebrity (pull)
Feed assembly on read:
pre_computed_feed = Redis: feed:{user_id} (from regular users)
celebrity_posts = merge(query latest 50 posts from each followed celeb)
final_feed = merge_sorted(pre_computed_feed, celebrity_posts)
→ ML rank → return top 50
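The feed assembly step above can be sketched with a k-way merge. This is a simplified illustration: it merges by timestamp as a stand-in for `merge_sorted`, leaves out the ML ranking pass, and treats an account with exactly 50K followers as a celebrity, which the text does not specify. Function names are hypothetical.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 50_000  # follower cutoff quoted in the text


def uses_fan_out_on_write(follower_count: int) -> bool:
    """Regular accounts are pushed to follower feeds; celebrities are pulled on read."""
    return follower_count < CELEBRITY_THRESHOLD


def assemble_feed(precomputed, celebrity_timelines, limit=50):
    """Merge newest-first (timestamp, post_id) lists and keep the first `limit` ids.

    precomputed: the user's cached feed (posts pushed by regular accounts).
    celebrity_timelines: one newest-first list per followed celebrity.
    """
    merged = heapq.merge(precomputed, *celebrity_timelines, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]


if __name__ == "__main__":
    cached = [(100, "a"), (90, "b"), (50, "c")]
    celebs = [[(95, "x"), (10, "y")], [(200, "z")]]
    print(assemble_feed(cached, celebs, limit=4))
```

`heapq.merge` is lazy, so only as many entries as the limit requires are pulled from each input list.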
This is called the "Hybrid Fan-out" and is used by Instagram, Twitter, and Facebook.
Photo Storage Optimization: CDN Tiering and Image Compression
Instagram serves 100B+ photos/day across multiple resolutions.
Storage and CDN bandwidth are the largest cost drivers.
Resolution variants stored per photo:
thumbnail: 150×150 (for feed grid, 2-3 KB)
low_res: 480×480 (feed display on mobile, 20-40 KB)
standard: 1080×1080 (full view, 100-200 KB)
original: as uploaded (up to 10 MB, never served directly)
Storage tiers:
Hot (CDN edge): thumbnails + low_res for recent posts (last 30 days)
→ ~80% of serves come from recent content
Cost: higher per GB but zero origin load
Warm (S3 Standard): standard + all sizes for 30-180 day old posts
→ Served via CDN on demand, cached on first access
Cold (S3 Glacier): original files + all sizes for > 180 days old
→ Retrieved on-demand with 1-12 hour delay (acceptable for old content)
→ 70% cheaper than S3 Standard per GB
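One way to read the tiering rules above is as a function of variant and post age. This sketch encodes a single interpretation: originals always sit in cold storage, only thumbnails and low_res variants of recent posts are treated as hot, and boundary days (30 and 180) are assigned to the warmer tier. All of these are assumptions, and the function name is hypothetical.

```python
HOT_VARIANTS = {"thumbnail", "low_res"}


def storage_tier(variant: str, age_days: int) -> str:
    """Pick a storage tier for one image variant of a post of the given age."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if variant == "original" or age_days > 180:
        return "cold"   # archival storage, never served directly
    if age_days <= 30 and variant in HOT_VARIANTS:
        return "hot"    # kept at the CDN edge
    return "warm"       # object storage, served via CDN on demand


if __name__ == "__main__":
    for variant, age in [("thumbnail", 5), ("standard", 5), ("thumbnail", 90)]:
        print(variant, age, storage_tier(variant, age))
```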
Compression strategy:
JPEG → WebP: 30-40% smaller at same visual quality
Lossy compression: quality=85 for thumbnails, quality=92 for standard
Progressive JPEG: browser shows low-quality preview while downloading
Result: 40% reduction in CDN egress cost
CDN cache hit rate: ~95% for thumbnails, ~80% for standard resolution
(recently uploaded photos by celebrities get >1B serves from CDN, zero S3 cost)
Explore Page: Content Discovery via Graph Signals
Instagram Explore shows content from people you DON'T follow.
Goal: maximize engagement while introducing users to new creators.
Signal types:
1. Interest graph (most important):
Cluster users by engagement patterns
"Users like Alice (who likes dog photos) also engage with these accounts"
Graph-based clustering → interest communities
2. Content understanding:
Computer vision: classify photo content (food, fashion, travel, etc.)
Match user's historically engaged categories
3. Social graph proximity:
Posts liked by people I follow → second-degree signal
"12 people you follow saved this post" → strong interest signal
4. Trending:
Posts with unusually high engagement velocity in past 1 hour
ZADD trending:explore post_id {velocity_score}
Candidate generation → ranking pipeline:
1. Retrieve ~10K candidates from interest graph + social graph + trending
2. ML ranking model (same as feed): predict P(like), P(save), P(follow)
3. Diversity enforcement: max 1 post per creator in top 50
4. Safety filter: verified brand-safe content only for non-personalized positions
5. Novelty boost: content user hasn't seen before gets a slight rank boost
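The diversity enforcement step in the pipeline above can be sketched as a single pass over the ranked list. The function name and tuple layout are hypothetical. If there are fewer distinct creators than `top_n`, the constraint cannot be fully met and the deferred posts simply follow in their original order.

```python
def enforce_creator_diversity(ranked, top_n=50, max_per_creator=1):
    """Reorder (post_id, creator_id) pairs so the head has few repeats per creator.

    ranked must already be in descending score order.
    """
    seen: dict[str, int] = {}
    head, deferred = [], []
    for post_id, creator_id in ranked:
        if len(head) < top_n and seen.get(creator_id, 0) < max_per_creator:
            head.append((post_id, creator_id))
            seen[creator_id] = seen.get(creator_id, 0) + 1
        else:
            deferred.append((post_id, creator_id))  # pushed below the head
    return head + deferred


if __name__ == "__main__":
    ranked = [(1, "a"), (2, "a"), (3, "b"), (4, "c"), (5, "b")]
    print(enforce_creator_diversity(ranked, top_n=3))
```

Deferred posts keep their relative order, so the reordering changes the ranking only as much as the constraint requires.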
Key difference from Feed:
Feed: optimize for engagement with KNOWN creators (satisfaction)
Explore: optimize for engagement + creator DISCOVERY (growth)
Explore uses slightly higher novelty weight to surface new creators
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1: MVP (0 to 100K users)
Monolith or minimal services proving core Instagram flows. Optimize for shipping speed and correctness over scale.
Key components: Single region · Primary DB + Redis cache · Synchronous core path · Basic monitoring
Move to next phase when: p99 latency exceeds SLO or DB CPU sustained above 70%
Phase 2: Growth (100K to 10M users)
Split read/write paths, introduce async processing for non-critical work, add caching layers and horizontal scaling.
Key components: Read replicas or CQRS · Message queue for async work · CDN / edge caching · Service-level SLOs
Move to next phase when: Hot keys, fan-out bottlenecks, or ops toil from manual scaling
Phase 3: Scale (10M+ users)
Shard data plane, multi-region active-active or active-passive, formal DR runbooks, cost optimization.
Key components: Database sharding / partitioning · Multi-region replication · Auto-scaling + chaos testing · Dedicated platform/SRE ownership
Move to next phase when: Regional failure domain risk, compliance data residency, or linear cost growth unsustainable
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Core user-facing availability | 99.95% | Budget for planned maintenance + unplanned failures without user-visible outage. |
| p99 latency (critical path) | Problem-specific — state target early and tie to capacity math | Interview credibility comes from connecting SLO to architecture choices. |
| Error rate (5xx) | < 0.1% | Distinguishes transient blips from systemic failure requiring rollback. |
| Data durability | 99.999999999% (11 nines) for committed writes | Define which operations require fsync/quorum vs async replication. |
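An availability target translates directly into an error budget. The sketch below assumes a 30-day window and compares the 99.95% target in this table with the 99.99% figure listed in the non-functional requirements; the function name is illustrative.

```python
def allowed_downtime_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Minutes of full outage permitted by an availability target over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.95, 99.99):
        print(f"{target}%: {allowed_downtime_minutes(target):.2f} min per 30 days")
```

Under these assumptions the budget is about 21.6 minutes per 30 days at 99.95% and about 4.3 minutes at 99.99%, which is why the choice between the two targets matters for rollback and failover timing.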
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Primary database unavailable | Health check failures, connection pool exhaustion alerts, elevated 5xx | Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists |
| Traffic spike (10× normal) | RPS anomaly alert, autoscaling lag, latency SLO burn rate | Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations |
| Bad deploy causing elevated errors | Canary metric regression, error budget burn, deployment correlation | Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility |
Cost Drivers (Staff lens)
- Egress bandwidth and CDN (often dominates media/data-heavy systems)
- Database storage + IOPS at scale (plan compaction, TTL, tiering)
- Compute for async pipelines (right-size workers, spot instances for batch)
- Managed service premiums vs operational headcount trade-off
Multi-Region & DR
Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.