Design Instagram (Photo Sharing + Social Features) – System Design Walkthrough

This problem appears in multiple sheets. Depth expectations increase as you progress:

Track	What to demonstrate
Arch 75	Staff level: multi-region, cost at scale, migration path, and production metrics.

Interview Prompt

Design Instagram (Photo Sharing + Social Features).

Clarifying Questions (ask before designing)

Question	Why it matters
Fan-out on write vs read vs hybrid? Any celebrity users?	The defining trade-off for social feed architecture.
What scale should we design for — DAU, QPS, data volume?	Drives every capacity decision; shows structured thinking.
What are the read vs write patterns on the critical path?	Determines caching, DB choice, and replication topology.
What consistency and durability guarantees are required?	Separates strong-consistency paths from eventual ones — a senior differentiator.

Scope

In scope

Photo upload pipeline
CDN + S3 storage
Explore/discovery feed
Story rendering
Hashtag search
Capacity estimation with shown math

Out of scope (state explicitly)

Full ML ranking model training pipeline
Direct messaging / chat (#07)
Ad insertion and monetization

Assumptions

Hundreds of millions of DAU (500M in the estimates) with heavy fan-out; clarify celebrity/hot-key cases early
Eventual consistency acceptable for non-critical side effects (counts, notifications)
WebSocket or push infrastructure available at the edge

Functional Requirements

Upload photos/videos with captions, tags, location, filters
News feed: Personalized feed of posts from followed users
Stories: Ephemeral content that disappears after 24 hours
Follow / Unfollow users
Like, Comment, Save posts
Explore page: Discover new content based on interests
Direct Messaging (DMs): part of the product, but out of scope for this design (see Out of scope)
User profiles with grid of posts, follower/following counts
Hashtags and location-based discovery
Notifications for likes, comments, follows, mentions

Capacity Estimation

Metric	Calculation	Value
DAU	Given (product assumption)	500M
Photo uploads / day	500M DAU × 0.2	100M
Avg photo size (original)	Given (typical workload assumption)	3 MB
Photo storage / day	100M × 3 MB	300 TB
Photo storage / year	300 TB × 365	~110 PB
Resized versions per photo	Given (assumption documented in value)	4 (thumbnail, small, medium, large)
Total storage with resizes	300 TB × 2 (upper bound: assumes the 4 resizes together add about one original's worth)	600 TB/day
Feed reads / day	500M DAU × 10	5B
Stories uploads / day	500M DAU × 1	500M
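
The table's arithmetic can be reproduced in a few lines. This is a sketch using the table's own assumptions and decimal units (1 TB = 1,000,000 MB); the average QPS figures are derived here and are not in the table, and peak traffic would be some multiple of them.

```python
# Back-of-the-envelope capacity math from the table's assumptions.
DAU = 500_000_000
UPLOADS_PER_USER_PER_DAY = 0.2
AVG_PHOTO_MB = 3
RESIZE_MULTIPLIER = 2            # table's assumption: originals + resizes ~= 2x originals
FEED_READS_PER_USER_PER_DAY = 10
SECONDS_PER_DAY = 86_400

uploads_per_day = round(DAU * UPLOADS_PER_USER_PER_DAY)           # 100M
storage_tb_per_day = uploads_per_day * AVG_PHOTO_MB / 1_000_000   # MB -> TB
storage_pb_per_year = storage_tb_per_day * 365 / 1_000            # TB -> PB
total_tb_per_day = storage_tb_per_day * RESIZE_MULTIPLIER
feed_reads_per_day = DAU * FEED_READS_PER_USER_PER_DAY

# Derived averages (not in the table); size for peak, not average.
avg_upload_qps = uploads_per_day / SECONDS_PER_DAY
avg_feed_qps = feed_reads_per_day / SECONDS_PER_DAY

print(f"{storage_tb_per_day:.0f} TB/day, {storage_pb_per_year:.1f} PB/year")
print(f"{avg_upload_qps:,.0f} uploads/s, {avg_feed_qps:,.0f} feed reads/s (average)")
```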

Post Service

Handles photo/video upload and post creation
Upload Flow:
1. Client requests a pre-signed S3 URL from Post Service
2. Client uploads media directly to S3 (avoids routing through our servers)
3. Client sends post metadata (caption, tags, location) to Post Service
4. Post Service writes to Posts DB
5. Publishes post-created event to Kafka

Media Processing Pipeline

Triggered by: S3 event notification or Kafka event
Processing steps:
1. Resize: Generate 4 versions (150×150 thumbnail, 320×320, 640×640, 1080×1080)
2. Apply filters: If user selected a filter, apply server-side (or client-side pre-upload)
3. Generate blurhash: Low-res placeholder for progressive loading
4. Extract EXIF: GPS coordinates, camera info (strip sensitive data)
5. Content moderation: NSFW detection, violence detection (ML model)
6. Video processing: Transcode to multiple resolutions (360p, 720p, 1080p), generate thumbnails
Technology: AWS Lambda for image processing, FFmpeg for video, GPU instances for ML
Output: Multiple resized images stored back in S3, CDN URLs generated

Feed Service (Hybrid Fan-Out)

Same hybrid approach as Twitter/Facebook (see News Feed design)
Normal users: fan-out on write to Redis
Celebrities: fan-out on read
Feed ranking: ML model considers:
- Relationship strength (how often you interact with the poster)
- Post recency
- Engagement velocity (how fast it's getting likes)
- Content type (photos vs. videos vs. carousels)
- User's historical preferences

Story Service

24-hour ephemeral content
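Storage: S3 with an expiration lifecycle rule (rules are day-granular, so objects can outlive the 24-hour window; the metadata TTL below is what hides expired stories)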
Serving: Stories for users you follow are pre-fetched
Data model: Cassandra with TTL = 86400 seconds
Viewing: Stories are displayed in a horizontal carousel, ordered by recency and relationship strength
Story tray: Pre-computed list of users who have active stories → stored in Redis

Explore Service

Purpose: Surface interesting content from users you don't follow
How:
1. Collaborative filtering: "Users similar to you liked these posts"
2. Content-based: Posts with hashtags/locations matching your interests
3. Engagement signals: High like velocity posts
Implementation:
- Offline: Spark job computes candidate posts per interest cluster
- Online: ML ranking model scores candidates for each user
- Cached in Redis with TTL

Social Service (Follow, Like, Comment)

Follow: Write to MySQL (source of truth for social graph) + update Redis follower sets + publish follow-event to Kafka
Like: Atomic check-and-set in Redis (SADD liked:{post_id} {user_id}, then INCR like_count:{post_id} only if SADD returned 1, both inside one Lua script so the pair is atomic) + Kafka event for async DB write + notification
Comment: Write to Cassandra (partition key = post_id, clustering = comment_id for time ordering) + Kafka event → Notification Service
Dedup: Redis SET for liked:{post_id} prevents double-likes; idempotent on retry
Anti-spam: Rate limit comments (max 20/min per user), ML-based spam classifier on comment text
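
The like path's dedup rule can be sketched with an in-memory stand-in for the two Redis keys. LikeStore and its method names are hypothetical. The point is only that the counter moves when the set membership actually changes, which is what makes retries idempotent.

```python
from collections import defaultdict


class LikeStore:
    """In-memory stand-in for Redis keys liked:{post_id} and like_count:{post_id}."""

    def __init__(self):
        self.liked = defaultdict(set)
        self.like_count = defaultdict(int)

    def like(self, post_id, user_id):
        # Mirrors SADD returning 0 on a duplicate: no counter bump, no event.
        if user_id in self.liked[post_id]:
            return False
        self.liked[post_id].add(user_id)
        self.like_count[post_id] += 1
        return True

    def unlike(self, post_id, user_id):
        # Mirrors SREM: only decrement if the user had actually liked the post.
        if user_id not in self.liked[post_id]:
            return False
        self.liked[post_id].remove(user_id)
        self.like_count[post_id] -= 1
        return True
```

In Redis itself the membership check and the counter update would need to run atomically, for example inside a single Lua script.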

Notification Service

Consumes events from Kafka (like-events, follow-events, comment-events, mention-events)
Channels: APNs (iOS push), FCM (Android push), in-app notification feed
Batching: Multiple likes on the same post → collapse into "Alice and 42 others liked your photo" (not 43 separate pushes)
Notification feed: Stored in Cassandra (partition key = user_id, ordered by timestamp): the "Activity" tab
Rate limiting: Max 1 push per post per 5-minute window for the same event type
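
A minimal sketch of the batching and rate-limiting rules above. collapse_likes and PushThrottle are hypothetical names; a real service would keep the throttle state in a shared store rather than in process memory.

```python
def collapse_likes(actor_names, target="photo"):
    """Collapse many like events on one post into a single push message."""
    unique = list(dict.fromkeys(actor_names))  # dedupe, keep first-seen order
    if not unique:
        return None
    first, others = unique[0], len(unique) - 1
    if others == 0:
        return f"{first} liked your {target}"
    if others == 1:
        return f"{first} and {unique[1]} liked your {target}"
    return f"{first} and {others} others liked your {target}"


class PushThrottle:
    """Allow at most one push per (post_id, event_type) per window."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.last_sent = {}

    def allow(self, post_id, event_type, now):
        key = (post_id, event_type)
        last = self.last_sent.get(key)
        if last is not None and now - last < self.window:
            return False
        self.last_sent[key] = now
        return True
```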

Search Service (Elasticsearch)

Indexed entities: Users (username, full name, bio), Hashtags (#sunset → post count, trending score), Locations (place name, coordinates)
Features: Prefix autocomplete on usernames, fuzzy matching, trending hashtags
Ranking: For user search: verified badge boost + follower count + mutual followers. For hashtag search: post count + trending velocity
Sync: User profile changes → Kafka CDC → Elasticsearch consumer updates index
Note: Post content (captions) is NOT full-text searchable (Instagram design choice: discovery is via hashtags and Explore, not text search)

Analytics Pipeline

Consumes all Kafka event streams → Flink aggregation → ClickHouse
Metrics: Post engagement rates, story completion rates, follower growth, creator analytics dashboard
Used by: Explore ranking model training, content moderation signal enrichment, ad targeting

Upload Photo

HTTP

POST /api/v1/posts
{
  "media_ids": ["media-uuid-1"],
  "caption": "Beautiful sunset! #nature",
  "location": {"lat": 37.7749, "lng": -122.4194, "name": "San Francisco"},
  "tagged_users": ["user-456"],
  "filter": "clarendon"
}
Response: 201 Created

Get Pre-signed Upload URL

HTTP

GET /api/v1/media/upload-url?type=image&content_type=image/jpeg
Response: 200 OK
{
  "upload_url": "https://s3.amazonaws.com/instagram-media/...",
  "media_id": "media-uuid-1"
}

Get Feed

HTTP

GET /api/v1/feed?cursor={post_id}&limit=10
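
Cursor pagination for this endpoint could look like the sketch below. It assumes the server holds a stable, already-ranked list of post IDs for the session (a hypothetical simplification; a live re-ranked feed would need a snapshot or a position-based cursor).

```python
def get_feed_page(ranked_post_ids, cursor=None, limit=10):
    """Return (page, next_cursor); cursor is the last post_id the client saw."""
    start = 0
    if cursor is not None:
        try:
            start = ranked_post_ids.index(cursor) + 1
        except ValueError:
            start = 0  # unknown cursor: fall back to the first page
    page = ranked_post_ids[start:start + limit]
    has_more = start + limit < len(ranked_post_ids)
    next_cursor = page[-1] if page and has_more else None
    return page, next_cursor
```

The client passes next_cursor back as cursor={post_id} and stops when it receives no cursor.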

Get Stories

HTTP

GET /api/v1/stories/feed
Response: 200 OK
{
  "story_trays": [
    {
      "user": {"user_id": "...", "username": "...", "avatar_url": "..."},
      "stories": [
        {"story_id": "...", "media_url": "...", "created_at": "...", "expires_at": "..."}
      ]
    }
  ]
}

Post Story

HTTP

POST /api/v1/stories
{
  "media_id": "media-uuid",
  "stickers": [...],
  "music_id": "..."
}

Like / Comment

HTTP

POST /api/v1/posts/{post_id}/like
POST /api/v1/posts/{post_id}/comments
{ "text": "Amazing photo!" }

Common Error Responses

400 Bad Request: invalid input, missing fields, or malformed JSON
401 Unauthorized: missing or invalid auth token or API key
403 Forbidden: authenticated but insufficient permissions
404 Not Found: resource ID does not exist
409 Conflict: duplicate write or version conflict; retry with idempotency key
422 Unprocessable Entity: valid syntax but invalid business logic
429 Too Many Requests: rate limit exceeded; honor Retry-After header
500 Internal Server Error: unexpected server fault; retry with idempotency key
503 Service Unavailable: dependency down or overloaded; use exponential backoff
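
A client-side retry policy consistent with the list above might look like this sketch. The function names, the retryable set, and the delay parameters are assumptions; it also ignores the Retry-After header, which a real client should prefer over its own computed delay on 429.

```python
import random
import time
import uuid

RETRYABLE = {429, 500, 503}  # assumed retryable statuses


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send(idempotency_key) until it returns a non-retryable status."""
    key = str(uuid.uuid4())  # same key on every attempt so the server can dedupe
    status = None
    for attempt in range(max_attempts):
        status = send(key)
        if status not in RETRYABLE:
            break
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return status, key
```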

Cassandra: Posts

SQL

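CREATE TABLE posts (
    post_id         BIGINT,         -- Snowflake ID
    user_id         UUID,
    caption         TEXT,
    media_urls      LIST<TEXT>,     -- CDN URLs for different sizes
    location        TEXT,
    hashtags        SET<TEXT>,
    tagged_users    SET<UUID>,
    created_at      TIMESTAMP,
    PRIMARY KEY (post_id)
);

-- Counters live in their own table: Cassandra does not allow COUNTER
-- columns alongside regular (non-key) columns in the same table.
CREATE TABLE post_counters (
    post_id         BIGINT,
    like_count      COUNTER,
    comment_count   COUNTER,
    PRIMARY KEY (post_id)
);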

-- User's own posts (profile grid)
CREATE TABLE user_posts (
    user_id     UUID,
    post_id     BIGINT,
    media_thumb TEXT,
    created_at  TIMESTAMP,
    PRIMARY KEY (user_id, post_id)
) WITH CLUSTERING ORDER BY (post_id DESC);
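
Clustering by post_id DESC returns newest posts first only because Snowflake-style IDs sort by creation time. A minimal sketch of such an ID follows; the bit layout (41-bit time, 10-bit worker, 12-bit sequence) follows the commonly described Snowflake scheme, and EPOCH_MS is an arbitrary hypothetical epoch.

```python
EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch, in milliseconds
TIME_BITS, WORKER_BITS, SEQ_BITS = 41, 10, 12


def make_snowflake(timestamp_ms, worker_id, sequence):
    """Pack (time delta | worker | sequence) into one integer that fits a BIGINT."""
    delta = timestamp_ms - EPOCH_MS
    if not 0 <= delta < (1 << TIME_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < (1 << SEQ_BITS):
        raise ValueError("sequence out of range")
    return (delta << (WORKER_BITS + SEQ_BITS)) | (worker_id << SEQ_BITS) | sequence


def snowflake_timestamp_ms(snowflake_id):
    """Recover the creation time from the high bits of the ID."""
    return (snowflake_id >> (WORKER_BITS + SEQ_BITS)) + EPOCH_MS
```

Because the timestamp occupies the high bits, a later post always has a larger ID regardless of which worker generated it.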

Cassandra: Stories (with TTL)

SQL

CREATE TABLE stories (
    user_id     UUID,
    story_id    BIGINT,
    media_url   TEXT,
    media_type  VARCHAR,          -- image, video
    created_at  TIMESTAMP,
    PRIMARY KEY (user_id, story_id)
) WITH CLUSTERING ORDER BY (story_id DESC)
  AND default_time_to_live = 86400;

Redis: Feed Cache

Key:    feed:{user_id}
Type:   Sorted Set
Members: post_id
Scores:  ranking_score (not just timestamp — ML-ranked)
Max:     500 entries

Redis: Story Tray

Key:    story_tray:{user_id}
Type:   Sorted Set
Members: poster_user_id
Scores:  latest_story_timestamp
TTL:     3600 (refresh hourly)
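
Because the tray is refreshed hourly while stories expire continuously, the read path can filter on the score before rendering. This sketch assumes the sorted set has been loaded into a dict of poster_user_id to latest_story_timestamp (epoch seconds); the function name is hypothetical.

```python
STORY_TTL_SECONDS = 86_400  # 24 hours


def active_story_tray(latest_story_ts_by_user, now):
    """Return poster IDs with an unexpired story, most recent first."""
    active = {
        user: ts
        for user, ts in latest_story_ts_by_user.items()
        if now - ts < STORY_TTL_SECONDS
    }
    return sorted(active, key=active.get, reverse=True)
```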

MySQL: Users & Social Graph

SQL

CREATE TABLE users (
    user_id       BINARY(16) PRIMARY KEY,  -- UUID as 16 bytes (MySQL has no native UUID type)
    username      VARCHAR(30) UNIQUE,
    display_name  VARCHAR(64),
    bio           TEXT,
    avatar_url    TEXT,
    post_count    INT DEFAULT 0,
    follower_count INT DEFAULT 0,
    following_count INT DEFAULT 0,
    is_private    BOOLEAN DEFAULT FALSE,
    created_at    TIMESTAMP
);

CREATE TABLE follows (
    follower_id  BINARY(16),               -- UUID as 16 bytes
    followee_id  BINARY(16),               -- UUID as 16 bytes
    status       ENUM('active', 'pending'),
    created_at   TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id),
    INDEX idx_followee (followee_id)
);

S3: Media Storage Structure

Bucket: instagram-media
Path:   /{user_id}/{year}/{month}/{media_id}/
Files:  original.jpg
        thumb_150.jpg
        small_320.jpg
        medium_640.jpg
        large_1080.jpg
        blurhash.txt

General

Concern	Solution
Media loss	S3 with 11 nines durability, cross-region replication
Upload failure	Client retries with idempotent media_id; S3 multipart upload for large files
Feed cache loss	Rebuild from DB; degrade to reverse-chronological (no ranking)
Story expiration accuracy	Cassandra TTL handles deletion; client also checks expires_at
CDN cache invalidation	On post delete, purge CDN cache by URL pattern
Media processing failure	DLQ for failed processing jobs; retry with different worker

Specific: Handling Image Upload Failures

Client uploads to S3 using multipart upload (resumable)
If upload fails mid-way, client retries from last successful part
If post metadata is saved but media processing fails → post is in "processing" state
Background retry processes failed media
After 3 failures → notify user to re-upload

Progressive Image Loading

First: Show blurhash placeholder (tiny hash → instant blur preview)
Then: Load thumbnail (150px)
Then: Load appropriate resolution based on device/viewport
Uses srcset and sizes attributes for responsive images
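
Choosing the "appropriate resolution" can be sketched as picking the smallest stored variant that covers the rendered width in device pixels. The file names come from the S3 layout above; pick_variant is a hypothetical helper.

```python
# (pixel width, file name) pairs, smallest first, matching the S3 layout.
VARIANTS = [
    (150, "thumb_150.jpg"),
    (320, "small_320.jpg"),
    (640, "medium_640.jpg"),
    (1080, "large_1080.jpg"),
]


def pick_variant(css_width_px, device_pixel_ratio=1.0):
    """Return the smallest variant at least as wide as the rendered image."""
    needed = css_width_px * device_pixel_ratio
    for width, name in VARIANTS:
        if width >= needed:
            return name
    return VARIANTS[-1][1]  # nothing is large enough: serve the largest
```

On the web, srcset and sizes let the browser make the same choice; native clients would run logic like this themselves.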

Content Moderation Pipeline

Upload → NSFW Detection (ML) → Hate Speech (NLP) → Score:
  → Score > threshold → Auto-reject + notify user
  → Score borderline → Queue for human review
  → Score OK → Publish
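
The three-way branch above can be sketched as a pure function. The threshold values and the choice to act on the worse of the two scores are assumptions for illustration, not tuned numbers.

```python
def moderation_decision(nsfw_score, hate_score, reject_above=0.9, review_above=0.6):
    """Map classifier scores in [0, 1] to reject / human_review / publish."""
    worst = max(nsfw_score, hate_score)  # act on the most severe signal
    if worst > reject_above:
        return "reject"        # auto-reject and notify the user
    if worst > review_above:
        return "human_review"  # borderline: queue for a moderator
    return "publish"
```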

Hashtag and Location Pages

Hashtag page: Elasticsearch query for posts with specific hashtag
Location page: Geospatial query (PostGIS or Elasticsearch geo_point)
Both sorted by recency or "top" (engagement-based)

Private Accounts

Follow requires approval (status = 'pending')
Posts only visible to approved followers
Feed fan-out only to approved followers
Explore page excludes private account posts

Instagram Reels (Video Feed)

Separate vertical video feed (like TikTok)
Video transcoded to HLS/DASH adaptive streaming
Recommendation engine: engagement-based + content understanding (video embeddings)
Pre-fetch next 3 reels for smooth scrolling experience

Feed Ranking Model: ML Deep Dive

Instagram's feed is NOT chronological. It's ranked by predicted engagement.

Features fed to the ranking model:
  User-author affinity:
    - interaction_score: how often user likes/comments author's posts (decay over time)
    - profile_visit_count: how often user visits author's profile
    - DM frequency: users who DM each other see each other's posts first

  Post features:
    - age_minutes: exponential decay (post from 1 hr ago >> post from 24 hrs ago)
    - content_type: photo / video / carousel (video gets ~1.3x implicit boost)
    - engagement_velocity: likes_in_first_30_min / impressions_in_first_30_min
    - caption_length, hashtag_count, has_location

  Context features:
    - time_of_day (user's local time), day_of_week
    - session_number_today: 1st open = best content; 5th open = deeper inventory
    - network_quality: on slow network, deprioritize video

Model: Multi-task learning — predict P(like), P(comment), P(save), P(share), P(dwell_time > 3s)
  Final score = w1*P(like) + w2*P(comment) + w3*P(save) + w4*P(share) + w5*P(dwell)
  Weights: save and share are weighted highest (stronger engagement signals than likes)

Serving: candidate generation (500 posts from fan-out cache) → ML ranking → top 50 served
Latency budget: < 100 ms for scoring 500 candidates (batched inference, ONNX on CPU)
Diversity injection: after ranking, ensure no 3 consecutive posts from same author
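
A sketch of the scoring formula and the diversity pass. The weight values are hypothetical (chosen only so that save and share rank highest, as stated above), and the greedy re-rank is one simple way to enforce the "no 3 consecutive posts" rule, not necessarily the production method.

```python
# Hypothetical weights: save and share highest, as described above.
WEIGHTS = {"like": 1.0, "comment": 2.0, "save": 4.0, "share": 4.0, "dwell": 0.5}


def final_score(preds):
    """Weighted sum of predicted engagement probabilities."""
    return sum(WEIGHTS[k] * preds[k] for k in WEIGHTS)


def diversify(ranked, max_run=2):
    """Greedy re-rank of (post_id, author_id) pairs, best first.

    Avoids more than max_run consecutive posts by one author, unless only
    that author's posts remain.
    """
    remaining = list(ranked)
    out = []
    while remaining:
        for i, (_, author) in enumerate(remaining):
            run = out[-max_run:]
            if len(run) == max_run and all(a == author for _, a in run):
                continue  # would create a run of max_run + 1; try the next post
            out.append(remaining.pop(i))
            break
        else:
            out.extend(remaining)  # nothing left to interleave with
            break
    return out
```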

Race Condition: Post Visible Before Media Processed

Problem: User creates post → metadata saved to DB → fan-out starts → followers see post
  But media is still processing (resize, filter, moderation). Followers see broken image.

Solution: Two-phase publishing
  Phase 1: Upload media + process (resize, moderate). Post status = "processing"
  Phase 2: Only after ALL media variants ready → set status = "published" → start fan-out

  Client shows spinner until post is "published" (typically 3-8 seconds).
  Fan-out never triggers for "processing" posts → followers never see broken images.
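
Two-phase publishing can be sketched as a small state machine. Post, on_variant_ready, and the fan_out callback are hypothetical names; moderation could be modeled as one more entry in the expected set.

```python
class Post:
    def __init__(self, post_id, expected_variants):
        self.post_id = post_id
        self.expected = set(expected_variants)
        self.ready = set()
        self.status = "processing"  # Phase 1


def on_variant_ready(post, variant, fan_out):
    """Record a finished variant; publish and fan out once when all are ready."""
    if post.status != "processing":
        return False  # already published: duplicate events must not re-trigger fan-out
    post.ready.add(variant)
    if post.ready >= post.expected:
        post.status = "published"  # Phase 2
        fan_out(post.post_id)
        return True
    return False
```

Guarding on status makes the handler safe to call again when the processing pipeline redelivers an event.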

Interview Walkthrough

Separate media upload (async, multi-resolution transcoding) from feed fan-out — never fan-out until all variants are ready.
Use object storage (S3) for originals and CDN for delivery; discuss thumbnail, feed, and full-resolution pipelines.
Apply the same hybrid fan-out strategy as Twitter: push for normal accounts, pull-merge for celebrity/influencer posts.
Store the social graph in a graph DB or adjacency lists sharded by user_id; feed generation reads follower lists from cache.
Cover Stories separately: ephemeral TTL in Cassandra with the story tray cached in Redis, separate CDN paths, no fan-out to main feed.
Quantify storage with back-of-the-envelope estimation: 100M photos/day × 3 MB average = 300 TB/day ingest before resizes and compression tiers.
Common pitfall: triggering fan-out before transcoding completes, showing broken or single-resolution images to millions of followers.

Fan-Out on Write vs Fan-Out on Read for Feed Generation

Instagram's core design decision: how to deliver posts to followers' feeds?

Fan-Out on Write (push model):
  When Alice posts → immediately write Alice's post_id to every follower's feed cache

  ✓ Feed reads are O(1): just read pre-populated feed list
  ✓ Consistent feed across devices (pre-computed, same order)
  ✗ Write amplification: Alice has 100M followers → 100M Redis writes per post
  ✗ Celebrities cause thundering herd on write path
  ✗ Wasted work: most followers won't see the post in their next session

Fan-Out on Read (pull model):
  When Bob opens his feed → fetch latest posts from all 500 users he follows

  ✓ No write amplification
  ✓ Always fresh (no stale cached feed)
  ✗ Read fan-out: 500 followees × 1 DB query each = 500 queries per feed load
  ✗ Latency: N queries serialized or N parallel with fan-out overhead

Instagram's Hybrid Approach ⭐:
  Regular users (< 50K followers): Fan-out on WRITE
    Post → async worker → write post_id to Redis feed:{follower_id} for each follower
    Each follower's feed is a sorted set of post_ids
    Cost: manageable for typical users (200-1000 followers)

  Celebrities (> 50K followers): Fan-out on READ
    Don't pre-compute their posts in follower feeds
    On feed request: fetch the latest 50 posts from each followed celebrity (pull)

  Feed assembly on read:
    pre_computed_feed = Redis: feed:{user_id} (from regular users)
    celebrity_posts = merge(query latest 50 posts from each followed celeb)
    final_feed = merge_sorted(pre_computed_feed, celebrity_posts)
              → ML rank → return top 50

  This is called the "Hybrid Fan-out" and is used by Instagram, Twitter, and Facebook.
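
The read-time merge step can be sketched with heapq.merge. It assumes each input is a newest-first list of (timestamp, post_id) pairs; assemble_feed is a hypothetical name, and the result is the candidate list handed to ML ranking rather than the final order.

```python
import heapq


def assemble_feed(precomputed, celebrity_feeds, limit=50):
    """Merge newest-first (timestamp, post_id) lists into one candidate list."""
    merged = heapq.merge(
        precomputed, *celebrity_feeds, key=lambda p: p[0], reverse=True
    )
    seen, out = set(), []
    for _, post_id in merged:
        if post_id in seen:
            continue  # same post reachable via two sources
        seen.add(post_id)
        out.append(post_id)
        if len(out) == limit:
            break
    return out
```

heapq.merge is lazy, so the merge stops as soon as limit candidates are collected instead of sorting every input in full.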

Photo Storage Optimization: CDN Tiering and Image Compression

Assume Instagram-scale traffic on the order of 100B+ photo serves/day across multiple resolutions.
Storage and CDN bandwidth are the largest cost drivers.

Resolution variants stored per photo:
  thumbnail: 150×150 (for feed grid, 2-3 KB)
  low_res: 480×480 (feed display on mobile, 20-40 KB)
  standard: 1080×1080 (full view, 100-200 KB)
  original: as uploaded (up to 10 MB, never served directly)

Storage tiers:
  Hot (CDN edge): thumbnails + low_res for recent posts (last 30 days)
    → ~80% of serves come from recent content
    Cost: higher per GB but zero origin load

  Warm (S3 Standard): standard + all sizes for 30-180 day old posts
    → Served via CDN on demand, cached on first access

  Cold (S3 Glacier): original files + all sizes for > 180 days old
    → Retrieved on-demand with 1-12 hour delay (acceptable for old content)
    → 70% cheaper than S3 Standard per GB

Compression strategy:
  JPEG → WebP: 30-40% smaller at same visual quality
  Lossy compression: quality=85 for thumbnails, quality=92 for standard
  Progressive JPEG: browser shows low-quality preview while downloading
  Result: 40% reduction in CDN egress cost

CDN cache hit rate: ~95% for thumbnails, ~80% for standard resolution
  (recently uploaded photos by celebrities can get >1B serves from CDN with near-zero S3 origin cost)
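
The tier boundaries and hit rates above translate into two small helpers. This is a sketch; the function names are hypothetical and the 30/180-day cutoffs are the ones stated in this section.

```python
def storage_tier(age_days):
    """Map a post's age to the storage tier described above."""
    if age_days <= 30:
        return "hot"    # CDN edge
    if age_days <= 180:
        return "warm"   # S3 Standard, cached on first access
    return "cold"       # S3 Glacier


def origin_fetches(serves, cdn_hit_rate):
    """Requests that miss the CDN and reach the origin."""
    return round(serves * (1 - cdn_hit_rate))
```

For example, 1M standard-resolution serves at an 80% hit rate still send about 200K requests to origin, four times the origin load of the same traffic at 95%.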

Explore Page: Content Discovery via Graph Signals

Instagram Explore shows content from people you DON'T follow.
Goal: maximize engagement while introducing users to new creators.

Signal types:

  1. Interest graph (most important):
     Cluster users by engagement patterns
     "Users like Alice (who likes dog photos) also engage with these accounts"
     Graph-based clustering → interest communities

  2. Content understanding:
     Computer vision: classify photo content (food, fashion, travel, etc.)
     Match user's historically engaged categories

  3. Social graph proximity:
     Posts liked by people I follow → second-degree signal
     "12 people you follow saved this post" → strong interest signal

  4. Trending:
     Posts with unusually high engagement velocity in past 1 hour
     ZADD trending:explore {velocity_score} {post_id}

Candidate generation → ranking pipeline:
  1. Retrieve ~10K candidates from interest graph + social graph + trending
  2. ML ranking model (same as feed): predict P(like), P(save), P(follow)
  3. Diversity enforcement: max 1 post per creator in top 50
  4. Safety filter: verified brand-safe content only for non-personalized positions
  5. Novelty boost: content user hasn't seen before gets a slight rank boost

Key difference from Feed:
  Feed: optimize for engagement with KNOWN creators (satisfaction)
  Explore: optimize for engagement + creator DISCOVERY (growth)
  Explore uses slightly higher novelty weight to surface new creators
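
The Explore diversity rule (max 1 post per creator in the top 50) can be sketched as a single pass over scored candidates. The tuple shape and function name are assumptions.

```python
def top_k_one_per_creator(scored, k=50):
    """scored: iterable of (score, post_id, creator_id). Best post per creator, top k."""
    out, seen = [], set()
    for _, post_id, creator in sorted(scored, key=lambda c: c[0], reverse=True):
        if creator in seen:
            continue  # this creator already has a higher-scored post in the list
        seen.add(creator)
        out.append(post_id)
        if len(out) == k:
            break
    return out
```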

SLOs & Error Budgets

Metric	Target	Rationale
Core user-facing availability	99.95%	Budget for planned maintenance + unplanned failures without user-visible outage.
p99 latency (critical path)	Problem-specific — state target early and tie to capacity math	Interview credibility comes from connecting SLO to architecture choices.
Error rate (5xx)	< 0.1%	Distinguishes transient blips from systemic failure requiring rollback.
Data durability	99.999999999% (11 nines) for committed writes	Define which operations require fsync/quorum vs async replication.
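
The availability target converts directly into an error budget. This sketch assumes a 30-day window; burn rate is the observed error rate divided by the rate the SLO allows.

```python
def error_budget_minutes(slo, days=30):
    """Allowed downtime in minutes over the window for a given availability SLO."""
    return (1 - slo) * days * 24 * 60


def burn_rate(observed_error_rate, slo):
    """1.0 means the budget lasts exactly the window; 2.0 means it is gone in half."""
    return observed_error_rate / (1 - slo)


print(f"{error_budget_minutes(0.9995):.1f} minutes per 30 days")
```

At 99.95% the budget is about 21.6 minutes per 30 days, so a single slow rollback can consume most of a month's allowance.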

Incident Scenarios (2am reality)

Scenario	How you detect	Mitigation
Primary database unavailable	Health check failures, connection pool exhaustion alerts, elevated 5xx	Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists
Traffic spike (10× normal)	RPS anomaly alert, autoscaling lag, latency SLO burn rate	Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations
Bad deploy causing elevated errors	Canary metric regression, error budget burn, deployment correlation	Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility

Cost Drivers (Staff lens)

Egress bandwidth and CDN (often dominates media/data-heavy systems)
Database storage + IOPS at scale (plan compaction, TTL, tiering)
Compute for async pipelines (right-size workers, spot instances for batch)
Managed service premiums vs operational headcount trade-off

Multi-Region & DR

Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.