This problem appears in multiple sheets. Depth expectations increase as you progress:
| Track | What to demonstrate |
|---|---|
| Arch 25 | Draw the dual-table Cassandra model (following + followers). Explain is-following hot path, celebrity bucket sharding, and fan-out threshold (100K). Show follow write sequence with idempotency. |
| Arch 50 | Count reconciliation, block list interaction, Kafka fan-out lag SLOs, and Redis failure fallback to Cassandra. Quantify celebrity fan-out cost (100M × 64B). |
| Arch 75 | Staff: TAO pattern vs graph DB for multi-hop queries, follow spam bot detection, and cross-region follow consistency (celebrity in US, follower in EU). |
Interview Prompt
Design a follower/following system like Twitter. Users can follow/unfollow, view follower and following lists, see counts on profiles, and get follow suggestions. The system must handle celebrities with 100M+ followers.
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Is the graph directed or bidirectional? | Follow is directed (A follows B ≠ B follows A). Affects storage model and suggestion algorithms. |
| What's the fan-out model for feed delivery? | Fan-out-on-write vs read determines whether follow graph is on critical path for posts. |
| How fresh must follower counts be? | Eventual consistency (seconds lag) is acceptable for counts; follow action must be immediately visible to actor. |
| Do we need multi-hop queries (friends-of-friends)? | 1-hop is O(1) with lookup table; 2-hop requires graph traversal or pre-computation. |
Scope
In scope
- Follow/unfollow with block list
- Follower/following lists with pagination
- Follower/following counts
- Is-following check at 500K QPS
- Celebrity scaling (100M followers)
- Follow suggestions (friends-of-friends)
Out of scope (state explicitly)
- Full news feed design (reference fan-out integration only)
- Direct messaging
- Account verification / blue check
Assumptions
- 1B users, 200 avg followers, 200M follows/day
- Follow/unfollow p99 < 100ms
- Celebrity threshold: 100K followers → fan-out-on-read
These foundational concepts underpin the patterns used in this problem. Review them before deep-diving into component-level trade-offs.
- Follow/Unfollow: Directed edge between users
- Follower list: Paginated, sorted by recency
- Following list: Paginated list
- Follower/following count: Display on profile
- Follow suggestions: "People you may know"
- Follow status check: Fast "does A follow B?" lookup
- Fan-out on write: Deliver posts to followers' feeds
- Notifications: "Alice started following you"
- Low Latency: Follow/unfollow < 100 ms
- Scale: Handle 100M+ followers (celebrities)
- Consistency: Follow action immediately visible to actor
- Eventual Consistency: Count can lag by seconds
- Availability: 99.99%
- Hot spot handling: 100K follows/sec on one user
| Metric | Calculation | Value |
|---|---|---|
| Total users | Given | 1B |
| Avg followers per user | Given (power-law: most users < 500, celebrities skew avg) | 200 |
| Total follow edges | 1B users × 200 avg followers | 200B |
| Follow events / day | Given (~0.2 follows/user/day) | 200M |
| Follow events / sec | 200M/day ÷ 86400 ≈ 2.3K (peak 50K = 20×) | ~2.3K (peak 50K) |
| Follow check queries / sec | Given (is-following on every profile/timeline render) | 500K |
| Edge record size | follower_id (8B) + followee_id (8B) + timestamp (8B) | 24 bytes |
| Edge storage | 200B edges × 24 bytes | ~4.8 TB |
| Celebrity fan-out writes/post | 100M followers × 64B feed entry | ~6.4 GB per celebrity post (fan-out-on-read above 100K threshold) |
Write Path: Follow Action
Alice follows Bob: validate (not blocked, not self, not already following) → Cassandra BATCH INSERT into following + followers tables → Redis ZADD following/followers sorted sets + INCR counts → Kafka follow-events for feed fan-out and notifications. Target: < 100ms p99.
Follow write sequence:
1. Check block list: GET block:{alice}:{bob} — 403 if blocked
2. Idempotency: SET follow:idem:{key} NX EX 86400 — return cached if exists
3. Cassandra BATCH:
INSERT INTO following (alice, now, bob)
INSERT INTO followers (bob, bucket(alice), alice, now)
INSERT INTO is_following (alice, bob)
4. Redis pipeline: ZADD following:{alice}, ZADD followers:{bob} (if < 100K)
INCR follower_count:{bob}, INCR following_count:{alice}
5. Kafka publish: { event: "follow", actor, target, ts }
6. Return 201Is-Following Lookup: 500K QPS Hot Path
Every profile view, timeline render, and DM permission check calls this. Three-tier lookup:
- Redis:
is_following:{a}:{b}: 1ms, 95% hit rate - Cassandra is_following table: O(1) partition key lookup: 5ms
- Negative cache:
is_following:{a}:{b} = "0"with 5min TTL to avoid repeated misses
Pipelined batch lookups: check 50 friends in one Redis MGET round trip when rendering mutual friends.
Celebrity Hot Partition: 100M Followers
Bucketed partitioning: PRIMARY KEY ((user_id, bucket), follower_id) with bucket = hash(follower_id) % 100. 100 sub-partitions × 1M rows each = manageable compaction. Count stored separately via Redis INCR/DECR (never SELECT COUNT(*)). Do NOT cache celebrity follower lists in Redis: list is 100M entries; paginate from Cassandra only.
Fan-out threshold: Users with > 100K followers switch to fan-out-on-read. A celebrity post would require 100M Redis LPUSH ops (~6.4 GB) if fan-out-on-write: unacceptable. Hybrid model matches Twitter/Instagram production architecture.
Unfollow, Block, and Consistency
Unfollow: DELETE from both tables + Redis ZREM + DECR counts. Eventual consistency on counts (± few seconds) is acceptable per NFR.
Block: INSERT into blocks table + remove existing follow edge if present + purge from both follower lists. Block check runs before every follow attempt.
Count reconciliation: Hourly Spark job counts edges per user, compares to Redis counters, alerts on drift > 1%. Fixes via admin tool, not automatic overwrite (prevents masking bugs).
Follow Suggestions: Friends-of-Friends
Mutual friends algorithm: for each user Alice follows, get their following lists → aggregate → rank by frequency → filter already-followed and blocked. Offline Spark job daily → store top 50 in Redis suggestions:{uid}. Online refresh on new follow (incremental update for 1-hop only).
For celebrities: intersect viewer's following list (~500) with celebrity's followers using pipelined is_following checks: never materialize full celebrity follower set.
POST /api/v1/users/{target_uid}/follow
Authorization: Bearer <token>
Idempotency-Key: follow-{actor}-{target}-{timestamp}
Response: 201 Created
{ "following": true, "followee_id": "uuid", "created_at": "2026-03-14T10:00:00Z" }
Errors:
400 Bad Request — cannot follow yourself
401 Unauthorized — invalid/missing token
403 Forbidden — blocked by target user or rate limited
404 Not Found — target user does not exist
409 Conflict — already following (idempotent: returns 200 with same body)
429 Too Many Requests — follow spam (>100 follows/hour)
DELETE /api/v1/users/{target_uid}/follow
Response: 200 OK { "following": false }
GET /api/v1/users/{uid}/followers?cursor={last_id}&limit=50
Response: 200 OK
{ "followers": [{ "id", "name", "avatar" }], "total": 15234, "next_cursor": "..." }
GET /api/v1/users/{uid}/is-following/{target_uid}
Response: 200 OK { "following": true } // p99 < 2ms via Redis/lookup tableCommon Error Responses
400 Bad Request: invalid input, missing fields, or malformed JSON 401 Unauthorized: missing or invalid auth token or API key 403 Forbidden: authenticated but insufficient permissions 404 Not Found: resource ID does not exist 409 Conflict: duplicate write or version conflict; retry with idempotency key 422 Unprocessable Entity: valid syntax but invalid business logic 429 Too Many Requests: rate limit exceeded; honor Retry-After header 500 Internal Error: unexpected server fault; retry with idempotency key 503 Service Unavailable: dependency down or overloaded; use exponential backoff
Cassandra: Source of Truth
CREATE TABLE following (
user_id UUID,
follows UUID,
created_at TIMESTAMP,
PRIMARY KEY (user_id, created_at, follows)
) WITH CLUSTERING ORDER BY (created_at DESC);
CREATE TABLE followers (
user_id UUID,
bucket INT,
follower UUID,
created_at TIMESTAMP,
PRIMARY KEY ((user_id, bucket), created_at, follower)
) WITH CLUSTERING ORDER BY (created_at DESC);
CREATE TABLE is_following (
follower_id UUID,
followee_id UUID,
PRIMARY KEY (follower_id, followee_id)
);Why Cassandra? Write-heavy workload (200M writes/day), bucketed partitioning for celebrities, Last-Writer-Wins conflict resolution, tunable consistency.
Redis: Cache Layer
following:{uid} → Sorted Set (score=timestamp)
followers:{uid} → Sorted Set (non-celebrity, < 100K)
is_following:{a}:{b} → "1" (TTL: 24h)
follower_count:{uid} → Integer
following_count:{uid} → Integer
suggestions:{uid} → List (pre-computed, TTL: 24h)| Concern | Solution |
|---|---|
| Dual table write failure | Cassandra BATCH is atomic per partition; retry on timeout; idempotency key prevents duplicate edges |
| Count drift | Hourly Spark reconciliation: count edges → compare Redis → alert + manual fix if drift > 1% |
| Celebrity hot partition | 100-way bucket sharding; fan-out-on-read above 100K followers; never cache full list |
| Follow spam / bot farms | Rate limit 100 follows/hour; CAPTCHA after 50; graph analysis for bot clusters |
| Redis failure | Fallback to Cassandra is_following table; rebuild cache async; counts may lag seconds |
| Kafka lag on follow events | Feed fan-out is async: acceptable 5s delay; notifications use separate priority topic |
| Split-brain on follow/unfollow race | Last-writer-wins on timestamp; unfollow always wins over concurrent follow retry |
Mutual Friends Computation
For normal users: Redis ZINTERSTORE on follower sets (< 10K). For celebrities: intersect the viewer's following list (small, ~500): check is_following:{friend}:taylor with pipelined lookups.
Feed Fan-Out Integration
Normal users (< 10K followers): fan-out on write (LPUSH to each follower's feed). Celebrities (> 100K): fan-out on read (pull latest posts at read time). Hybrid model used by Twitter/Instagram.
Interview Walkthrough
- Store edges as (follower_id, followee_id) pairs in Cassandra with indexes on both columns — follow and follower queries are both hot paths.
- Cache paginated follower/following lists in Redis sorted sets (score = timestamp) with cache-aside on read, invalidate on write.
- Maintain denormalized counts in a Redis hash (
follow_count:{uid}) updated atomically on follow/unfollow — never COUNT(*) on read. - Apply hybrid feed fan-out: write fan-out for users with <10K followers, read fan-out (pull) for celebrities above 100K.
- Compute mutual friends via Redis ZINTERSTORE for normal users; for celebrity profiles, intersect the viewer's small following list instead.
- Use the TAO pattern (Cassandra + Redis) rather than a graph database — no graph DB scales to 500B+ edges at Facebook/Twitter scale.
- Handle follow/unfollow races with Last-Writer-Wins timestamps so out-of-order packets don't leave stale follow state.
- Common pitfall: pure write fan-out when a celebrity with 100M followers posts — 100M inbox writes per post is structurally impossible.
Graph DB (Neo4j) vs Cassandra + Redis (TAO Pattern)
1-hop queries: Redis fastest (< 1ms). Multi-hop (friend-of-friend): Graph DB native but limited to ~10B edges. Write throughput: Cassandra excels. Scale: Cassandra + Redis scales to 500B+ edges. Recommendation: TAO pattern (Cassandra + Redis) for social networks at scale.
Feed Fan-Out: Write vs Read
Write fan-out: immediate delivery, cost O(followers) per post. Read fan-out: pull at read time, cost O(celebrities followed) per read. Hybrid with threshold-based split is the proven approach at Twitter/Instagram scale.
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1 — MVP (single Postgres)
Single followers table with indexes. Redis cache for is-following. Fan-out-on-write for all users.
Key components: PostgreSQL · Redis · Sync fan-out
Move to next phase when: Hash hot partitions or celebrity fan-out exceeds Redis write budget
Phase 2 — Cassandra + hybrid fan-out
Dual Cassandra tables, bucketed celebrity partitions, fan-out-on-read above 100K followers.
Key components: Cassandra · Redis counts · Kafka follow-events · Hybrid fan-out
Move to next phase when: 500M+ users; count drift requires reconciliation pipeline
Phase 3 — TAO at billion scale
Dedicated is_following service, geo-replicated Cassandra, offline graph ML for suggestions, bot detection.
Key components: Multi-region Cassandra · Spark reconciliation · Graph ML · Bot detection
Move to next phase when: Cross-region follow latency > 100ms p99; follow spam attacks
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Follow/unfollow p99 latency | < 100ms | User-facing write — must feel instant to actor |
| Is-following p99 | < 2ms | Called on every profile/timeline render at 500K QPS |
| Count drift after reconciliation | < 1% | Eventual consistency acceptable but bounded |
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Celebrity follow causes hot partition timeout | Cassandra write timeout rate spikes for single user_id partition | Verify bucket distribution; rate-limit follows to celebrity; enable fan-out-on-read for that user |
| Follower count shows 0 for active user | User reports vs reconciliation job drift > 1% | Check Redis INCR/DECR logs; run reconciliation; find root cause before blind overwrite |
Cost Drivers (Staff lens)
- Cassandra storage: 200B edges × 24B = 4.8 TB
- Redis: is-following cache ~50GB at 95% hit rate
- Kafka follow-events: 200M/day × 200B = 40 GB/day
Multi-Region & DR
Follow edges stored in home region of followee. Cross-region follow adds ~50ms. Celebrity reads use local read replica. Counts are eventually consistent across regions (CRDT or periodic sync).