This problem appears in multiple sheets. Depth expectations increase as you progress:
| Track | What to demonstrate |
|---|---|
| Arch 75 | Staff level: multi-region, cost at scale, migration path, and production metrics. |
Interview Prompt
Design Design Live Likes & Reactions.
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Which of these is highest priority: High-throughput write aggregation, Buffered counters, WebSocket broadcast? | Forces scope negotiation — senior candidates trim before drawing boxes. |
| What scale should we design for — DAU, QPS, data volume? | Drives every capacity decision; shows structured thinking. |
| What are the read vs write patterns on the critical path? | Determines caching, DB choice, and replication topology. |
| What consistency and durability guarantees are required? | Separates strong-consistency paths from eventual ones — a senior differentiator. |
Scope
In scope
- High-throughput write aggregation
- Buffered counters
- WebSocket broadcast
- Animation sync
- Capacity estimation with shown math
Out of scope (state explicitly)
- Full ML ranking model training pipeline
- Direct messaging / chat (#07)
- Ad insertion and monetization
Assumptions
- Clarify scale (DAU, QPS, data volume) for live likes reactions in the first 5 minutes
- Standard reliability target 99.9%–99.99% unless problem implies higher (payments, booking)
- Managed cloud services (RDS, S3, Kafka, Redis) are acceptable building blocks
These foundational concepts underpin the patterns used in this problem. Review them before deep-diving into component-level trade-offs.
- React to content: Allow users to emit live reactions (❤️ 😂 😮 😢 😡) to posts, videos, or active live streams.
- Real-time count: Display reaction counts that update instantly for all active viewers.
- Reaction animations: Render floating reaction emojis on live streaming screens in real-time.
- Toggle reactions: Allow users to change or withdraw their reaction instantly.
- Aggregate totals: Show the cumulative total counts grouped per reaction type (e.g., 1.2K ❤️, 300 😂).
- Deduplication: Restrict every user to exactly one reaction per content item.
- Ultra-Low Latency: Broadcast reactions to all viewers globally within 1 second.
- Extreme Scale & Throughput: Handle 1M+ reactions per second during peak viral streams (e.g., Super Bowl, live concerts).
- Eventual Consistency: Display aggregate counts can lag by a few seconds; approximate counters are perfectly acceptable.
- High Availability: 99.99% availability since reactions are core engagement indicators.
- Idempotency: Ensure accidental double clicks or retry requests do not duplicate reaction counts.
| Metric | Calculation | Value |
|---|---|---|
| Peak Reactions / sec (viral live event) | From Peak Reactions / day ÷ 86400 (+ peak factor in value) | 1,000,000 |
| Avg Reactions / sec | From Avg Reactions / day ÷ 86400 (+ peak factor in value) | 50,000 |
| Reaction Record Size | Derived | 50 bytes (user_id + content_id + type) |
| Write Throughput (Peak) | Given (peak load assumption) | 50 MB/s |
| Active Live Events | Derived | 10,000 |
| Reactions per Event (Avg) | Given (typical workload assumption) | 100,000 |
I/O and Bandwidth Calculations: - Ingestion Bandwidth (Peak): 1,000,000 reactions/sec × 50 bytes = 50 MB/s incoming bandwidth. - Daily Aggregate Counts: - 10,000 active events × 100,000 reactions = 1 Billion reactions recorded per day. - 1 Billion × 50 bytes = 50 GB storage space needed per day (raw persistence).
Reactions are submitted via a fast POST path to stateless Reaction Services which validate and record events in memory. Simultaneously, events stream into a Kafka partition pipeline. Reaction Aggregators batch and publish updates to Redis Pub/Subevery 500ms. Stateful WebSocket Gateways receive updates and broadcast batched delta animations to all viewers.
1. The Hot Counter Aggregation Path ⭐
Under viral conditions, issuing direct database updates (e.g., UPDATE counters SET count = count + 1) will lock rows and immediately crash the database. We implement a hybrid, tiered storage solution:
Layer 1 — Redis (real-time, approximate):
HINCRBY reaction_count:{content_id} heart 1
HINCRBY reaction_count:{content_id} laugh 1
Reads: HGETALL reaction_count:{content_id} → {heart: 15234, laugh: 3421}
- Redis handles 1M+ INCR/sec easily (single-threaded atomic operations in memory).
Layer 2 — Cassandra (persistent, exact):
Kafka consumer batches reactions every 5 seconds.
Batch write to Cassandra:
INSERT INTO reactions (content_id, user_id, type, timestamp) ...
Update aggregate counts:
UPDATE reaction_counts SET count = count + batch_size WHERE content_id = ?;
5-second eventual consistency is completely invisible to users.2. Deduplication Mechanisms
To maintain the "one reaction per user per content" rule, the system checks and dedups writes at two distinct levels:
Fast dedup check in Redis:
SADD reacted:{content_id} {user_id}
If SADD returns 0 → already reacted → update type (toggle) instead of increment.
Persistent dedup in Cassandra:
PRIMARY KEY (content_id, user_id) → natural dedup on upsert.3. Real-Time Broadcast & Batching
Broadcasting 1M individual messages/sec to WebSocket connections will overload client browsers and network pipes. We batch updates using delta frames:
For live streams with millions of viewers, don't broadcast individual reactions:
- Every 500ms, aggregate reactions received in that window:
{ "heart": +342, "laugh": +89, "wow": +23 }
- Broadcast ONE message with deltas to all WebSocket clients.
- Client-side: Smoothly animate counters incrementing by the delta, and trigger a random sample of 5 floating emojis (rather than rendering all 342).Event Bus Design (Kafka)
Topic: live_likes_reactions-events Partitions: 64 (scale consumers horizontally) Partition key: entity_id (user_id / order_id — preserves per-entity ordering) Retention: 7 days (compliance) or 24h (high-volume telemetry) Replication factor: 3, min.insync.replicas: 2 Producer: idempotent producer enabled (enable.idempotence=true) Consumer: consumer group "live_likes_reactions-processors" - At-least-once delivery + idempotent handlers (dedup by event_id) - DLQ topic: live_likes_reactions-events-dlq (poison messages after 3 retries) - Lag alert: consumer lag > 60s → scale workers Design Live Likes & Reactions: async side effects MUST NOT block the synchronous API response. Sync path: validate → persist source of truth → publish event → return 201 Async path: consumers update caches, indexes, notifications, aggregates
1. React to Content
POST /api/v1/reactions
{
"content_id": "post-123",
"type": "heart"
}
→ Response: 200 OK
{
"current_counts": { "heart": 15235, "laugh": 3421 }
}2. Remove Reaction
DELETE /api/v1/reactions?content_id=post-123
→ Response: 200 OK3. Retrieve Reaction Counts
GET /api/v1/reactions/counts?content_id=post-123
→ Response: 200 OK
{
"heart": 15234,
"laugh": 3421,
"wow": 892,
"total": 19547
}4. WebSocket Live Updates
// Server pushes deltas over WebSocket every 500ms:
{
"type": "reaction_update",
"content_id": "live-456",
"deltas": { "heart": 342, "laugh": 89 },
"total": 1523400
}Common Error Responses
400 Bad Request: invalid input, missing fields, or malformed JSON 401 Unauthorized: missing or invalid auth token or API key 403 Forbidden: authenticated but insufficient permissions 404 Not Found: resource ID does not exist 409 Conflict: duplicate write or version conflict; retry with idempotency key 422 Unprocessable Entity: valid syntax but invalid business logic 429 Too Many Requests: rate limit exceeded; honor Retry-After header 500 Internal Error: unexpected server fault; retry with idempotency key 503 Service Unavailable: dependency down or overloaded; use exponential backoff
Redis Structure
# Real-time reaction counts stored in a Hash structure
Key: reaction_count:{content_id}
Type: Hash
Fields: { "heart": 15234, "laugh": 3421, "angry": 88 }
# Dedup tracking set
Key: reacted:{content_id}
Type: Set (or Bloom Filter)
TTL: 24 hours for live streamsCassandra Schema Design
-- Individual user reactions for persistence and dedup
CREATE TABLE user_reactions (
content_id UUID,
user_id UUID,
type TEXT, -- heart, laugh, wow, sad, angry
created_at TIMESTAMP,
PRIMARY KEY (content_id, user_id)
);
-- Persisted aggregate reaction counts
CREATE TABLE reaction_counts (
content_id UUID PRIMARY KEY,
heart COUNTER,
laugh COUNTER,
wow COUNTER,
sad COUNTER,
angry COUNTER,
total COUNTER
);| Concern Scenario | Mitigation Strategy |
|---|---|
| Redis Cluster Node Outage | Redis Cluster employs primary-standby failover. If memory is lost, counts can be completely reconstructed by executing aggregate count jobs on Cassandra. |
| Double Counting / Retries | Checked atomically using SADD in Redis. Cassandra inserts are fully idempotent since primary keys naturally overwrite duplicate rows. |
| Kafka Consumer Backlog | Persistent database records may experience brief lag, but real-time client traffic continues uninterrupted via Redis memory. |
1. Toggle Reaction Race Conditions ⭐
If a user rapidly taps Like, Unlike, and Like again, out-of-order packet delivery could corrupt the count. We leverage Redis set idempotency to defend the state:
User rapidly clicks: Like → Unlike → Like (within 100ms)
Expected arrival order:
T=0ms: LIKE received → SADD reacted:{cid} user-1 → returns 1 → HINCRBY heart +1
T=30ms: UNLIKE received → SREM reacted:{cid} user-1 → returns 1 → HINCRBY heart -1
T=60ms: LIKE received → SADD reacted:{cid} user-1 → returns 1 → HINCRBY heart +1
Result: heart count is +1 (correct)
What if network delays reverse arrival order?
T=0ms: UNLIKE arrives first → SREM user-1 → returns 0 (not in set) → no decrement
T=30ms: LIKE arrives → SADD user-1 → returns 1 → HINCRBY heart +1
T=60ms: LIKE (second) arrives → SADD user-1 → returns 0 (already in) → no increment
Result: heart count is +1. Redis SET structures resolve idempotency naturally.2. Viral Hot Keys (1M+ events on a single stream) ⭐
When 1M users react to a single stream, all traffic hits a single Redis node partition. We prevent node saturation:
Problem: content_id "super-bowl-2026" → 1M reactions/sec hitting ONE Redis node.
Solutions:
1. In-Memory In-Process Aggregation (Reaction Service):
- Buffer reactions in service memory for 100ms.
- Send a single batch HINCRBY with cumulative counts (e.g. +342).
- Reduces 1M write commands to ~200 Redis ops/sec across 20 nodes.
2. Sharded Sub-Counters:
- Split single count key into N sharded keys:
reaction_count:{cid}:0, reaction_count:{cid}:1, ..., reaction_count:{cid}:7
- Query: SUM across all 8 sub-counters.
3. Local Redis Sidecars with Write-Behind:
- Accept writes into local host Redis instances.
- Background tasks sync accumulated deltas to the central cluster.3. Redis Deduplication Set Memory Exhaustion ⭐
Storing 50M user IDs in a Redis set consumes ~800MB per event, causing rapid cluster memory exhaustion. We solve this memory bottleneck:
Problem: SADD reacted:{content_id} stores all user_ids.
- 50M unique users × 16 bytes (UUID) = 800 MB memory for one post.
Solutions:
1. Bloom Filters ⭐:
- Use ~15 bits per element.
- 50M elements × 15 bits = 93 MB (88% memory reduction).
- Trade-off: ~1% false-positive rate (some clicks are dropped, which is fine for reactions).
2. TTL cleanup:
- Set TTL to 24h. Old historical events fall back to Cassandra checks.
3. Split-Tier checking:
- Use Redis dedup exclusively for active live streams.
- Static post reactions run queries directly against Cassandra.Count Display Formatting & User Perception
Since numbers above 1K are shortened (e.g. 15.2K, 1.5M), minor discrepancies between Redis counts and Cassandra persisted counts are completely invisible to users. This permits us to prioritize eventual consistency and in-memory aggregation over heavy transactional write architectures.
Interview Walkthrough
- State the throughput mismatch: 1M reactions/sec in during a viral stream, but viewers only need batched counter updates every 500ms.
- Propose tiered counters: Redis HINCRBY for real-time display, Kafka → Cassandra for durable persistence with 5-second batching.
- Enforce one-reaction-per-user via Redis SADD dedup; switch to Bloom filters when dedup sets exceed memory on mega-streams.
- Broadcast delta frames (
{heart: +342, laugh: +89}) over WebSocket, not individual reaction events to every viewer. - Shard hot counter keys across N sub-counters when a single Redis node becomes the bottleneck for one content ID.
- Accept eventual consistency — formatted counts (15.2K) hide small discrepancies between Redis and Cassandra.
- Handle toggle races with set-based idempotency: SADD/SREM semantics resolve out-of-order Like/Unlike packets naturally.
- Common pitfall: fanning out every individual reaction emoji to millions of WebSocket clients — the broadcast layer collapses first.
1. Write Path Tiering (Speed vs Durability)
Writing every transaction synchronously to disk at 1M/sec is structurally impossible. We trade off absolute synchronous durability for highly responsive in-memory writes. If a collector crash occurs, up to 5 seconds of transient counts might be lost, but the system stays fast and operational.
2. Counter Implementation Architectures
| Approach | Latency | Accuracy | Durability | Best For |
|---|---|---|---|---|
| Redis HINCRBY ⭐ | < 1 ms | Exact in Redis memory | Volatile (syncs periodically) | Real-time viewer display updates |
| Cassandra COUNTER | ~5 ms | Exact | Highly durable | Persistent aggregate history |
| Flink window aggregation | 1-5 sec window | Exact within window | Highly durable output | Complex analytics (region, time) |
| HyperLogLog (HLL) | < 1 ms | ~0.81% error rate | Volatile | Unique reactor count (approximate) |
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1: MVP (0 to 100K users)
Monolith or minimal services proving core live likes reactions flows. Optimize for shipping speed and correctness over scale.
Key components: Single region · Primary DB + Redis cache · Synchronous core path · Basic monitoring
Move to next phase when: p99 latency exceeds SLO or DB CPU sustained above 70%
Phase 2: Growth (100K to 10M users)
Split read/write paths, introduce async processing for non-critical work, add caching layers and horizontal scaling.
Key components: Read replicas or CQRS · Message queue for async work · CDN / edge caching · Service-level SLOs
Move to next phase when: Hot keys, fan-out bottlenecks, or ops toil from manual scaling
Phase 3: Scale (10M+ users)
Shard data plane, multi-region active-active or active-passive, formal DR runbooks, cost optimization.
Key components: Database sharding / partitioning · Multi-region replication · Auto-scaling + chaos testing · Dedicated platform/SRE ownership
Move to next phase when: Regional failure domain risk, compliance data residency, or linear cost growth unsustainable
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Core user-facing availability | 99.95% | Budget for planned maintenance + unplanned failures without user-visible outage. |
| p99 latency (critical path) | Problem-specific — state target early and tie to capacity math | Interview credibility comes from connecting SLO to architecture choices. |
| Error rate (5xx) | < 0.1% | Distinguishes transient blips from systemic failure requiring rollback. |
| Data durability | 99.999999999% (11 nines) for committed writes | Define which operations require fsync/quorum vs async replication. |
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Primary database unavailable | Health check failures, connection pool exhaustion alerts, elevated 5xx | Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists |
| Traffic spike (10× normal) | RPS anomaly alert, autoscaling lag, latency SLO burn rate | Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations |
| Bad deploy causing elevated errors | Canary metric regression, error budget burn, deployment correlation | Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility |
Cost Drivers (Staff lens)
- Egress bandwidth and CDN (often dominates media/data-heavy systems)
- Database storage + IOPS at scale (plan compaction, TTL, tiering)
- Compute for async pipelines (right-size workers, spot instances for batch)
- Managed service premiums vs operational headcount trade-off
Multi-Region & DR
Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.