This problem appears in multiple sheets. Depth expectations increase as you progress:
| Track | What to demonstrate |
|---|---|
| Arch 25 | The connection-state problem. Nail WebSocket scaling (sticky sessions vs pub/sub), message ordering per conversation, group fan-out math, and offline delivery without blocking the hot path. |
| Arch 50 | Add read receipts, presence, typing indicators, and media upload flow. Discuss when to use CRDTs vs server-assigned sequence numbers. |
| Arch 75 | Staff: E2E encryption key exchange (Signal protocol sketch), multi-device sync, and what breaks at 500M concurrent connections. |
Interview Prompt
Design a real-time chat application like WhatsApp or Slack. Users send 1:1 and group messages with delivery in under 1 second. Support offline message delivery, read receipts, and optional end-to-end encryption.
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| 1:1 only or groups — and what's the max group size? | Group fan-out is O(members) per message. 256-member Slack channel vs 50-person WhatsApp group drives completely different push architecture. |
| Do we need message ordering guarantees within a conversation? | Per-conversation total order requires a single sequence authority; cross-device sync adds complexity. |
| Online-only delivery or persistent offline queue? | Offline queue means durable write before ACK — changes write path from fire-and-forget to at-least-once with idempotency. |
| E2E encryption required or server-readable messages? | E2E means server stores ciphertext only — search, moderation, and read receipts become client-side problems. |
Scope
In scope
- 1:1 and group messaging with sub-second delivery
- WebSocket connection management at scale
- Message persistence and offline delivery
- Read receipts and delivery status
- High-level E2E encryption architecture
Out of scope (state explicitly)
- Voice/video calling (WebRTC is a separate design)
- Full Signal protocol implementation details
- Content moderation ML pipeline
- Billing and enterprise SSO
Assumptions
- 50M DAU, 500M messages/day (~5.8K msg/sec avg, 50K peak)
- Average group size 8, max 256 members
- Messages retained 1 year; media in object storage
- 99.9% message delivery within 5 seconds when recipient online
These foundational concepts underpin the patterns used in this problem. Review them before deep-diving into component-level trade-offs.
- 1:1 messaging: Send and receive text messages between two users in real-time
- Group messaging: Support group chats with up to 256 members (WhatsApp) or thousands (Slack channels)
- Online/Offline presence: Show whether a user is online, offline, or last seen
- Read receipts: Show sent ✓, delivered ✓✓, and read (blue ✓✓) status
- Media sharing: Send images, videos, documents, voice messages
- Message history: Persist messages and allow history retrieval with pagination
- Push notifications: Notify offline users of new messages
- End-to-end encryption (for WhatsApp-style): Messages encrypted on device, server cannot read content
- Ultra-Low Latency: Message delivered in < 100 ms between online users
- High Availability: 99.99% uptime: chat is a core communication tool
- Message Ordering: Messages within a conversation must appear in correct order
- Durability: Messages must never be lost (stored reliably)
- Scalability: Support 1B+ users, 100M+ concurrent connections
- Consistency: Messages must be delivered at least once (at-least-once, idempotent on client)
- Offline Support: Messages sent to offline users are delivered when they come back online
| Metric | Calculation | Value |
|---|---|---|
| Total users | Given (product assumption) | 2B |
| DAU | Given (product assumption) | 500M |
| Concurrent connections | Given (peak load assumption) | 100M |
| Messages / day | 500M DAU × 100 msgs | 50B (~100 msgs per user/day) |
| Messages / sec | 50B ÷ 86400 | ~580K |
| Avg message size | Given (typical workload assumption) | 200 bytes (text) |
| Storage / day | 50B × 200B | 10 TB |
| Storage / year | 10 TB × 365 | ~3.65 PB |
| Media messages (10%) | 5B/day × 500KB avg | 2.5 PB/day |
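The estimates in the table can be re-derived with a few lines of arithmetic. This is only a sanity-check sketch: the constant names are illustrative, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
# Back-of-envelope check of the capacity table (decimal units assumed).
DAU = 500_000_000
MSGS_PER_USER_PER_DAY = 100
AVG_TEXT_BYTES = 200
AVG_MEDIA_BYTES = 500_000      # 500 KB
SECONDS_PER_DAY = 86_400

messages_per_day = DAU * MSGS_PER_USER_PER_DAY            # 50 billion
messages_per_sec = messages_per_day / SECONDS_PER_DAY     # roughly 580K
text_bytes_per_day = messages_per_day * AVG_TEXT_BYTES    # 10 TB
text_bytes_per_year = text_bytes_per_day * 365            # 3.65 PB
media_messages_per_day = messages_per_day // 10           # 10% of messages
media_bytes_per_day = media_messages_per_day * AVG_MEDIA_BYTES  # 2.5 PB

print(f"{messages_per_sec:,.0f} msg/sec")
print(f"{text_bytes_per_day / 1e12:.1f} TB/day text")
print(f"{text_bytes_per_year / 1e15:.2f} PB/year text")
print(f"{media_bytes_per_day / 1e15:.1f} PB/day media")
```

Media dominates storage by more than two orders of magnitude, which is why it goes to object storage rather than the message store.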
WebSocket Connection & Chat Servers
- Why WebSocket: HTTP is request-response (half-duplex). WebSocket provides full-duplex, persistent connection: essential for real-time messaging
- How:
- Client opens WebSocket connection to a chat server (via load balancer)
- Connection is kept alive with heartbeats (every 30 seconds)
- Messages are sent/received over this connection in real-time
- If connection drops, client reconnects with exponential backoff
- Scaling: Each chat server handles ~100K concurrent WebSocket connections
- 100M concurrent users / 100K per server = 1,000 chat servers
- Sticky sessions: A user's WebSocket must remain on the same server for the duration of the connection. Use consistent hashing at the load balancer: on user_id where the balancer can read it (L7), or on client IP at an L4 load balancer, which cannot see user_id
- Connection state: Each server maintains an in-memory map: user_id → WebSocket connection
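A minimal sketch of consistent hashing for sticky routing follows. The class name `HashRing`, the virtual-node count, and the server names are assumptions made for illustration; a real load balancer would implement this internally.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping a routing key to a chat server."""

    def __init__(self, servers, vnodes=100):
        # Each server gets `vnodes` points on the ring to even out load.
        points = []
        for server in servers:
            for i in range(vnodes):
                points.append((self._hash(f"{server}#{i}"), server))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._servers = [s for _, s in points]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def server_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, self._hash(key))
        return self._servers[idx % len(self._servers)]


ring = HashRing(["chat-1", "chat-2", "chat-3"])
print(ring.server_for("user-123"))
```

When a server is removed, only the keys that lived on it move; everyone else keeps their server, which limits reconnect churn during deploys and failures.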
Session Service + Redis
- Purpose: Track which chat server each user is connected to
- Data: session:{user_id} → {server_id, connected_at, last_active}
- Why Redis: In-memory, sub-ms lookups, supports TTL for auto-cleanup
- Flow: When user connects, chat server registers in Redis. When user disconnects, entry is removed
Message Router (Cross-Server Communication)
- Problem: User A is on Chat Server 1, User B is on Chat Server 2. How does the message get from Server 1 to Server 2?
- Solution 1: Redis Pub/Sub:
- Each chat server subscribes to a Redis channel named after its server_id
- To route a message: look up recipient's server_id in Session Service → publish to that server's channel
- Pros: Simple, low latency
- Cons: Redis Pub/Sub is fire-and-forget (no persistence). If the subscriber is slow, messages can be lost
- Solution 2: Kafka (recommended for durability):
- Topic per chat server or per user-range
- Each chat server is a consumer for its assigned partitions
- Pros: Durable, replayable, handles backpressure
- Cons: Slightly higher latency (~5-10ms) vs Redis Pub/Sub (~1ms)
- Hybrid: Use Redis Pub/Sub for online → online delivery (fast path). Use Kafka as durable fallback for offline users and reliability
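The hybrid routing decision can be sketched as below. Plain Python containers stand in for Redis Pub/Sub and Kafka, and `MessageRouter` with its method names is hypothetical. The sketch also assumes every message is appended to the durable log before the fast path is attempted, which is one possible reading of "durable fallback".

```python
from collections import defaultdict


class MessageRouter:
    """Routes a message to the recipient's chat server, or queues it."""

    def __init__(self):
        self.sessions = {}                        # user_id -> server_id (Session Service)
        self.server_channels = defaultdict(list)  # stand-in for per-server Pub/Sub channels
        self.durable_log = []                     # stand-in for a Kafka topic

    def connect(self, user_id, server_id):
        self.sessions[user_id] = server_id

    def disconnect(self, user_id):
        self.sessions.pop(user_id, None)

    def route(self, recipient_id, message):
        # Durable append first, so a lost Pub/Sub publish can be replayed.
        self.durable_log.append((recipient_id, message))
        server_id = self.sessions.get(recipient_id)
        if server_id is None:
            return "offline_queue"   # recipient syncs from the log later
        self.server_channels[server_id].append((recipient_id, message))
        return "fast_path"


router = MessageRouter()
router.connect("bob", "chat-2")
print(router.route("bob", {"text": "hi"}))
print(router.route("carol", {"text": "are you there?"}))
```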
Message Service
- Purpose: Persist messages, retrieve chat history
- Write flow:
- Receive message from chat server
- Assign a monotonically increasing message_id per conversation (using Snowflake or per-conversation counter)
- Write to Cassandra
- Update conversation's last_message in cache
- Read flow: Paginated retrieval:
SELECT * FROM messages WHERE conversation_id = ? AND message_id < ? ORDER BY message_id DESC LIMIT 20
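For the Snowflake option mentioned in the write flow, a minimal generator might look like this. It assumes the commonly cited layout of a 41-bit millisecond timestamp, 10-bit worker ID, and 12-bit sequence; the epoch constant and the injectable `clock_ms` parameter are illustrative choices. IDs from a single generator are strictly increasing, while IDs from different generators are only roughly time-ordered, so strict per-conversation order still needs a single sequencing point.

```python
import threading
import time


class SnowflakeGenerator:
    EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)
    WORKER_BITS = 10
    SEQ_BITS = 12

    def __init__(self, worker_id, clock_ms=None):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & ((1 << self.SEQ_BITS) - 1)
                if self._seq == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._seq = 0
            self._last_ms = now
            timestamp_part = (now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQ_BITS)
            return timestamp_part | (self.worker_id << self.SEQ_BITS) | self._seq


gen = SnowflakeGenerator(worker_id=1)
print(gen.next_id(), gen.next_id())
```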
Cassandra (Message Store)
- Why Cassandra:
- Write-heavy workload (580K writes/sec)
- Natural time-series access pattern (messages by conversation, ordered by time)
- Horizontal scaling, multi-DC replication
- No complex joins needed
- Partition key: conversation_id (all messages for a chat are co-located)
- Clustering key: message_id (ordered retrieval)
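The effect of clustering by message_id is that "the page before this cursor" is a contiguous slice of one partition. The sketch below imitates that read on an in-memory sorted list; `page_before` is a hypothetical helper, not a driver call.

```python
import bisect


def page_before(message_ids_asc, before_id, limit=20):
    """Imitates: WHERE message_id < before_id ORDER BY message_id DESC LIMIT limit.

    `message_ids_asc` is one conversation's message IDs in ascending order.
    """
    end = bisect.bisect_left(message_ids_asc, before_id)  # first index with id >= before_id
    start = max(0, end - limit)
    return message_ids_asc[start:end][::-1]               # newest first


conversation = list(range(1, 101))           # message IDs 1..100
first_page = page_before(conversation, before_id=101)
# The oldest ID on a page becomes the cursor for the next page.
second_page = page_before(conversation, before_id=first_page[-1])
print(first_page[0], first_page[-1], second_page[0], second_page[-1])
```

Cursor pagination on message_id stays stable while new messages arrive, which offset-based pagination does not.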
Presence / Online Status
- How:
- When user connects → set presence:{user_id} = "online" in Redis (with TTL 60s)
- Heartbeat every 30s resets the TTL
- If no heartbeat → key expires → user is "offline"
- Last seen = last_active timestamp from session entry
- Fan-out: When user's presence changes, notify their contacts who are online
- For users with many contacts, this is expensive → batch and throttle presence updates
- Only show presence for active conversations (Slack approach)
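The heartbeat-plus-TTL mechanism can be illustrated without Redis. In this sketch a dictionary of expiry times plays the role of keys with a TTL; `PresenceTracker` and the injectable `clock` are assumptions made so the behaviour can be tested deterministically.

```python
import time


class PresenceTracker:
    """A user is online while their last heartbeat is younger than the TTL."""

    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}  # user_id -> expiry time

    def heartbeat(self, user_id):
        # Equivalent in spirit to re-setting the key with a fresh TTL.
        self._expires_at[user_id] = self._clock() + self._ttl

    def is_online(self, user_id):
        expires = self._expires_at.get(user_id)
        if expires is None:
            return False
        if self._clock() >= expires:
            del self._expires_at[user_id]  # lazy expiry
            return False
        return True


tracker = PresenceTracker()
tracker.heartbeat("alice")
print(tracker.is_online("alice"), tracker.is_online("bob"))
```

With a 60-second TTL and a 30-second heartbeat, one missed heartbeat is tolerated before the user flips to offline.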
Group Service
- Manages: Group creation, membership, admin roles
- Group message delivery:
- Sender sends message to group
- Group Service fetches member list
- For each member: look up session → route message to their chat server
- Store one copy of the message in DB (not N copies)
- Optimization: For large groups (1000+ members), use Kafka topic per group. Members' chat servers subscribe to the group topic
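One way to keep group delivery cheap is to group recipients by chat server, so each server receives a single internal call carrying all of its local recipients. The function below is a sketch under that assumption; `plan_group_fanout` and the shape of `sessions` are hypothetical.

```python
from collections import defaultdict


def plan_group_fanout(member_ids, sessions, sender_id):
    """Split group members into per-server batches and an offline list.

    `sessions` maps user_id -> server_id for currently connected users.
    """
    per_server = defaultdict(list)
    offline = []
    for member in member_ids:
        if member == sender_id:
            continue  # the sender already has the message
        server_id = sessions.get(member)
        if server_id is None:
            offline.append(member)   # will sync on reconnect or get a push
        else:
            per_server[server_id].append(member)
    return dict(per_server), offline


sessions = {"alice": "chat-1", "bob": "chat-2", "carol": "chat-2"}
batches, offline = plan_group_fanout(["alice", "bob", "carol", "dave"], sessions, "alice")
print(batches, offline)
```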
Media Service
- Flow:
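- Client uploads media to Media Service over HTTP rather than the chat WebSocket: WebSocket can carry binary frames, but a large upload would tie up the messaging connection, and HTTP gives multipart and resumable uploads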
- Media Service stores in S3, generates a CDN URL
- Client sends a message with the media URL (not the binary) over WebSocket
- Recipient fetches media from CDN
- Thumbnail generation: For images/videos, generate a low-res thumbnail (< 10 KB) stored inline in the message: displayed before full media is downloaded
- Size limits: Max file size ~100 MB. For large files, use multipart upload with resumable support
Push Notification Service
- When triggered: Chat Server detects recipient is OFFLINE (no session in Redis) → publishes to Kafka topic push-notifications
- Consumer picks up the event and sends via:
- APNs (Apple Push Notification Service) for iOS
- FCM (Firebase Cloud Messaging) for Android
- Web Push for browser clients
- Notification content: "Alice sent you a message" (NOT the full message content: for E2EE, server can't read it)
- Batching / collapsing: If multiple messages arrive while user is offline, collapse into "Alice sent you 5 messages" (not 5 separate push notifications)
- Badge count: Update unread badge count via silent push (iOS) or data message (Android)
- Rate limiting: Max 1 push per conversation per 30 seconds to avoid spamming
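Collapsing and the 30-second limit can be combined in one small state machine. This is a sketch only: `PushCollapser` is a hypothetical name, time is passed in explicitly, and suppressed messages are flushed when the next message arrives after the window, whereas a production consumer would more likely use a timer.

```python
class PushCollapser:
    """At most one push per (user, conversation) per interval, with a count."""

    def __init__(self, min_interval_s=30):
        self._min_interval = min_interval_s
        self._last_sent = {}  # (user_id, conversation_id) -> time of last push
        self._pending = {}    # (user_id, conversation_id) -> suppressed count

    def on_message(self, user_id, conversation_id, sender_name, now):
        """Return the push text to send, or None if the push is suppressed."""
        key = (user_id, conversation_id)
        count = self._pending.get(key, 0) + 1
        last = self._last_sent.get(key)
        if last is not None and now - last < self._min_interval:
            self._pending[key] = count
            return None
        self._pending[key] = 0
        self._last_sent[key] = now
        if count == 1:
            return f"{sender_name} sent you a message"
        return f"{sender_name} sent you {count} messages"


collapser = PushCollapser()
for t in (0, 5, 10, 31):
    print(t, collapser.on_message("bob", "conv-1", "Alice", now=t))
```

The notification text never includes message content, which keeps the same code path valid for E2EE conversations.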
E2EE Key Service
- Public key directory: Each user registers their public key (identity key + signed pre-key + one-time pre-keys) with the server
- Key exchange (X3DH): When Alice wants to message Bob for the first time:
- Alice fetches Bob's pre-key bundle from Key Service
- Alice performs X3DH to derive a shared secret
- Alice encrypts the first message with this shared secret
- Bob receives the message + Alice's ephemeral public key → derives the same shared secret
- Pre-key rotation: One-time pre-keys are consumed on use. Client periodically uploads new batches (100 pre-keys at a time). If exhausted → fall back to signed pre-key (less forward secrecy)
- Double Ratchet: After initial key exchange, each message derives a new key: so compromise of one key doesn't expose other messages
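The "new key per message" idea can be illustrated with the symmetric-key half of the Double Ratchet: each step feeds the chain key through HMAC-SHA256 with two different constants, producing a message key and the next chain key. This sketch deliberately omits X3DH and the Diffie-Hellman ratchet, uses a placeholder root secret, and is for illustration only, not a usable encryption scheme.

```python
import hashlib
import hmac


def ratchet_step(chain_key):
    """Derive (next_chain_key, message_key) from the current chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


def derive_message_keys(initial_chain_key, count):
    """Advance the chain `count` times, collecting one key per message."""
    keys = []
    chain_key = initial_chain_key
    for _ in range(count):
        chain_key, message_key = ratchet_step(chain_key)
        keys.append(message_key)
    return keys


# Placeholder secret; in the real protocol this comes out of the key exchange.
shared_secret = b"\x00" * 32
for key in derive_message_keys(shared_secret, 3):
    print(key.hex()[:16])
```

Because HMAC is one-way, holding a later chain key does not reveal earlier message keys, which is where the forward secrecy within a chain comes from.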
End-to-End Encryption (E2EE)
- Signal Protocol (used by WhatsApp):
- Each user generates a key pair (public + private)
- Public key registered with the server
- Messages encrypted on sender's device with a per-message symmetric key derived from the shared secret (not directly with the recipient's public key)
- Server stores encrypted ciphertext: cannot read the content
- Key exchange uses X3DH (Extended Triple Diffie-Hellman)
- Forward secrecy via Double Ratchet algorithm
Send Message
WebSocket Frame:
{
"type": "send_message",
"message_id": "client-generated-uuid",
"conversation_id": "conv-uuid",
"content": "Hey, how are you?",
"content_type": "text",
"timestamp": 1710320000000
}
Server Acknowledgment:
{
"type": "ack",
"message_id": "client-generated-uuid",
"server_message_id": "snowflake-id",
"status": "sent",
"timestamp": 1710320000050
}
Receive Message
WebSocket Frame (pushed to recipient):
{
"type": "new_message",
"message_id": "snowflake-id",
"conversation_id": "conv-uuid",
"sender_id": "user-123",
"content": "Hey, how are you?",
"content_type": "text",
"timestamp": 1710320000050
}
Message Status Updates
WebSocket Frame:
{
"type": "message_status",
"message_id": "snowflake-id",
"status": "delivered" | "read",
"timestamp": 1710320001000
}
REST APIs (for non-real-time operations)
GET /api/v1/conversations/{conv_id}/messages?before={message_id}&limit=50
GET /api/v1/conversations
POST /api/v1/groups
POST /api/v1/groups/{group_id}/members
Common Error Responses
- 400 Bad Request: invalid input, missing fields, or malformed JSON
- 401 Unauthorized: missing or invalid auth token or API key
- 403 Forbidden: authenticated but insufficient permissions
- 404 Not Found: resource ID does not exist
- 409 Conflict: duplicate write or version conflict; retry with idempotency key
- 422 Unprocessable Entity: valid syntax but invalid business logic
- 429 Too Many Requests: rate limit exceeded; honor Retry-After header
- 500 Internal Error: unexpected server fault; retry with idempotency key
- 503 Service Unavailable: dependency down or overloaded; use exponential backoff
- 440 Login Timeout: WebSocket session expired; reconnect required
Cassandra: Messages
CREATE TABLE messages (
conversation_id UUID,
message_id BIGINT, -- Snowflake ID (time-ordered)
sender_id UUID,
content TEXT, -- encrypted ciphertext
content_type VARCHAR, -- text, image, video, voice
media_url TEXT,
created_at TIMESTAMP,
PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
Cassandra: Conversations per User
CREATE TABLE user_conversations (
user_id UUID,
last_message_at TIMESTAMP,
conversation_id UUID,
conversation_type VARCHAR, -- '1:1' or 'group'
last_message_preview TEXT,
unread_count INT,
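PRIMARY KEY (user_id, last_message_at, conversation_id) -- conversation_id keeps two conversations with the same timestamp from overwriting each other
) WITH CLUSTERING ORDER BY (last_message_at DESC, conversation_id ASC);
Redis: Session & Presence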
# Session (which server is user connected to)
Key: session:{user_id}
Value: Hash { server_id, connected_at, device_type }
TTL: 300 (5 min, refreshed by heartbeat)
# Presence
Key: presence:{user_id}
Value: "online"
TTL: 60 (refreshed by heartbeat every 30s)
# Last Seen (persisted separately)
Key: last_seen:{user_id}
Value: epoch_timestamp
MySQL: User & Group Metadata
CREATE TABLE users (
user_id CHAR(36) PRIMARY KEY, -- UUID stored as text; MySQL has no native UUID column type
phone VARCHAR(20) UNIQUE,
name VARCHAR(128),
avatar_url TEXT,
public_key BLOB, -- for E2EE
created_at TIMESTAMP
);
-- GROUPS is a reserved word in MySQL 8.0, so the table name is quoted
CREATE TABLE `groups` (
group_id CHAR(36) PRIMARY KEY,
name VARCHAR(256),
avatar_url TEXT,
created_by CHAR(36),
max_members INT DEFAULT 256,
created_at TIMESTAMP
);
CREATE TABLE group_members (
group_id CHAR(36),
user_id CHAR(36),
role ENUM('admin', 'member'),
joined_at TIMESTAMP,
PRIMARY KEY (group_id, user_id)
);
Kafka Topics
Topic: messages (partitioned by conversation_id)
Topic: presence-updates (partitioned by user_id)
Topic: push-notifications
General
| Technique | Application |
|---|---|
| Message persistence before ack | Message written to Cassandra before server sends 'sent' ack to sender |
| Client-side retry | Client retries sending if no ack received within timeout |
| Idempotent writes | message_id generated by client → duplicate sends are idempotent |
| Multi-DC replication | Cassandra + Redis replicated across data centers |
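Persist-before-ack and idempotent writes fit together as in the following sketch. An in-memory list and dictionary stand in for the message store, and `MessageStore.handle_send` is a hypothetical handler; the ack fields mirror the Server Acknowledgment frame shown earlier.

```python
class MessageStore:
    """Deduplicates client retries on (sender_id, client message_id)."""

    def __init__(self):
        self._server_id_by_client_key = {}
        self.log = []          # stand-in for the durable message table
        self._next_id = 1      # stand-in for Snowflake or a per-conversation counter

    def handle_send(self, sender_id, client_message_id, conversation_id, content):
        key = (sender_id, client_message_id)
        server_message_id = self._server_id_by_client_key.get(key)
        if server_message_id is None:
            server_message_id = self._next_id
            self._next_id += 1
            # Persist BEFORE acknowledging, so an acked message is never lost.
            self.log.append((conversation_id, server_message_id, sender_id, content))
            self._server_id_by_client_key[key] = server_message_id
        # A retry of an already stored message gets the same ack back.
        return {
            "type": "ack",
            "message_id": client_message_id,
            "server_message_id": server_message_id,
            "status": "sent",
        }


store = MessageStore()
first = store.handle_send("user-123", "uuid-1", "conv-1", "Hey, how are you?")
retry = store.handle_send("user-123", "uuid-1", "conv-1", "Hey, how are you?")
print(first == retry, len(store.log))
```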
Problem-Specific
1. Chat Server Crashes (User Loses WebSocket)
- Client detects connection loss, reconnects to another chat server
- New server registers new session in Redis
- Client sends "sync" request: "Give me all messages after message_id X"
- Server fetches from Cassandra and delivers missed messages
- No messages lost: All messages are persisted in Cassandra before ack
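When a chat server dies, all of its clients reconnect at once, so the reconnect delay should be randomized as well as exponential. The sketch below uses the "full jitter" variant; the base, cap, and function name are assumptions, not values taken from this design.

```python
import random


def backoff_delays(attempts=6, base_s=1.0, cap_s=60.0, rng=random.random):
    """Return one randomized delay per reconnect attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)).
    """
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * (2 ** attempt))
        delays.append(rng() * ceiling)
    return delays


for attempt, delay in enumerate(backoff_delays()):
    print(f"attempt {attempt}: wait {delay:.2f}s")
```

The jitter spreads a crashed server's clients across the backoff window instead of having them all retry at the same instants.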
2. Message Ordering
- Scenario: Two rapid messages arrive at different servers out of order
- Solution:
- Messages have Snowflake IDs (time-ordered within sender)
- Client sorts by message_id before displaying
- Server stores with clustering by message_id → reads are ordered
3. Offline Message Delivery
- User B is offline when User A sends a message
- Message is persisted in Cassandra
- unread_count incremented in user_conversations
- Push notification sent via APNs/FCM
- When User B comes online → sync: fetch all messages with message_id > last_seen_message_id
4. Group Message Fan-Out
- 256-member group: message must be delivered to 256 users
- Solution: Don't fan out to all 256 as individual messages. Store once in DB. Each online member's chat server pulls/receives the message. Offline members sync on reconnect
5. Presence Thundering Herd
- User with 10,000 contacts comes online → must notify all 10K contacts
- Solution:
- Only notify contacts who have an active conversation open with this user
- Batch presence updates (send every 5 seconds, not instantly)
- For contacts not in the active view, fetch presence lazily when they open the chat
6. Split Brain (Network Partition Between DCs)
- Users in DC-A and DC-B both send messages to the same conversation
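- Solution: Chat is an append-only model: each message is its own row keyed by (conversation_id, message_id), so messages written in both DCs are preserved once replication catches up, and readers see them interleaved by Snowflake ID. Cassandra's LWW (Last-Write-Wins) resolution only applies when the same row is written twice, for example by a client retry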
Message Sync Protocol
- Each client tracks last_synced_message_id per conversation
- On reconnect: GET /messages?conversation_id=X&after=last_synced_message_id
- For first-time device: full sync of recent N messages per conversation
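The server side of this sync can be sketched as a function over one conversation's stored messages. `sync_conversation` is a hypothetical name, and the first-sync limit of 50 is an assumed value for "recent N".

```python
def sync_conversation(server_messages, last_synced_message_id, first_sync_limit=50):
    """Return (missed_messages, new_cursor) for one conversation.

    `server_messages` is a list of dicts sorted ascending by "message_id".
    A cursor of None means a first-time device, which gets the most recent
    `first_sync_limit` messages (assumed to be a positive number).
    """
    if last_synced_message_id is None:
        missed = server_messages[-first_sync_limit:]
    else:
        missed = [m for m in server_messages
                  if m["message_id"] > last_synced_message_id]
    new_cursor = missed[-1]["message_id"] if missed else last_synced_message_id
    return missed, new_cursor


history = [{"message_id": i, "content": f"msg {i}"} for i in range(1, 11)]
missed, cursor = sync_conversation(history, last_synced_message_id=7)
print([m["message_id"] for m in missed], cursor)
```

The client should advance its cursor only after it has stored the returned messages, so that a crash mid-sync results in a repeat fetch rather than a gap.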
Typing Indicators
- User A starts typing → send {"type": "typing", "conversation_id": "...", "user_id": "..."} over WebSocket
- Ephemeral: not persisted, only forwarded to online participants
- Auto-expires after 5 seconds of inactivity
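On the receiving side, the 5-second expiry can be handled with a small in-memory structure such as the one below. `TypingState` is a hypothetical name, and timestamps are passed in explicitly so the expiry logic is easy to follow.

```python
class TypingState:
    """Tracks who is typing; entries lapse after `ttl_s` without a new event."""

    def __init__(self, ttl_s=5.0):
        self._ttl = ttl_s
        self._last_event = {}  # (conversation_id, user_id) -> time of last typing event

    def on_typing(self, conversation_id, user_id, now):
        self._last_event[(conversation_id, user_id)] = now

    def who_is_typing(self, conversation_id, now):
        # Drop stale entries, then list the remaining typists for this conversation.
        expired = [k for k, t in self._last_event.items() if now - t >= self._ttl]
        for key in expired:
            del self._last_event[key]
        return sorted(u for (c, u) in self._last_event if c == conversation_id)


state = TypingState()
state.on_typing("conv-1", "alice", now=0.0)
print(state.who_is_typing("conv-1", now=3.0))
print(state.who_is_typing("conv-1", now=5.0))
```

Nothing here is persisted, which matches the ephemeral nature of the signal: losing it on a server restart costs nothing.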
Message Search
- Full-text search on messages using Elasticsearch
- Index: messages by user_id + conversation_id + content
- Note: With E2EE, server can't index encrypted content. Search must happen client-side (WhatsApp approach)
Multi-Device Support
- One user on phone + tablet + web simultaneously
- Session Service stores multiple sessions per user: sessions:{user_id} → SET of {server_id, device_id}
- Messages delivered to ALL active devices
- Read receipts synced across devices
Rate Limiting
- Max 100 messages per minute per user (prevent spam bots)
- Max 256 members per group (WhatsApp) or configurable (Slack)
- Max file size: 100 MB
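The per-user message limit could be enforced with a sliding-window counter, sketched here in memory. The class name is an assumption, and a shared store would be needed for the limit to hold across chat servers.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most `limit` events per user within any `window_s` seconds."""

    def __init__(self, limit=100, window_s=60.0):
        self._limit = limit
        self._window = window_s
        self._events = defaultdict(deque)  # user_id -> timestamps of accepted sends

    def allow(self, user_id, now):
        timestamps = self._events[user_id]
        # Discard sends that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self._window:
            timestamps.popleft()
        if len(timestamps) >= self._limit:
            return False
        timestamps.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_s=60.0)
print([limiter.allow("user-123", now=t) for t in (0, 1, 2, 3, 60)])
```

A rejected send maps naturally onto the 429 response listed among the error codes.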
Data Retention
- Store messages for 90 days (WhatsApp doesn't store after delivery for E2EE)
- Or indefinitely (Slack enterprise) with archival to cold storage (S3 Glacier)
Interview Walkthrough
- Scope the chat modes first — 1:1, group, and channels have different fan-out and storage costs; pick one to depth, mention others.
- Draw the connection lifecycle: WebSocket handshake → session registry → heartbeat/reconnect — this is where scaling interviews focus.
- Guarantee per-conversation message ordering with a monotonic sequence ID; cross-conversation ordering is not required.
- Frame delivery guarantees using CAP Theorem and Consistency Models — online users get real-time push; offline users sync on reconnect.
- Persist messages through Kafka or a write-ahead log before acknowledging delivery, so crashes do not lose in-flight messages.
- For large groups, avoid naive broadcast — use a fan-out service or channel-based pub/sub rather than N individual WebSocket writes.
- Address multi-device sync: deliver to all active sessions and reconcile read receipts across devices.
- Common pitfall: storing every WebSocket on a single server without a connection registry and sticky routing strategy.
WebSocket vs Long Polling vs SSE: Why WebSocket?
| Protocol | How | Latency | Bi-directional | Connection Cost | Best For |
|---|---|---|---|---|---|
| WebSocket ⭐ | Persistent TCP connection, full-duplex | < 100ms | ✓ (both ways) | 1 TCP connection per client | Chat, gaming, collaborative editing |
| Long Polling | Client sends request, server holds until data available | 100-500ms | ✗ (simulated) | New HTTP request per message received | Fallback when WebSocket not available |
| SSE | Persistent HTTP connection, server to client only | < 100ms | ✗ (server to client only) | 1 HTTP connection per client | News feeds, notifications |
| HTTP Polling | Client polls every N seconds | N seconds | ✗ | New HTTP request every N seconds | Not suitable for chat |
Why WebSocket is ideal for chat:
- Bi-directional: Client sends messages AND receives messages on the same connection
- Low latency: No HTTP overhead per message (WebSocket frames are tiny: 2-14 bytes header vs 200+ bytes for HTTP headers)
- Efficient: 500M users × 1 WebSocket each = 500M mostly idle connections. With a 30-second heartbeat that is about 17M small frames/sec, whereas HTTP polling every 1s would be 500M HTTP requests/sec (30× the load)
When Long Polling is acceptable: As a WebSocket fallback behind corporate proxies that block WebSocket upgrades. Implement auto-detection: try WebSocket → if fails, fall back to Long Polling.
Why Cassandra for Messages (Not MySQL/PostgreSQL/MongoDB)?
Message access patterns:
1. Write a message (append-only, VERY frequent)
2. Read recent messages for a conversation (time-range query, VERY frequent)
3. Search messages (not on Cassandra — use Elasticsearch)
4. Never: JOINs, transactions, or complex queries on messages
Cassandra ✓:
- Partition key = conversation_id → all messages for a conversation on same node
- Clustering key = message_id (time-sorted) → efficient range reads
- LSM-tree storage → optimized for high write throughput (50K msgs/sec easily)
- TTL support → auto-expire messages after retention period
- Linear horizontal scaling → add nodes as message volume grows
- Multi-DC replication built-in → low latency reads globally
MySQL/PostgreSQL ✗:
- Single-writer primary → write bottleneck at scale
- Sharding required (by conversation_id) — Cassandra gives this natively
- No built-in TTL → need cron jobs for data cleanup
MongoDB:
- Could work (document model, sharding, TTL indexes)
- But: Cassandra's time-series data model (partition + clustering key) is a
more natural fit for "messages ordered by time within a conversation"
- MongoDB's WiredTiger engine handles writes well but Cassandra's LSM-tree
is better for append-heavy workloads
E2EE: Why Signal Protocol (Not PGP/TLS)?
Option 1: TLS only (transport encryption)
✓ Simple: HTTPS encrypts in transit
✗ Server sees plaintext → compromised server exposes all messages
✗ Government subpoena → server can hand over messages
Option 2: PGP (Pretty Good Privacy)
✓ End-to-end encrypted
✗ No forward secrecy (one key compromise exposes ALL past messages)
✗ Manual key management (user must manage public/private keys)
✗ No key ratcheting → same key for all messages in a session
Option 3: Signal Protocol ⭐ (Double Ratchet)
✓ End-to-end encrypted
✓ Forward secrecy (new ephemeral key per message → past messages safe even if key leaked)
✓ Post-compromise security (future messages safe after device recovers from compromise)
✓ Transparent key exchange (Diffie-Hellman, no manual key sharing)
✗ Complex implementation
✗ Server cannot index/search message content (search must be client-side)
Why Signal Protocol wins:
- Forward secrecy is critical: leaked key only compromises ONE message, not all history
- Double Ratchet derives a new key per message → each message has a unique encryption key
- Used by: WhatsApp, Signal, Facebook Messenger (secret conversations), Skype
Message Ordering: The Distributed Systems Challenge
Problem: User A sends "Hello" then "How are you?" from two different devices.
Due to network delays, "How are you?" might arrive at the server BEFORE "Hello"
Solutions:
1. Client-generated timestamp → but clocks are not synchronized across devices
2. Snowflake ID ⭐ (recommended):
- Server assigns a monotonically increasing ID on receipt
- Within a single server: guaranteed ordered
- Across servers: approximately ordered (Snowflake embeds timestamp)
- Per-conversation ordering: route all messages for a conversation to
the same Kafka partition (partition key = conversation_id) → strict FIFO
3. Lamport Clock / Vector Clock:
- Tracks causal ordering (A happened before B)
- Overkill for chat — Snowflake + Kafka partition ordering is sufficient
4. Client sequence number:
- Each client maintains a per-conversation sequence counter
- Server detects gaps: "Got seq 5, missing seq 4" → requests retransmission
- Useful for offline-first sync
Connection Management: The WebSocket Server Scaling Problem
Problem: 500M WebSocket connections. Each connection is stateful (tied to one server).
If a server dies, all its connections are lost.
Solution 1: Sticky Load Balancing
- L4 LB hashes client IP → always routes to same WS server
- ✓ Simple
- ✗ Uneven distribution; server failure drops all its clients
Solution 2: Connection Registry (Redis) ⭐
- Redis stores: user_id → {ws_server_id, device_id}
- When message needs to be delivered:
1. Lookup recipient's WS server from Redis
2. Send message to that server via internal RPC
3. Server pushes to client over existing WebSocket
- ✓ Decoupled: any server can route to any other
- ✓ Server failure: client reconnects to any server, Redis updated
- ✗ Extra hop (Redis lookup + internal RPC)
Solution 3: Pub/Sub Fan-Out
- Each WS server subscribes to user-specific Redis Pub/Sub channels
- Message for user X → PUBLISH to channel "user:X" → reaches the right server
- ✓ Simpler than direct RPC routing
- ✗ Redis Pub/Sub doesn't persist → if server is down, message lost from this channel
- → Combine with message store (Cassandra) for durability
Production approach:
Redis for connection registry + internal gRPC for message delivery +
Cassandra for persistence + Kafka for async processing
Group Chat: Small Groups vs Large Channels
Small Groups (< 256 members, WhatsApp-style):
- Store members list in a Redis SET
- Message delivery: iterate members, send to each connected user
- Fan-out is small (< 256) → done synchronously
- Every member gets every message (no threading)
Large Channels (1000+ members, Slack/Discord-style):
- Message delivery is fan-out-on-read (too expensive to push to 100K members)
- Channel messages stored in Cassandra
- Client polls for new messages or subscribes to channel WebSocket room
- Unread count tracked per user per channel (Redis counter)
- Threading: messages have parent_id → tree structure (like Reddit comments)
Why fan-out-on-read for large channels:
10K members × 100 messages/hour = 1M deliveries/hour per channel
With 1000 active channels = 1B deliveries/hour → infeasible with fan-out-on-write
Instead: each member fetches new messages when they open the channel
Staff interviews expect you to articulate how the system evolves under real growth, not jump straight to the final architecture.
Phase 1 — MVP (monolith, online-only)
Single Node.js/Go service with WebSocket + REST. PostgreSQL for messages. In-memory connection map. No offline delivery, no groups >10.
Key components: Monolith WS gateway · PostgreSQL · In-memory connections
Move to next phase when: Connection count exceeds 10K per instance; need horizontal WS scaling
Phase 2 — Scale (50M DAU, offline + groups)
Dedicated connection gateway fleet with sticky LB. Cassandra for message store (partition by conversation_id). Redis Pub/Sub for cross-pod delivery. Kafka for async group fan-out. Push notification service for offline.
Key components: WS gateway fleet · Cassandra · Redis Pub/Sub · Kafka fan-out · APNs/FCM
Move to next phase when: Celebrity DM hot partitions; multi-region latency >100ms for international users
Phase 3 — Global + E2E (multi-DC, encryption)
Multi-region Cassandra (LOCAL_QUORUM writes, LOCAL_ONE reads). CRDT-free ordering via region-local seq with cross-region replication lag accepted (<500ms). Client-side E2E with key server. CDN for media attachments (S3 + CloudFront).
Key components: Multi-DC Cassandra · Key directory service · S3 + CDN · Geo-routed gateways
Move to next phase when: Regulatory requirement for E2E in EU; or 500M+ concurrent connections
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Message delivery (online recipient) | p99 < 1s | Core UX — users expect instant chat |
| Message delivery (offline → push) | p99 < 30s | Push pipeline includes APNs/FCM latency |
| WebSocket connection success rate | 99.9% | Failed handshakes = user can't chat at all |
| Message durability | 99.999% | Write-before-ACK; no accepted message may be lost |
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Redis Pub/Sub cluster partition during peak hours | Cross-pod message delivery lag >5s; users report messages not appearing despite send ACK | Failover to Kafka-based delivery (pre-provisioned dual path); drain affected gateway pods; clients auto-reconnect with exponential backoff; replay from last_seen_seq on reconnect |
| Cassandra hot partition on viral group chat | Single conversation_id p99 write latency >200ms; seq assignment timeouts | Temporary rate limit on group message sends; split conversation into threads; pre-warm partition with dedicated coordinator node; alert when any conversation exceeds 100 msg/sec |
| Push notification storm after regional outage recovery | APNs/FCM rate limit 429s; inbox sync API QPS 10× baseline | Throttle push to 1 per user per 30s with batch summary ('47 unread messages'); prioritize 1:1 over group; scale inbox sync read replicas; stagger reconnect waves via jittered client backoff |
Cost Drivers (Staff lens)
- WebSocket gateway fleet: ~1000 pods × 50K connections — dominated by RAM and cross-AZ traffic
- Cassandra storage: 500M msgs/day × 500 bytes × 365 days ≈ 90 TB/year before compaction
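- Push notifications: APNs and FCM do not charge per notification, so the cost sits in the delivery pipeline; through a paid third-party provider, roughly $0.001/notification × offline delivery volume is often the largest ops cost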
- Media in S3: eclipses message text storage; lifecycle to Glacier after 90 days
Multi-Region & DR
User home region owns conversation state (conversation_id hashed to region). Cross-region messages replicate async — accept 200-500ms inter-region delivery. Gateways route to home region via global load balancer + user region registry in Redis. E2E keys replicated separately via key directory with conflict-free merge on device registration.