Design a Real-Time Chat Application (WhatsApp / Slack) – System Design Walkthrough

This problem appears in multiple sheets. Depth expectations increase as you progress:

Track	What to demonstrate
Arch 25	The connection-state problem. Nail WebSocket scaling (sticky sessions vs pub/sub), message ordering per conversation, group fan-out math, and offline delivery without blocking the hot path.
Arch 50	Add read receipts, presence, typing indicators, and media upload flow. Discuss when to use CRDTs vs server-assigned sequence numbers.
Arch 75	Staff: E2E encryption key exchange (Signal protocol sketch), multi-device sync, and what breaks at 500M concurrent connections.

Interview Prompt

Design a real-time chat application like WhatsApp or Slack. Users send 1:1 and group messages with delivery in under 1 second. Support offline message delivery, read receipts, and optional end-to-end encryption.

Clarifying Questions (ask before designing)

Question	Why it matters
1:1 only or groups — and what's the max group size?	Group fan-out is O(members) per message. 256-member Slack channel vs 50-person WhatsApp group drives completely different push architecture.
Do we need message ordering guarantees within a conversation?	Per-conversation total order requires a single sequence authority; cross-device sync adds complexity.
Online-only delivery or persistent offline queue?	Offline queue means durable write before ACK — changes write path from fire-and-forget to at-least-once with idempotency.
E2E encryption required or server-readable messages?	E2E means server stores ciphertext only — search, moderation, and read receipts become client-side problems.

Scope

In scope

1:1 and group messaging with sub-second delivery
WebSocket connection management at scale
Message persistence and offline delivery
Read receipts and delivery status
High-level E2E encryption architecture

Out of scope (state explicitly)

Voice/video calling (WebRTC is a separate design)
Full Signal protocol implementation details
Content moderation ML pipeline
Billing and enterprise SSO

Assumptions

50M DAU, 500M messages/day (~5.8K msg/sec avg, 50K peak)
Average group size 8, max 256 members
Messages retained 1 year; media in object storage
99.9% message delivery within 5 seconds when recipient online

1:1 messaging: Send and receive text messages between two users in real-time
Group messaging: Support group chats with up to 256 members (WhatsApp) or thousands (Slack channels)
Online/Offline presence: Show whether a user is online, offline, or last seen
Read receipts: Show sent ✓, delivered ✓✓, and read (blue ✓✓) status
Media sharing: Send images, videos, documents, voice messages
Message history: Persist messages and allow history retrieval with pagination
Push notifications: Notify offline users of new messages
End-to-end encryption (for WhatsApp-style): Messages encrypted on device, server cannot read content

Metric	Calculation	Value
Total users	Given (product assumption)	2B
DAU	Given (product assumption)	500M
Concurrent connections	Given (peak load assumption)	100M
Messages / day	500M DAU × 100 msgs	50B (~100 msgs per user/day)
Messages / sec	50B ÷ 86400	~580K
Avg message size	Given (typical workload assumption)	200 bytes (text)
Storage / day	50B × 200B	10 TB
Storage / year	Given	~3.6 PB
Media messages (10%)	5B/day × 500KB avg	2.5 PB/day

WebSocket Connection & Chat Servers

Why WebSocket: HTTP is request-response (half-duplex). WebSocket provides full-duplex, persistent connection: essential for real-time messaging
How:
1. Client opens WebSocket connection to a chat server (via load balancer)
2. Connection is kept alive with heartbeats (every 30 seconds)
3. Messages are sent/received over this connection in real-time
4. If connection drops, client reconnects with exponential backoff
Scaling: Each chat server handles ~100K concurrent WebSocket connections
- 100M concurrent users / 100K per server = 1,000 chat servers
Sticky sessions: A user's WebSocket must remain on the same server for the duration of the connection. Use consistent hashing on user_id at the L4 load balancer
Connection state: Each server maintains an in-memory map: user_id → WebSocket connection

Session Service + Redis

Purpose: Track which chat server each user is connected to
Data: session:{user_id} → {server_id, connected_at, last_active}
Why Redis: In-memory, sub-ms lookups, supports TTL for auto-cleanup
Flow: When user connects, chat server registers in Redis. When user disconnects, entry is removed

Message Router (Cross-Server Communication)

Problem: User A is on Chat Server 1, User B is on Chat Server 2. How does the message get from Server 1 to Server 2?
Solution 1: Redis Pub/Sub:
- Each chat server subscribes to a Redis channel named after its server_id
- To route a message: look up recipient's server_id in Session Service → publish to that server's channel
- Pros: Simple, low latency
- Cons: Redis Pub/Sub is fire-and-forget (no persistence). If the subscriber is slow, messages can be lost
Solution 2: Kafka (recommended for durability):
- Topic per chat server or per user-range
- Each chat server is a consumer for its assigned partitions
- Pros: Durable, replayable, handles backpressure
- Cons: Slightly higher latency (~5-10ms) vs Redis Pub/Sub (~1ms)
Hybrid: Use Redis Pub/Sub for online → online delivery (fast path). Use Kafka as durable fallback for offline users and reliability

Message Service

Purpose: Persist messages, retrieve chat history
Write flow:
1. Receive message from chat server
2. Assign a monotonically increasing message_id per conversation (using Snowflake or per-conversation counter)
3. Write to Cassandra
4. Update conversation's last_message in cache
Read flow: Paginated retrieval: SELECT * FROM messages WHERE conversation_id = ? AND message_id < ? ORDER BY message_id DESC LIMIT 20

Cassandra (Message Store)

Why Cassandra:
- Write-heavy workload (580K writes/sec)
- Natural time-series access pattern (messages by conversation, ordered by time)
- Horizontal scaling, multi-DC replication
- No complex joins needed
Partition key: conversation_id (all messages for a chat are co-located)
Clustering key: message_id (ordered retrieval)

Presence / Online Status

How:
- When user connects → set presence:{user_id} = "online" in Redis (with TTL 60s)
- Heartbeat every 30s resets the TTL
- If no heartbeat → key expires → user is "offline"
- Last seen = last_active timestamp from session entry
Fan-out: When user's presence changes, notify their contacts who are online
- For users with many contacts, this is expensive → batch and throttle presence updates
- Only show presence for active conversations (Slack approach)

Group Service

Manages: Group creation, membership, admin roles
Group message delivery:
1. Sender sends message to group
2. Group Service fetches member list
3. For each member: look up session → route message to their chat server
4. Store one copy of the message in DB (not N copies)
Optimization: For large groups (1000+ members), use Kafka topic per group. Members' chat servers subscribe to the group topic

Media Service

Flow:
1. Client uploads media to Media Service (HTTP, not WebSocket: WebSocket is for text)
2. Media Service stores in S3, generates a CDN URL
3. Client sends a message with the media URL (not the binary) over WebSocket
4. Recipient fetches media from CDN
Thumbnail generation: For images/videos, generate a low-res thumbnail (< 10 KB) stored inline in the message: displayed before full media is downloaded
Size limits: Max file size ~100 MB. For large files, use multipart upload with resumable support

Push Notification Service

When triggered: Chat Server detects recipient is OFFLINE (no session in Redis) → publishes to Kafka topic push-notifications
Consumer picks up the event and sends via:
- APNs (Apple Push Notification Service) for iOS
- FCM (Firebase Cloud Messaging) for Android
- Web Push for browser clients
Notification content: "Alice sent you a message" (NOT the full message content: for E2EE, server can't read it)
Batching / collapsing: If multiple messages arrive while user is offline, collapse into "Alice sent you 5 messages" (not 5 separate push notifications)
Badge count: Update unread badge count via silent push (iOS) or data message (Android)
Rate limiting: Max 1 push per conversation per 30 seconds to avoid spamming

E2EE Key Service

Public key directory: Each user registers their public key (identity key + signed pre-key + one-time pre-keys) with the server
Key exchange (X3DH): When Alice wants to message Bob for the first time:
1. Alice fetches Bob's pre-key bundle from Key Service
2. Alice performs X3DH to derive a shared secret
3. Alice encrypts the first message with this shared secret
4. Bob receives the message + Alice's ephemeral public key → derives the same shared secret
Pre-key rotation: One-time pre-keys are consumed on use. Client periodically uploads new batches (100 pre-keys at a time). If exhausted → fall back to signed pre-key (less forward secrecy)
Double Ratchet: After initial key exchange, each message derives a new key: so compromise of one key doesn't expose other messages

End-to-End Encryption (E2EE)

Signal Protocol (used by WhatsApp):
- Each user generates a key pair (public + private)
- Public key registered with the server
- Messages encrypted with recipient's public key on sender's device
- Server stores encrypted ciphertext: cannot read the content
- Key exchange uses X3DH (Extended Triple Diffie-Hellman)
- Forward secrecy via Double Ratchet algorithm

Send Message

JSON

WebSocket Frame:
{
  "type": "send_message",
  "message_id": "client-generated-uuid",
  "conversation_id": "conv-uuid",
  "content": "Hey, how are you?",
  "content_type": "text",
  "timestamp": 1710320000000
}

Server Acknowledgment:
{
  "type": "ack",
  "message_id": "client-generated-uuid",
  "server_message_id": "snowflake-id",
  "status": "sent",
  "timestamp": 1710320000050
}

Receive Message

JSON

WebSocket Frame (pushed to recipient):
{
  "type": "new_message",
  "message_id": "snowflake-id",
  "conversation_id": "conv-uuid",
  "sender_id": "user-123",
  "content": "Hey, how are you?",
  "content_type": "text",
  "timestamp": 1710320000050
}

Message Status Updates

JSON

WebSocket Frame:
{
  "type": "message_status",
  "message_id": "snowflake-id",
  "status": "delivered" | "read",
  "timestamp": 1710320001000
}

REST APIs (for non-real-time operations)

HTTP

GET /api/v1/conversations/{conv_id}/messages?before={message_id}&limit=50
GET /api/v1/conversations
POST /api/v1/groups
POST /api/v1/groups/{group_id}/members

Common Error Responses

400 Bad Request: invalid input, missing fields, or malformed JSON
401 Unauthorized: missing or invalid auth token or API key
403 Forbidden: authenticated but insufficient permissions
404 Not Found: resource ID does not exist
409 Conflict: duplicate write or version conflict; retry with idempotency key
422 Unprocessable Entity: valid syntax but invalid business logic
429 Too Many Requests: rate limit exceeded; honor Retry-After header
500 Internal Error: unexpected server fault; retry with idempotency key
503 Service Unavailable: dependency down or overloaded; use exponential backoff
440 Login Timeout: WebSocket session expired; reconnect required

Cassandra: Messages

SQL

CREATE TABLE messages (
    conversation_id   UUID,
    message_id        BIGINT,       -- Snowflake ID (time-ordered)
    sender_id         UUID,
    content           TEXT,         -- encrypted ciphertext
    content_type      VARCHAR,      -- text, image, video, voice
    media_url         TEXT,
    created_at        TIMESTAMP,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

Cassandra: Conversations per User

SQL

CREATE TABLE user_conversations (
    user_id             UUID,
    last_message_at     TIMESTAMP,
    conversation_id     UUID,
    conversation_type   VARCHAR,     -- '1:1' or 'group'
    last_message_preview TEXT,
    unread_count        INT,
    PRIMARY KEY (user_id, last_message_at)
) WITH CLUSTERING ORDER BY (last_message_at DESC);

Redis: Session & Presence

# Session (which server is user connected to)
Key:    session:{user_id}
Value:  Hash { server_id, connected_at, device_type }
TTL:    300 (5 min, refreshed by heartbeat)

# Presence
Key:    presence:{user_id}
Value:  "online"
TTL:    60 (refreshed by heartbeat every 30s)

# Last Seen (persisted separately)
Key:    last_seen:{user_id}
Value:  epoch_timestamp

MySQL: User & Group Metadata

SQL

CREATE TABLE users (
    user_id     UUID PRIMARY KEY,
    phone       VARCHAR(20) UNIQUE,
    name        VARCHAR(128),
    avatar_url  TEXT,
    public_key  BLOB,           -- for E2EE
    created_at  TIMESTAMP
);

SQL

CREATE TABLE groups (
    group_id    UUID PRIMARY KEY,
    name        VARCHAR(256),
    avatar_url  TEXT,
    created_by  UUID,
    max_members INT DEFAULT 256,
    created_at  TIMESTAMP
);

CREATE TABLE group_members (
    group_id    UUID,
    user_id     UUID,
    role        ENUM('admin', 'member'),
    joined_at   TIMESTAMP,
    PRIMARY KEY (group_id, user_id)
);

Kafka Topics

Topic: messages          (partitioned by conversation_id)
Topic: presence-updates  (partitioned by user_id)
Topic: push-notifications

General

Technique	Application
Message persistence before ack	Message written to Cassandra before server sends 'sent' ack to sender
Client-side retry	Client retries sending if no ack received within timeout
Idempotent writes	message_id generated by client → duplicate sends are idempotent
Multi-DC replication	Cassandra + Redis replicated across data centers

Problem-Specific

1. Chat Server Crashes (User Loses WebSocket)

Client detects connection loss, reconnects to another chat server
New server registers new session in Redis
Client sends "sync" request: "Give me all messages after message_id X"
Server fetches from Cassandra and delivers missed messages
No messages lost: All messages are persisted in Cassandra before ack

2. Message Ordering

Scenario: Two rapid messages arrive at different servers out of order
Solution:
- Messages have Snowflake IDs (time-ordered within sender)
- Client sorts by message_id before displaying
- Server stores with clustering by message_id → reads are ordered

3. Offline Message Delivery

User B is offline when User A sends a message
Message is persisted in Cassandra
unread_count incremented in user_conversations
Push notification sent via APNs/FCM
When User B comes online → sync: fetch all messages with message_id > last_seen_message_id

4. Group Message Fan-Out

256-member group: message must be delivered to 256 users
Solution: Don't fan out to all 256 as individual messages. Store once in DB. Each online member's chat server pulls/receives the message. Offline members sync on reconnect

5. Presence Thundering Herd

User with 10,000 contacts comes online → must notify all 10K contacts
Solution:
- Only notify contacts who have an active conversation open with this user
- Batch presence updates (send every 5 seconds, not instantly)
- For contacts not in the active view, fetch presence lazily when they open the chat

6. Split Brain (Network Partition Between DCs)

Users in DC-A and DC-B both send messages to the same conversation
Solution: Cassandra's eventual consistency + LWW (Last-Write-Wins) with Snowflake timestamps. Both messages are preserved (no conflict for chat: append-only model)

Message Sync Protocol

Each client tracks last_synced_message_id per conversation
On reconnect: GET /messages?conversation_id=X&after=last_synced_message_id
For first-time device: full sync of recent N messages per conversation

Typing Indicators

User A starts typing → send {"type": "typing", "conversation_id": "...", "user_id": "..."} over WebSocket
Ephemeral: not persisted, only forwarded to online participants
Auto-expires after 5 seconds of inactivity

Message Search

Full-text search on messages using Elasticsearch
Index: messages by user_id + conversation_id + content
Note: With E2EE, server can't index encrypted content. Search must happen client-side (WhatsApp approach)

Multi-Device Support

One user on phone + tablet + web simultaneously
Session Service stores multiple sessions per user: sessions:{user_id} → SET of {server_id, device_id}
Messages delivered to ALL active devices
Read receipts synced across devices

Rate Limiting

Max 100 messages per minute per user (prevent spam bots)
Max 256 members per group (WhatsApp) or configurable (Slack)
Max file size: 100 MB

Data Retention

Store messages for 90 days (WhatsApp doesn't store after delivery for E2EE)
Or indefinitely (Slack enterprise) with archival to cold storage (S3 Glacier)

Interview Walkthrough

Scope the chat modes first — 1:1, group, and channels have different fan-out and storage costs; pick one to depth, mention others.
Draw the connection lifecycle: WebSocket handshake → session registry → heartbeat/reconnect — this is where scaling interviews focus.
Guarantee per-conversation message ordering with a monotonic sequence ID; cross-conversation ordering is not required.
Frame delivery guarantees using CAP Theorem and Consistency Models — online users get real-time push; offline users sync on reconnect.
Persist messages through Kafka or a write-ahead log before acknowledging delivery, so crashes do not lose in-flight messages.
For large groups, avoid naive broadcast — use a fan-out service or channel-based pub/sub rather than N individual WebSocket writes.
Address multi-device sync: deliver to all active sessions and reconcile read receipts across devices.
Common pitfall: storing every WebSocket on a single server without a connection registry and sticky routing strategy.

WebSocket vs Long Polling vs SSE: Why WebSocket?

Protocol	How	Latency	Bi-directional	Connection Cost	Best For
WebSocket ⭐	Persistent TCP connection, full-duplex	< 100ms	✓ (both ways)	1 TCP connection per client	Chat, gaming, collaborative editing
Long Polling	Client sends request, server holds until data available	100-500ms	✗ (simulated)	New HTTP request per message received	Fallback when WebSocket not available
SSE	Persistent HTTP connection, server to client only	< 100ms	✗ (server to client only)	1 HTTP connection per client	News feeds, notifications
HTTP Polling	Client polls every N seconds	N seconds	✗	New HTTP request every N seconds	Not suitable for chat

Why WebSocket is ideal for chat:

Bi-directional: Client sends messages AND receives messages on the same connection
Low latency: No HTTP overhead per message (WebSocket frames are tiny: 2-14 bytes header vs 200+ bytes for HTTP headers)
Efficient: 500M users × 1 WebSocket each = 500M connections. With HTTP polling every 1s, that's 500M HTTP requests/sec (30× the load)

When Long Polling is acceptable: As a WebSocket fallback behind corporate proxies that block WebSocket upgrades. Implement auto-detection: try WebSocket → if fails, fall back to Long Polling.

Why Cassandra for Messages (Not MySQL/PostgreSQL/MongoDB)?

Message access patterns:
  1. Write a message (append-only, VERY frequent)
  2. Read recent messages for a conversation (time-range query, VERY frequent)
  3. Search messages (not on Cassandra — use Elasticsearch)
  4. Never: JOINs, transactions, or complex queries on messages

Cassandra ✓:
  - Partition key = conversation_id → all messages for a conversation on same node
  - Clustering key = message_id (time-sorted) → efficient range reads
  - LSM-tree storage → optimized for high write throughput (50K msgs/sec easily)
  - TTL support → auto-expire messages after retention period
  - Linear horizontal scaling → add nodes as message volume grows
  - Multi-DC replication built-in → low latency reads globally

MySQL/PostgreSQL ✗:
  - Single-writer primary → write bottleneck at scale
  - Sharding required (by conversation_id) — Cassandra gives this natively
  - No built-in TTL → need cron jobs for data cleanup

MongoDB:
  - Could work (document model, sharding, TTL indexes)
  - But: Cassandra's time-series data model (partition + clustering key) is a
    more natural fit for "messages ordered by time within a conversation"
  - MongoDB's WiredTiger engine handles writes well but Cassandra's LSM-tree
    is better for append-heavy workloads

E2EE: Why Signal Protocol (Not PGP/TLS)?

Option 1: TLS only (transport encryption)
  ✓ Simple — HTTPS encrypts in transit
  ✗ Server sees plaintext → compromised server exposes all messages
  ✗ Government subpoena → server can hand over messages

Option 2: PGP (Pretty Good Privacy)
  ✓ End-to-end encrypted
  ✗ No forward secrecy (one key compromise exposes ALL past messages)
  ✗ Manual key management (user must manage public/private keys)
  ✗ No key ratcheting → same key for all messages in a session

Option 3: Signal Protocol ⭐ (Double Ratchet)
  ✓ End-to-end encrypted
  ✓ Forward secrecy (new ephemeral key per message → past messages safe even if key leaked)
  ✓ Post-compromise security (future messages safe after device recovers from compromise)
  ✓ Transparent key exchange (Diffie-Hellman, no manual key sharing)
  ✗ Complex implementation
  ✗ Server cannot index/search message content (search must be client-side)

Why Signal Protocol wins:
  - Forward secrecy is critical: leaked key only compromises ONE message, not all history
  - Double Ratchet derives a new key per message → each message has a unique encryption key
  - Used by: WhatsApp, Signal, Facebook Messenger (secret conversations), Skype

Message Ordering: The Distributed Systems Challenge

Problem: User A sends "Hello" then "How are you?" from two different devices.
  Due to network delays, "How are you?" might arrive at the server BEFORE "Hello"

Solutions:
  1. Client-generated timestamp → but clocks are not synchronized across devices

  2. Snowflake ID ⭐ (recommended):
     - Server assigns a monotonically increasing ID on receipt
     - Within a single server: guaranteed ordered
     - Across servers: approximately ordered (Snowflake embeds timestamp)
     - Per-conversation ordering: route all messages for a conversation to
       the same Kafka partition (partition key = conversation_id) → strict FIFO

  3. Lamport Clock / Vector Clock:
     - Tracks causal ordering (A happened before B)
     - Overkill for chat — Snowflake + Kafka partition ordering is sufficient

  4. Client sequence number:
     - Each client maintains a per-conversation sequence counter
     - Server detects gaps: "Got seq 5, missing seq 4" → requests retransmission
     - Useful for offline-first sync

Connection Management: The WebSocket Server Scaling Problem

Problem: 500M WebSocket connections. Each connection is stateful (tied to one server).
  If a server dies, all its connections are lost.

Solution 1: Sticky Load Balancing
  - L4 LB hashes client IP → always routes to same WS server
  - ✓ Simple
  - ✗ Uneven distribution; server failure drops all its clients

Solution 2: Connection Registry (Redis) ⭐
  - Redis stores: user_id → {ws_server_id, device_id}
  - When message needs to be delivered:
    1. Lookup recipient's WS server from Redis
    2. Send message to that server via internal RPC
    3. Server pushes to client over existing WebSocket
  - ✓ Decoupled: any server can route to any other
  - ✓ Server failure: client reconnects to any server, Redis updated
  - ✗ Extra hop (Redis lookup + internal RPC)

Solution 3: Pub/Sub Fan-Out
  - Each WS server subscribes to user-specific Redis Pub/Sub channels
  - Message for user X → PUBLISH to channel "user:X" → reaches the right server
  - ✓ Simpler than direct RPC routing
  - ✗ Redis Pub/Sub doesn't persist → if server is down, message lost from this channel
  - → Combine with message store (Cassandra) for durability

Production approach:
  Redis for connection registry + internal gRPC for message delivery +
  Cassandra for persistence + Kafka for async processing

Group Chat: Small Groups vs Large Channels

Small Groups (< 256 members, WhatsApp-style):
  - Store members list in a Redis SET
  - Message delivery: iterate members, send to each connected user
  - Fan-out is small (< 256) → done synchronously
  - Every member gets every message (no threading)

Large Channels (1000+ members, Slack/Discord-style):
  - Message delivery is fan-out-on-read (too expensive to push to 100K members)
  - Channel messages stored in Cassandra
  - Client polls for new messages or subscribes to channel WebSocket room
  - Unread count tracked per user per channel (Redis counter)
  - Threading: messages have parent_id → tree structure (like Reddit comments)

  Why fan-out-on-read for large channels:
    10K members × 100 messages/hour = 1M deliveries/hour per channel
    With 1000 active channels = 1B deliveries/hour → infeasible with fan-out-on-write
    Instead: each member fetches new messages when they open the channel

SLOs & Error Budgets

Metric	Target	Rationale
Message delivery (online recipient)	p99 < 1s	Core UX — users expect instant chat
Message delivery (offline → push)	p99 < 30s	Push pipeline includes APNs/FCM latency
WebSocket connection success rate	99.9%	Failed handshakes = user can't chat at all
Message durability	99.999%	Write-before-ACK; no accepted message may be lost

Incident Scenarios (2am reality)

Scenario	How you detect	Mitigation
Redis Pub/Sub cluster partition during peak hours	Cross-pod message delivery lag >5s; users report messages not appearing despite send ACK	Failover to Kafka-based delivery (pre-provisioned dual path); drain affected gateway pods; clients auto-reconnect with exponential backoff; replay from last_seen_seq on reconnect
Cassandra hot partition on viral group chat	Single conversation_id p99 write latency >200ms; seq assignment timeouts	Temporary rate limit on group message sends; split conversation into threads; pre-warm partition with dedicated coordinator node; alert when any conversation exceeds 100 msg/sec
Push notification storm after regional outage recovery	APNs/FCM rate limit 429s; inbox sync API QPS 10× baseline	Throttle push to 1 per user per 30s with batch summary ('47 unread messages'); prioritize 1:1 over group; scale inbox sync read replicas; stagger reconnect waves via jittered client backoff

Cost Drivers (Staff lens)

WebSocket gateway fleet: ~1000 pods × 50K connections — dominated by RAM and cross-AZ traffic
Cassandra storage: 500M msgs/day × 500 bytes × 365 days ≈ 90 TB/year before compaction
Push notifications: $0.001/notification × offline delivery volume — often largest ops cost
Media in S3: eclipses message text storage; lifecycle to Glacier after 90 days

Multi-Region & DR

User home region owns conversation state (conversation_id hashed to region). Cross-region messages replicate async — accept 200-500ms inter-region delivery. Gateways route to home region via global load balancer + user region registry in Redis. E2E keys replicated separately via key directory with conflict-free merge on device registration.