Design a Cryptocurrency Exchange

Interview Prompt

Design a cryptocurrency exchange.

Clarifying Questions (ask before designing)

Question	Why it matters
Which is the top priority: the trading core (a stock-exchange variant plus wallets), hot/cold wallet custody, or blockchain confirmation handling?	Forces scope negotiation; senior candidates trim before drawing boxes.
What scale should we design for — DAU, QPS, data volume?	Drives every capacity decision; shows structured thinking.
What are the read vs write patterns on the critical path?	Determines caching, DB choice, and replication topology.
What consistency and durability guarantees are required?	Separates strong-consistency paths from eventual ones — a senior differentiator.

Scope

In scope

Variant of stock exchange + wallet
Hot/cold wallet
Blockchain confirmation
Order matching
Capacity estimation with shown math

Out of scope (state explicitly)

Fraud ML model training (#75) — rules engine is enough unless asked
Merchant onboarding / KYC workflows
Building a PSP or bank from scratch

Assumptions

Strong consistency required on money/inventory paths — clarify idempotency early
External PSP or bank APIs exist; design integration boundaries only
99.99% availability target for the commit/authorize path

Order placement: limit, market, stop-loss orders for crypto pairs
Order matching engine: price-time priority
Wallet management: deposit, withdraw crypto (on-chain) and fiat
Hot wallet (online, for fast withdrawals) and cold wallet (offline, for security)
Real-time order book and trade feed (WebSocket)
Portfolio: view balances, P&L, transaction history
KYC/AML: identity verification, transaction monitoring
Multi-factor authentication (2FA, hardware keys)
Trading fees with tiered pricing (maker/taker)
Staking and lending features

Metric	Calculation	Value
Registered users	Given	50M
DAU	Given	5M
Trading pairs	Given	500
Orders / sec (peak)	Derived from daily volume ÷ 86400 (+ peak factor)	100K
Trades / sec (peak)	Derived from daily volume ÷ 86400 (+ peak factor)	50K
WebSocket connections	Given	2M concurrent
Wallet transactions / day	Given (1M ÷ 86,400 ≈ 12 / sec average)	1M
Hot wallet balance	Given	5% of total assets
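
The derived figures in the table can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers given above; the variable names are illustrative, and the ratios it prints are sanity checks rather than figures stated in the table.

```python
SECONDS_PER_DAY = 86_400

# Inputs copied from the capacity table above.
dau = 5_000_000
concurrent_ws = 2_000_000
wallet_tx_per_day = 1_000_000
peak_orders_per_sec = 100_000
peak_trades_per_sec = 50_000

# Average wallet throughput is tiny compared with order flow.
avg_wallet_tx_per_sec = wallet_tx_per_day / SECONDS_PER_DAY

# Share of daily actives holding a WebSocket open at the same time.
ws_share_of_dau = concurrent_ws / dau

# Peak orders submitted for every trade executed.
orders_per_trade = peak_orders_per_sec / peak_trades_per_sec

print(f"wallet tx/sec (avg): {avg_wallet_tx_per_sec:.1f}")
print(f"WebSocket share of DAU: {ws_share_of_dau:.0%}")
print(f"orders per trade at peak: {orders_per_trade:.1f}")
```

The gap between about 12 wallet transactions per second and 100K orders per second is why the wallet path can afford slower, strongly consistent writes while the order path needs an in-memory engine.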

Order Lifecycle

1. User submits order: POST /api/orders {pair:BTC_USDT, side:buy, price:50000, qty:0.5}
2. Order Service:
   a. Validate: user authenticated, pair exists, qty > min
   b. Reserve funds: Lua script in Redis → USDT_available -= 25000
   c. If Redis reserve fails → reject immediately
   d. Persist order (status=open) to PostgreSQL
   e. Route to matching engine for this pair

3. Matching Engine (single-threaded, in-memory per pair):
   a. Insert order into order book (TreeMap of price levels)
   b. Match against opposite side: buy $50K vs best ask $49,950 → MATCH at $49,950
   c. Generate trade event. Write to WAL before returning.
   d. Publish trade event to Kafka

4. Post-Trade Settlement (async from Kafka):
   a. Update balances, deduct fees (maker 0.1%, taker 0.15%)
   b. All inside PostgreSQL transaction

5. WebSocket broadcast: order book update, trade feed, user events
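
The numbers in the lifecycle above can be worked through with exact decimal arithmetic. This is a sketch under stated assumptions: the helper names are hypothetical, fees are assumed to be charged on the trade notional in the quote currency, and the buyer is assumed to be the taker.

```python
from decimal import Decimal

MAKER_FEE = Decimal("0.001")   # 0.1%, from step 4a
TAKER_FEE = Decimal("0.0015")  # 0.15%, from step 4a


def reserve_amount(side: str, price: Decimal, qty: Decimal) -> Decimal:
    """Funds to lock at order entry: quote currency for a buy, base for a sell."""
    return price * qty if side == "buy" else qty


def fee(notional: Decimal, is_maker: bool) -> Decimal:
    """Fee on the traded notional (assumption: charged in quote currency)."""
    return notional * (MAKER_FEE if is_maker else TAKER_FEE)


# Buy 0.5 BTC with a 50,000 USDT limit: 25,000 USDT is reserved up front.
reserved = reserve_amount("buy", Decimal("50000"), Decimal("0.5"))

# The order fills at the resting ask of 49,950, so the notional is lower.
notional = Decimal("49950") * Decimal("0.5")
unused_reservation = reserved - notional  # released back to available

print(reserved, notional, unused_reservation)
print(fee(notional, is_maker=False), fee(notional, is_maker=True))
```

Because the fill price can be better than the limit price, settlement has to release the unused part of the reservation (25 USDT here) back to the available balance.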

Matching Engine Deep Dive

Data structure: Two TreeMaps (Red-Black Trees) per pair
  Bids: sorted by price DESC (highest first), then time ASC
  Asks: sorted by price ASC (lowest first), then time ASC

Price-Time Priority: same price → earlier order fills first (FIFO)

Single-threaded: ONE thread per trading pair
  No locks needed → predictable µs latency
  100 pairs → 100 independent threads

WAL: Every match written to append-only file BEFORE publish
  On crash → replay WAL → rebuild order book state

Performance: 100K+ matches/sec per pair (in-memory, single-threaded)
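
Price-time priority for limit orders can be sketched in a few dozen lines. Python has no built-in TreeMap, so this example swaps in two binary heaps keyed by (price, arrival sequence); the `Order` and `OrderBook` names are hypothetical, and cancels, market orders, and the WAL are left out.

```python
import heapq
from dataclasses import dataclass
from decimal import Decimal
from itertools import count


@dataclass
class Order:
    order_id: str
    side: str        # "buy" or "sell"
    price: Decimal
    qty: Decimal


class OrderBook:
    """Single-threaded limit order book with price-time priority."""

    def __init__(self):
        self._bids = []       # heap of (-price, seq, order): highest price first
        self._asks = []       # heap of (price, seq, order): lowest price first
        self._seq = count()   # arrival counter breaks ties at the same price (FIFO)

    def submit(self, order: Order):
        """Match against the opposite side, rest any remainder, return trades."""
        is_buy = order.side == "buy"
        own, opposite = (self._bids, self._asks) if is_buy else (self._asks, self._bids)
        trades = []
        while order.qty > 0 and opposite:
            resting = opposite[0][2]
            crosses = order.price >= resting.price if is_buy else order.price <= resting.price
            if not crosses:
                break
            fill = min(order.qty, resting.qty)
            # Trades execute at the resting order's price.
            trades.append((resting.order_id, order.order_id, resting.price, fill))
            order.qty -= fill
            resting.qty -= fill
            if resting.qty == 0:
                heapq.heappop(opposite)
        if order.qty > 0:
            key = -order.price if is_buy else order.price
            heapq.heappush(own, (key, next(self._seq), order))
        return trades


book = OrderBook()
book.submit(Order("A", "sell", Decimal("49950"), Decimal("0.3")))
book.submit(Order("B", "sell", Decimal("49950"), Decimal("0.3")))
for trade in book.submit(Order("D", "buy", Decimal("50000"), Decimal("0.5"))):
    print(trade)
```

In the usage above, order A fills completely before B because it arrived first at the same price. A sorted map of price levels, as the section describes, additionally supports cheap cancels and depth snapshots, which a heap does not.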

Deposit & Withdrawal Flow

DEPOSIT:
  1. Generate HD wallet address (unique per user per chain)
  2. Blockchain Watcher detects incoming transaction
  3. Wait for N confirmations (BTC:3, ETH:12, SOL:32)
  4. Credit user: UPDATE balances SET available = available + amount
  5. Edge case: chain reorg → reverse credit if tx disappears

WITHDRAWAL:
  1. User submits withdrawal (2FA required)
  2. Deduct from available balance (PG transaction)
  3. Hot wallet service signs transaction (HSM multi-sig 2-of-3)
  4. Broadcast to blockchain
  5. Monitor for confirmation
  6. If hot wallet balance < withdrawal amount: queue + trigger cold→hot replenishment
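
The confirmation check in deposit step 3 reduces to comparing block heights. This is a minimal sketch using the thresholds listed above; the function names are hypothetical, and it assumes the watcher knows the current chain tip and the height of the block that included the transaction.

```python
# Thresholds from deposit step 3.
REQUIRED_CONFIRMATIONS = {"BTC": 3, "ETH": 12, "SOL": 32}


def confirmations(tip_height, tx_block_height):
    """Blocks on top of the transaction, counting the including block itself."""
    if tx_block_height is None or tx_block_height > tip_height:
        return 0  # still in the mempool, or our view of the tip is stale
    return tip_height - tx_block_height + 1


def is_creditable(chain, tip_height, tx_block_height):
    """True once the deposit has reached the chain's confirmation threshold."""
    return confirmations(tip_height, tx_block_height) >= REQUIRED_CONFIRMATIONS[chain]


print(is_creditable("BTC", tip_height=101, tx_block_height=100))
print(is_creditable("BTC", tip_height=102, tx_block_height=100))
```

A transaction included in the tip block has one confirmation, so a BTC deposit mined at height 100 becomes creditable when the tip reaches 102.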

Race Conditions

1. Double-Spend: User has 1 BTC, submits two sell orders simultaneously
   Solution: Redis Lua atomic check-and-decrement (single-threaded, no race)
   PostgreSQL: authoritative backup with SELECT FOR UPDATE

2. Order Cancel During Matching:
   Solution: Single-threaded engine → cancel queued behind match
   If order partially filled before cancel → fill matched, cancel remainder

3. Withdrawal After Balance Used in Trade:
   Solution: Withdrawal deducts from available, trade from reserved → different pool
   PostgreSQL transaction ensures atomicity
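
The available/reserved split behind scenarios 1 and 3 can be illustrated in-process. In this sketch a lock stands in for the atomicity that Redis scripts or a PostgreSQL row lock provide in the design above; the `Balance` class and its method names are hypothetical.

```python
import threading
from decimal import Decimal


class Balance:
    """One user's balance in one currency, split into two pools."""

    def __init__(self, available):
        self.available = Decimal(available)
        self.reserved = Decimal(0)
        self._lock = threading.Lock()  # stand-in for an atomic script or row lock

    def reserve(self, amount):
        """Order entry: move funds from available to reserved, or refuse."""
        amount = Decimal(amount)
        with self._lock:
            if self.available < amount:
                return False
            self.available -= amount
            self.reserved += amount
            return True

    def withdraw(self, amount):
        """Withdrawals may only draw on the available pool."""
        amount = Decimal(amount)
        with self._lock:
            if self.available < amount:
                return False
            self.available -= amount
            return True

    def settle_fill(self, amount):
        """Trade settlement consumes funds that were already reserved."""
        amount = Decimal(amount)
        with self._lock:
            if self.reserved < amount:
                raise ValueError("fill exceeds reserved funds")
            self.reserved -= amount


btc = Balance("1")
print(btc.reserve("1"))     # first sell order locks the full balance
print(btc.reserve("1"))     # second sell order finds nothing available
print(btc.withdraw("0.5"))  # withdrawal cannot touch reserved funds
```

Because the check and the decrement happen under one lock, two simultaneous sell orders cannot both pass the balance check.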

Blockchain Reorg Handling

Wait for N confirmations before crediting (BTC: 3 ≈ 30 min). The Blockchain Watcher continuously compares the block hashes it has stored against the current canonical chain. If it detects a reorg deeper than N blocks → reverse the affected credits. NEVER let users trade unconfirmed deposits.
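
Detecting a reorg by hash comparison can be sketched as follows. The data shapes are assumptions: `stored` and `canonical` are hypothetical dictionaries mapping block height to block hash, and each deposit is a dictionary carrying the height of the block that included it.

```python
def find_fork_height(stored, canonical):
    """Lowest height where the hash we recorded differs from the chain's
    current hash at that height, or None if our view still matches."""
    for height in sorted(stored):
        if height in canonical and canonical[height] != stored[height]:
            return height
    return None


def deposits_to_recheck(deposits, fork_height):
    """Deposits included at or above the fork point must be looked up again;
    any whose transaction is missing from the new chain gets reversed."""
    return [d for d in deposits if d["block_height"] >= fork_height]


stored = {100: "aa", 101: "bb", 102: "cc"}
canonical = {100: "aa", 101: "xx", 102: "yy"}
deposits = [{"tx": "t1", "block_height": 100}, {"tx": "t2", "block_height": 102}]

fork = find_fork_height(stored, canonical)
print(fork, deposits_to_recheck(deposits, fork))
```

A deposit in a replaced block is not necessarily lost, since the same transaction is often mined again on the new chain, which is why the sketch flags deposits for re-verification instead of reversing them outright.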

Event Bus Design (Kafka)

Topic: cryptocurrency_exchange-events
  Partitions: 64 (scale consumers horizontally)
  Partition key: entity_id (user_id / order_id — preserves per-entity ordering)
  Retention: 7 days (compliance) or 24h (high-volume telemetry)
  Replication factor: 3, min.insync.replicas: 2

Producer: idempotent producer enabled (enable.idempotence=true)
Consumer: consumer group "cryptocurrency_exchange-processors"
  - At-least-once delivery + idempotent handlers (dedup by event_id)
  - DLQ topic: cryptocurrency_exchange-events-dlq (poison messages after 3 retries)
  - Lag alert: consumer lag > 60s → scale workers

Rule: async side effects MUST NOT block the synchronous API response.
  Sync path: validate → persist source of truth → publish event → return 201
  Async path: consumers update caches, indexes, notifications, aggregates
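
The consumer rules above (at-least-once delivery, dedup by event_id, DLQ after 3 retries) can be sketched without a Kafka client. The `Processor` class is hypothetical, the in-memory set and list stand in for a durable dedup store and the DLQ topic, and "3 retries" is read here as one initial attempt plus three more.

```python
class Processor:
    """Idempotent event handler wrapper with bounded retries and a DLQ."""

    MAX_RETRIES = 3

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # assumption: a durable store in a real deployment
        self.dlq = []      # stands in for the -dlq topic

    def process(self, event):
        event_id = event["event_id"]
        if event_id in self.seen:
            return "duplicate"  # redelivery after a crash or rebalance
        for _attempt in range(1 + self.MAX_RETRIES):
            try:
                self.handler(event)
            except Exception:
                continue  # a real consumer would back off before retrying
            self.seen.add(event_id)
            return "processed"
        self.dlq.append(event)  # poison message: park it and move on
        return "dead-lettered"


applied = []
processor = Processor(lambda e: applied.append(e["event_id"]))
print(processor.process({"event_id": "trade-1"}))
print(processor.process({"event_id": "trade-1"}))
print(applied)
```

Recording the event_id only after the handler succeeds means a crash between the two steps causes a reprocess, so the handler's own writes still need to be idempotent or share a transaction with the dedup record.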

HTTP

POST /api/orders              → Place order {pair, side, type, quantity, price}
DELETE /api/orders/{id}       → Cancel order
GET /api/orders?status=open   → Open orders
GET /api/orderbook/{pair}     → Order book snapshot
GET /api/trades/{pair}        → Recent trades
GET /api/wallet/balances      → Portfolio balances
POST /api/wallet/withdraw     → Initiate withdrawal (2FA required)

# WebSocket streams
WS /ws/orderbook/{pair}       → Real-time order book
WS /ws/trades/{pair}          → Real-time trades
WS /ws/user                   → User order updates, balance changes

Common Error Responses

400 Bad Request: invalid input, missing fields, or malformed JSON
401 Unauthorized: missing or invalid auth token or API key
403 Forbidden: authenticated but insufficient permissions
404 Not Found: resource ID does not exist
409 Conflict: duplicate write or version conflict; retry with idempotency key
422 Unprocessable Entity: valid syntax but invalid business logic
429 Too Many Requests: rate limit exceeded; honor Retry-After header
500 Internal Error: unexpected server fault; retry with idempotency key
503 Service Unavailable: dependency down or overloaded; use exponential backoff
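
On the client side, the retry guidance in this list can be turned into a small delay function. This is a sketch with assumptions: it treats only 429, 500, and 503 as retryable, expects Retry-After as a number of seconds (the header may also carry an HTTP date), and uses full-jitter exponential backoff with illustrative `base` and `cap` values.

```python
import random

RETRYABLE = {429, 500, 503}  # assumption: other statuses are not retried


def retry_delay(status, attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based),
    or None when the request should not be retried."""
    if status not in RETRYABLE:
        return None
    if status == 429 and retry_after is not None:
        return float(retry_after)  # the server told us how long to wait
    # Full jitter: a random delay up to the capped exponential ceiling.
    return random.uniform(0, min(cap, base * 2 ** attempt))


print(retry_delay(400, 0))
print(retry_delay(429, 0, retry_after="7"))
print(retry_delay(503, 3))
```

Every retried write should resend the same idempotency key so that a request which succeeded on the server but timed out on the client is not applied twice.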

PostgreSQL (Ledger: Source of Truth)

ACID for financial data. DECIMAL(28,18) handles crypto precision (18 decimals for ETH). SELECT FOR UPDATE prevents double-spend.

SQL

CREATE TABLE balances (
    user_id   UUID, currency  TEXT,
    available DECIMAL(28,18) NOT NULL DEFAULT 0 CHECK (available >= 0),
    reserved  DECIMAL(28,18) NOT NULL DEFAULT 0 CHECK (reserved >= 0),
    PRIMARY KEY (user_id, currency)
);

CREATE TABLE orders (
    order_id    UUID PRIMARY KEY, user_id UUID NOT NULL,
    pair        TEXT NOT NULL, side TEXT NOT NULL,
    order_type  TEXT NOT NULL, price DECIMAL(28,18),
    quantity    DECIMAL(28,18) NOT NULL,
    filled_qty  DECIMAL(28,18) DEFAULT 0,
    status      TEXT DEFAULT 'open',
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE trades (
    trade_id    UUID PRIMARY KEY, pair TEXT NOT NULL,
    buyer_id    UUID NOT NULL, seller_id UUID NOT NULL,
    price       DECIMAL(28,18) NOT NULL,
    quantity    DECIMAL(28,18) NOT NULL,
    executed_at TIMESTAMPTZ DEFAULT NOW()
);
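
How the CHECK constraints and a single transaction protect settlement can be demonstrated with Python's built-in sqlite3 module. This is a stand-in for PostgreSQL, not the production schema: amounts are stored as integers in minor units because SQLite has no exact DECIMAL type, fees are ignored, and `make_ledger` and `settle_trade` are hypothetical helpers.

```python
import sqlite3


def make_ledger():
    """In-memory ledger mirroring the balances table's constraints."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE balances (
            user_id   TEXT, currency TEXT,
            available INTEGER NOT NULL DEFAULT 0 CHECK (available >= 0),
            reserved  INTEGER NOT NULL DEFAULT 0 CHECK (reserved >= 0),
            PRIMARY KEY (user_id, currency)
        )""")
    return conn


def settle_trade(conn, buyer, seller, base, quote, qty, cost):
    """Move reserved funds between both parties in one transaction."""
    debit = "UPDATE balances SET reserved = reserved - ? WHERE user_id = ? AND currency = ?"
    credit = "UPDATE balances SET available = available + ? WHERE user_id = ? AND currency = ?"
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(debit, (cost, buyer, quote))
        conn.execute(debit, (qty, seller, base))
        conn.execute(credit, (qty, buyer, base))
        conn.execute(credit, (cost, seller, quote))


conn = make_ledger()
conn.executemany(
    "INSERT INTO balances VALUES (?, ?, ?, ?)",
    [("buyer", "USDT", 0, 25000), ("buyer", "BTC", 0, 0),
     ("seller", "BTC", 0, 50), ("seller", "USDT", 0, 0)],
)
conn.commit()
settle_trade(conn, "buyer", "seller", "BTC", "USDT", qty=50, cost=24975)
print(conn.execute("SELECT * FROM balances ORDER BY user_id, currency").fetchall())
```

If either debit would push a reserved column below zero, the CHECK constraint raises an error and the whole transaction rolls back, so a balance is never partially updated.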

Redis (Hot Balances + Order Book Cache)

HSET balance:{user_id} BTC_available "1.234" BTC_reserved "0.5"

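# Atomic reservation via Lua script:
-- KEYS[1] = balance:{user_id}
-- ARGV[1] = available field, ARGV[2] = amount, ARGV[3] = reserved field
-- A missing field reads as zero, so the comparison never sees nil.
local avail = tonumber(redis.call('HGET', KEYS[1], ARGV[1])) or 0
local amount = tonumber(ARGV[2])
if amount and amount > 0 and avail >= amount then
    redis.call('HINCRBYFLOAT', KEYS[1], ARGV[1], -amount)
    redis.call('HINCRBYFLOAT', KEYS[1], ARGV[3], amount)
    return 1  -- funds reserved
end
return 0  -- insufficient funds or invalid amount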

# Order book snapshot cache
SET orderbook:BTC_USDT '{"bids":[...],"asks":[...]}' EX 1
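
HINCRBYFLOAT works in floating point, so it cannot be assumed to hold 18-decimal amounts exactly. One common alternative is to keep hot balances as integers in each currency's smallest unit. The sketch below shows that conversion; the `DECIMALS` table and function names are illustrative assumptions, not part of the design above.

```python
from decimal import Decimal

# Assumed decimal places per currency (illustrative).
DECIMALS = {"BTC": 8, "ETH": 18, "USDT": 6}


def to_minor(amount: str, currency: str) -> int:
    """Convert a decimal string to an exact integer count of minor units."""
    scaled = Decimal(amount).scaleb(DECIMALS[currency])
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has more precision than {currency} allows")
    return int(scaled)


def from_minor(units: int, currency: str) -> Decimal:
    """Convert minor units back to a decimal amount for display."""
    return Decimal(units).scaleb(-DECIMALS[currency])


print(0.1 + 0.2 == 0.3)            # binary floats drift
print(to_minor("1.234", "BTC"))    # exact integer satoshis
print(from_minor(123400000, "BTC"))
```

Integer storage has its own limit: a signed 64-bit counter overflows above roughly 9.2 units of an 18-decimal asset, so such assets need a coarser unit or string-encoded arithmetic in the cache, with PostgreSQL remaining the exact source of truth.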

SLOs & Error Budgets

Metric	Target	Rationale
Core user-facing availability	99.95%	Budget for planned maintenance + unplanned failures without user-visible outage.
p99 latency (critical path)	Problem-specific — state target early and tie to capacity math	Interview credibility comes from connecting SLO to architecture choices.
Error rate (5xx)	< 0.1%	Distinguishes transient blips from systemic failure requiring rollback.
Data durability	99.999999999% (11 nines) for committed writes	Define which operations require fsync/quorum vs async replication.
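
An availability target becomes concrete once it is converted into allowed downtime. This is a minimal sketch assuming a 30-day window; the function name is hypothetical.

```python
def error_budget_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of full outage permitted per window at a given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


for target in (99.9, 99.95, 99.99):
    print(f"{target}% -> {error_budget_minutes(target):.2f} min / 30 days")
```

At 99.95% the budget is about 21.6 minutes per 30 days, and at 99.99% it shrinks to about 4.3 minutes, which leaves little room for a manual database failover.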

Incident Scenarios (2am reality)

Scenario	How you detect	Mitigation
Primary database unavailable	Health check failures, connection pool exhaustion alerts, elevated 5xx	Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists
Traffic spike (10× normal)	RPS anomaly alert, autoscaling lag, latency SLO burn rate	Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations
Bad deploy causing elevated errors	Canary metric regression, error budget burn, deployment correlation	Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility

Cost Drivers (Staff lens)

Egress bandwidth and CDN (often dominates media/data-heavy systems)
Database storage + IOPS at scale (plan compaction, TTL, tiering)
Compute for async pipelines (right-size workers, spot instances for batch)
Managed service premiums vs operational headcount trade-off

Multi-Region & DR

Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.
