This problem appears in multiple sheets. Depth expectations increase as you progress:
| Track | What to demonstrate |
|---|---|
| Arch 75 | Staff level: multi-region, cost at scale, migration path, and production metrics. |
Interview Prompt
Design a multi-currency payment system.
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Authorize-only vs capture later? Refunds and chargebacks in scope? | Sets idempotency, ledger, and reconciliation boundaries. |
| What scale should we design for — DAU, QPS, data volume? | Drives every capacity decision; shows structured thinking. |
| What are the read vs write patterns on the critical path? | Determines caching, DB choice, and replication topology. |
| What consistency and durability guarantees are required? | Separates strong-consistency paths from eventual ones — a senior differentiator. |
Scope
In scope
- FX rate service
- Settlement currency
- Rounding rules
- Multi-PSP routing
- Cross-border compliance
- Capacity estimation with shown math
Out of scope (state explicitly)
- Fraud ML model training (#75) — rules engine is enough unless asked
- Merchant onboarding / KYC workflows
- Building a PSP or bank from scratch
Assumptions
- Strong consistency required on money/inventory paths — clarify idempotency early
- External PSP or bank APIs exist; design integration boundaries only
- 99.99% availability target for the commit/authorize path
Functional Requirements
- Accept payments in any currency: Buyer pays in their local currency (EUR, GBP, JPY, INR...)
- Settle in merchant's currency: Convert and settle to merchant's preferred currency
- Real-time exchange rates: Use live FX rates for conversion at payment time
- FX rate locking: Lock exchange rate for a window (15-30 min) during checkout
- Multi-currency wallets: Users/merchants hold balances in multiple currencies
- Cross-border transfers: Send money internationally with transparent FX fees
- FX markup/fee: Configurable spread on exchange rates for revenue
- Currency display: Show prices in buyer's local currency across the platform
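Non-Functional Requirements
- Accuracy: FX rates stored to 6 decimal places; amounts held exactly as integer minor units, with rounding only at defined conversion points
- Low latency: Currency conversion decision in < 50 ms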
- Consistency: Locked FX rate honored even if market moves during checkout
- Compliance: Adhere to currency regulations per country (capital controls, sanctioned currencies)
- Scale: 10M+ cross-currency transactions/day
- Availability: 99.99%
| Metric | Calculation | Value |
|---|---|---|
| Supported currencies | Given | 150+ |
| FX rate updates | Given | Every 30 seconds from providers |
| Cross-currency transactions / day | Given | 10M |
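| FX rate lookups / sec | Given; not derivable from payments alone (10M ÷ 86,400 ≈ 116 payments/s on average), so most lookups come from local-currency price display, not checkout | 50K |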
| Locked rate records | Given | 5M active at any time |
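The table's figures can be cross-checked with a few lines of arithmetic. This is a sketch: the 3x peak factor is an assumed value, and the lock-creation rate applies Little's law under the assumption that every lock stays active for the full 15-minute window.

```python
# Back-of-envelope checks for the capacity table.
SECONDS_PER_DAY = 86_400

daily_txns = 10_000_000      # given: cross-currency transactions/day
active_locks = 5_000_000     # given: locked-rate records active at any time
lock_ttl_s = 15 * 60         # default lock window
peak_factor = 3              # ASSUMPTION: peak-to-average traffic ratio

avg_tps = daily_txns / SECONDS_PER_DAY
peak_tps = avg_tps * peak_factor

# Little's law: L = lambda * W, so lambda = L / W.
# ASSUMPTION: every lock stays active for the full TTL.
lock_creates_per_s = active_locks / lock_ttl_s

# Locks created per completed payment; values far above 1 mean most locks
# are abandoned checkouts that expire unused.
locks_per_payment = lock_creates_per_s / avg_tps

print(f"avg payment TPS:   {avg_tps:,.0f}")             # 116
print(f"peak payment TPS:  {peak_tps:,.0f}")            # 347
print(f"lock creates/sec:  {lock_creates_per_s:,.0f}")  # 5,556
print(f"locks per payment: {locks_per_payment:,.0f}")   # 48
```

The ~48 locks per payment suggests either heavy quote abandonment or that the 5M figure counts more than checkout locks. It is worth clarifying in the interview, since it puts the lock table's write rate in the thousands per second, far above the payment rate.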
FX Rate Management
Rate ingestion:
Multiple FX rate providers (redundancy + best rate selection):
- Reuters/Bloomberg: institutional rates
- ECB (European Central Bank): reference rates
- Open Exchange Rates API: retail rates
Rate with markup:
Raw rate: EUR/USD 1.0845 (1 EUR = 1.0845 USD)
Platform markup: 2% (revenue)
Buy rate (user buys USD with EUR): 1.0845 * 0.98 = 1.0628 USD per EUR
Sell rate (user sells USD for EUR): 1.0845 * 1.02 = 1.1062 USD per EUR
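A sketch of the markup arithmetic using Python's `decimal` module, so rates never pass through binary floating point. It assumes rates are kept to 6 decimal places (per the accuracy requirement) and uses banker's rounding (`ROUND_HALF_EVEN`); `apply_markup` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_EVEN

RATE_QUANTUM = Decimal("0.000001")  # 6 decimal places


def apply_markup(raw_rate: Decimal, markup: Decimal) -> tuple[Decimal, Decimal]:
    """Return (buy_rate, sell_rate) in quote units per base unit (USD per EUR for EUR/USD).

    buy_rate:  user pays EUR to get USD, so receives fewer USD per EUR.
    sell_rate: user pays USD to get EUR, so pays more USD per EUR.
    """
    buy = (raw_rate * (1 - markup)).quantize(RATE_QUANTUM, rounding=ROUND_HALF_EVEN)
    sell = (raw_rate * (1 + markup)).quantize(RATE_QUANTUM, rounding=ROUND_HALF_EVEN)
    return buy, sell


buy, sell = apply_markup(Decimal("1.0845"), Decimal("0.02"))
print(buy, sell)  # 1.062810 1.106190
```

Rounded to 4 places these are the 1.0628 and 1.1062 above. Rates are passed to `Decimal` as strings because `Decimal(1.0845)` would capture the float's binary approximation rather than the exact quoted value.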
Rate triangulation:
No direct rate for NGN->BRL?
NGN -> USD -> BRL (via common intermediate currency)
rate = ngn_usd_rate * usd_brl_rate
FX Rate Locking
Without locking: user sees €92.17, takes 5 min to fill the payment form, and the rate changes. With locking: the platform absorbs FX risk for 15 min. The average FX movement over 15 min is < 0.05%, so across 10M txns/day the absorbed risk is small relative to the 2% markup.
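The lock lifecycle (create with a server-chosen rate, validate, consume exactly once, expire) can be sketched in memory. `RateLockStore` and its methods are hypothetical names standing in for the `rate_locks` table; time is passed in explicitly so expiry is easy to test, and the store is not thread-safe.

```python
import uuid
from dataclasses import dataclass
from decimal import Decimal


class LockError(Exception):
    """Raised when a lock cannot be used; the caller should re-quote."""


@dataclass
class RateLock:
    from_ccy: str
    to_ccy: str
    rate: Decimal
    expires_at: float  # epoch seconds
    used: bool = False


class RateLockStore:
    """In-memory stand-in for the rate_locks table (hypothetical, single-threaded)."""

    def __init__(self) -> None:
        self._locks: dict[str, RateLock] = {}

    def create(self, from_ccy: str, to_ccy: str, rate: Decimal,
               now: float, ttl_s: int = 900) -> str:
        lock_id = str(uuid.uuid4())
        self._locks[lock_id] = RateLock(from_ccy, to_ccy, rate, now + ttl_s)
        return lock_id

    def consume(self, lock_id: str, now: float) -> Decimal:
        """Check the lock is known, unexpired and unused, then mark it used."""
        lock = self._locks.get(lock_id)
        if lock is None:
            raise LockError("unknown lock")
        if now >= lock.expires_at:
            raise LockError("lock expired")
        if lock.used:
            raise LockError("lock already used")
        lock.used = True
        return lock.rate


store = RateLockStore()
t0 = 1_700_000_000.0
lock_id = store.create("EUR", "USD", Decimal("1.0850"), now=t0)
rate = store.consume(lock_id, now=t0 + 5 * 60)  # paid within the window
usd = (Decimal("92.17") * rate).quantize(Decimal("0.01"))
print(rate, usd)  # 1.0850 100.00
```

A second `consume` on the same id, or any call at or after `t0 + 900`, raises `LockError`, which corresponds to the re-quote path. In PostgreSQL the check-and-mark can be a single conditional `UPDATE ... WHERE NOT used AND expires_at > NOW() RETURNING locked_rate`, so two concurrent payments cannot both claim one lock.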
Checkout flow with rate lock:
1. User sees price: "$100 USD" -> "Show in my currency" -> "€92.17 EUR"
2. User clicks "Pay €92.17":
FX Service locks rate (column names match the rate_locks schema):
INSERT INTO rate_locks (lock_id, from_currency, to_currency, locked_rate, expires_at)
VALUES (gen_random_uuid(), 'EUR', 'USD', 1.0850, NOW() + INTERVAL '15 minutes')
RETURNING lock_id;
3. User completes payment within 15 min:
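Payment Service: validate that lock_id exists, is not expired, and is unused; atomically mark it used (used = TRUE, payment_id set) so one lock pays for exactly one payment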
Charge: €92.17 EUR -> convert at locked rate 1.0850 -> $100.00 USD to merchant
4. If user takes > 15 min:
Lock expired -> re-quote with current rate
Multi-Currency Ledger
Transaction: Buyer pays €92.17, Merchant receives $100.00
EUR ledger:
Entry 1: { account: buyer_eur, DEBIT, €92.17 }
Entry 2: { account: platform_eur, CREDIT, €92.17 }
USD ledger:
Entry 3: { account: platform_usd, DEBIT, $100.00 }
Entry 4: { account: merchant_usd, CREDIT, $100.00 }
FX conversion record:
{ from: €92.17 EUR, to: $100.00 USD, rate: 1.0850, lock_id: "lock-uuid" }
Per-currency balancing (each currency must balance on its own):
SUM(EUR debits) = SUM(EUR credits)
SUM(USD debits) = SUM(USD credits)
Smallest Currency Units
Different currencies have different decimal places:
- USD: 2 decimals ($1.00 = 100 cents)
- JPY: 0 decimals (¥100 = 100 yen, no fractional yen)
- BHD: 3 decimals (0.001 BHD = 1 fils)
Best practice: store amounts as integers in the smallest unit:
- $100.00 -> 10000 (cents)
- ¥100 -> 100 (yen)
- 0.500 BHD -> 500 (fils)
This avoids floating-point and decimal-scale issues: integer addition and subtraction are exact, so rounding happens only at explicit points such as FX conversion and fee calculation.
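A sketch combining the two rules: amounts are converted to integer minor units, and ledger entries are checked for balance per currency, never across currencies. The exponent table covers only the currencies named in this section (values per ISO 4217); `to_minor`, `Entry`, and `unbalanced_currencies` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal

# ISO 4217 minor-unit exponents (subset used in this section).
MINOR_UNIT_EXPONENT = {"USD": 2, "EUR": 2, "JPY": 0, "BHD": 3}


def to_minor(amount: str, ccy: str) -> int:
    """Convert a decimal string such as '100.00' into integer minor units."""
    scaled = Decimal(amount).scaleb(MINOR_UNIT_EXPONENT[ccy])
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has more decimals than {ccy} allows")
    return int(scaled)


@dataclass(frozen=True)
class Entry:
    account: str
    ccy: str
    side: str          # "DEBIT" or "CREDIT"
    amount_minor: int


def unbalanced_currencies(entries: list[Entry]) -> dict[str, int]:
    """Return {currency: debits - credits} for each currency that does not net to zero."""
    net: dict[str, int] = defaultdict(int)
    for e in entries:
        net[e.ccy] += e.amount_minor if e.side == "DEBIT" else -e.amount_minor
    return {ccy: v for ccy, v in net.items() if v != 0}


# The four entries from the cross-currency payment in the ledger example.
entries = [
    Entry("buyer_eur", "EUR", "DEBIT", to_minor("92.17", "EUR")),
    Entry("platform_eur", "EUR", "CREDIT", to_minor("92.17", "EUR")),
    Entry("platform_usd", "USD", "DEBIT", to_minor("100.00", "USD")),
    Entry("merchant_usd", "USD", "CREDIT", to_minor("100.00", "USD")),
]
print(to_minor("100.00", "USD"), to_minor("100", "JPY"), to_minor("0.500", "BHD"))  # 10000 100 500
print(unbalanced_currencies(entries))  # {}
```

An amount with too many decimals for its currency (for example `"1.005"` USD) is rejected rather than silently rounded, which keeps rounding confined to the conversion step.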
GET /api/v1/fx/quote?from=EUR&to=USD&amount=100
-> { "from": "EUR", "to": "USD", "amount": 100.00,
"converted": 106.28, "rate": 1.0845, "markup": 0.02,
"effective_rate": 1.0628, "valid_for_seconds": 30 }
POST /api/v1/fx/lock
{ "from_currency": "EUR", "to_currency": "USD", "ttl_seconds": 900 }
-> { "lock_id": "lock-uuid", "rate": 1.0850, "expires_at": "2026-03-14T11:15:00Z" }
(The server locks its own current rate; a client-supplied rate must never be trusted.)
POST /api/v1/payments/cross-currency   (header: Idempotency-Key: <client-generated UUID>)
{ "buyer_currency": "EUR", "buyer_amount": 92.17,
"merchant_currency": "USD", "merchant_amount": 100.00,
"fx_lock_id": "lock-uuid", "merchant_id": "m-uuid" }
-> { "payment_id": "pay-uuid", "status": "completed",
"fx_rate_used": 1.0850, "fx_fee": 1.84 }
Common Error Responses
- 400 Bad Request: invalid input, missing fields, or malformed JSON
- 401 Unauthorized: missing or invalid auth token or API key
- 402 Payment Required: insufficient funds
- 403 Forbidden: authenticated but insufficient permissions
- 404 Not Found: resource ID does not exist
- 409 Conflict: duplicate write or version conflict; retry with idempotency key
- 422 Unprocessable Entity: valid syntax but invalid business logic
- 429 Too Many Requests: rate limit exceeded; honor Retry-After header
- 500 Internal Error: unexpected server fault; retry with idempotency key
- 502 Bad Gateway: payment provider timeout; poll status endpoint
- 503 Service Unavailable: dependency down or overloaded; use exponential backoff
Redis: Live Rates
fx_rate:{base}:{quote} -> Hash { rate, bid, ask, provider, updated_at }
TTL: 60 s (stale after 1 min without update); the last-known-good fallback used during provider outages needs a copy that outlives this TTL
PostgreSQL: Locks & Ledger
CREATE TABLE rate_locks (
lock_id UUID PRIMARY KEY, from_currency CHAR(3), to_currency CHAR(3),
locked_rate DECIMAL(12,6), expires_at TIMESTAMPTZ, used BOOLEAN DEFAULT FALSE,
payment_id UUID, created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE fx_conversions (
conversion_id UUID PRIMARY KEY, payment_id UUID,
-- amounts as BIGINT in the currency's smallest unit (cents, yen, fils)
from_currency CHAR(3), from_amount_minor BIGINT,
to_currency CHAR(3), to_amount_minor BIGINT,
rate_used DECIMAL(12,6), lock_id UUID,
platform_fee_minor BIGINT, created_at TIMESTAMPTZ DEFAULT NOW()
);
| Concern | Solution |
|---|---|
| FX provider down | Multiple providers with fallback; use last known rate with staleness warning |
| Rate lock expired mid-payment | Extend lock by 5 min on active payment; or re-quote |
| Rounding errors | Integer minor units for amounts; DECIMAL(12,6) for rates (matching the schema); banker's rounding at conversion |
| FX risk | Platform treasury team hedges FX exposure daily; automated rebalancing |
| Sanctioned currencies | Country/currency blocklist enforced at API gateway level |
Rate Staleness and Provider Failover
Multi-provider strategy:
Priority order: Reuters (best quality) -> Bloomberg -> ECB -> OpenExchangeRates
Every 30 seconds:
1. Try primary provider (Reuters)
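2. If it fails: fall back to Bloomberg
3. If both fail: use ECB reference rates (published once per business day, EUR-based, about 30 currencies), then OpenExchangeRates for pairs ECB does not cover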
4. If all fail: use last known rate with staleness flag
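A sketch of the failover loop, with each provider returning its rate together with the provider's own as-of timestamp. That detail matters: a successful ECB call can still return a rate that is hours old, so staleness should be computed from the as-of time, not from when the call succeeded. Provider callables and function names are hypothetical; the age thresholds mirror the staleness rules that follow.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Optional

# A provider returns (rate, as_of_epoch_seconds) or raises on failure.
Provider = Callable[[], tuple[Decimal, float]]


@dataclass
class Quote:
    rate: Decimal
    provider: str
    as_of: float  # epoch seconds when the provider published the rate


def staleness(age_s: float) -> str:
    """Map rate age in seconds to the staleness labels."""
    if age_s < 60:
        return "live"
    if age_s < 5 * 60:
        return "recent"
    if age_s < 30 * 60:
        return "stale"
    return "expired"


def fetch_with_failover(providers: list[tuple[str, Provider]],
                        last_known: Optional[Quote],
                        now: float) -> tuple[Quote, str]:
    """Try providers in priority order, then fall back to the last-known-good quote."""
    for name, fetch in providers:
        try:
            rate, as_of = fetch()
        except Exception:
            continue  # any error counts as a provider failure; try the next one
        return Quote(rate, name, as_of), staleness(now - as_of)
    if last_known is None:
        raise RuntimeError("all providers failed and no last-known rate exists")
    return last_known, staleness(now - last_known.as_of)


def reuters() -> tuple[Decimal, float]:
    raise TimeoutError("reuters down")


def bloomberg() -> tuple[Decimal, float]:
    return Decimal("1.0845"), 1_000.0


quote, label = fetch_with_failover([("reuters", reuters), ("bloomberg", bloomberg)],
                                   None, now=1_010.0)
print(quote.provider, label)  # bloomberg live
```

With these thresholds, a rate from the daily ECB fallback will usually be labeled "expired", which is worth raising when defending the fallback order.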
Staleness rules:
Rate < 1 min old: "live" (green)
Rate 1-5 min old: "recent" (yellow)
Rate 5-30 min old: "stale" (orange) — show warning
Rate > 30 min old: "expired" (red)
Interview Walkthrough
- Frame the problem as FX risk management, not just currency conversion — the platform holds exposure between charge and settlement.
- Walk through checkout: lock FX rate at payment initiation (15-min window), charge in buyer currency, settle to merchant in their currency.
- Explain rate sourcing hierarchy: primary provider → secondary fallback → last-known-good with staleness flag and wider spread.
- Cover treasury hedging: net daily FX exposure per currency pair and execute forward contracts to limit platform P&L swings.
- Mention payment routing: route USD through Stripe US, EUR through Adyen EU, local methods (UPI, PIX) through regional acquirers.
- Discuss multi-currency ledger with separate balance accounts per currency and cross-currency transfers via FX conversion entries.
- Common pitfall: using live mid-market rates without locking at checkout — a 2% currency move between cart and capture creates merchant disputes.
Settlement and Treasury
Platform accumulates various currencies throughout the day:
- EUR balance: +€500K
- USD balance: -$542K
- GBP balance: +£100K
Daily settlement:
1. Calculate net position per currency
2. Execute FX trades in the wholesale market (better rates than retail)
3. Transfer settled amounts to merchants' bank accounts
4. T+1 or T+2 settlement (1-2 business days)
FX hedging: if the platform knows it will receive €1M tomorrow (from European sales) but owes merchants USD, it sells EUR forward for USD today (a short EUR/USD forward), locking in the forward rate now. This removes FX risk on the USD value of the €1M receivable.
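The netting step can be sketched as: sum the day's flows per currency, then emit one wholesale trade per foreign currency instead of one per payment. It assumes USD as the treasury's home currency and signed whole-unit amounts (positive means the platform holds that currency); both, and the function names, are illustrative.

```python
from collections import defaultdict


def net_positions(flows: list[tuple[str, int]]) -> dict[str, int]:
    """Sum signed flows per currency (positive = platform holds that currency)."""
    net: dict[str, int] = defaultdict(int)
    for ccy, amount in flows:
        net[ccy] += amount
    return dict(net)


def settlement_trades(net: dict[str, int],
                      home_ccy: str = "USD") -> list[tuple[str, str, int, str]]:
    """One trade per non-home currency that squares its position against home_ccy."""
    trades = []
    for ccy, amount in sorted(net.items()):
        if ccy == home_ccy or amount == 0:
            continue
        side = "SELL" if amount > 0 else "BUY"  # long a currency -> sell it
        trades.append((side, ccy, abs(amount), home_ccy))
    return trades


# End-of-day balances from the example above.
net = net_positions([("EUR", 500_000), ("USD", -542_000), ("GBP", 100_000)])
print(settlement_trades(net))
# [('SELL', 'EUR', 500000, 'USD'), ('SELL', 'GBP', 100000, 'USD')]
```

Applied to tomorrow's expected €1M receivable, `settlement_trades({"EUR": 1_000_000})` yields a SELL EUR / BUY USD trade, the same direction as the forward hedge.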
Payment Routing by Currency
Different PSPs have different strengths per currency/country:
- Stripe: excellent for USD, EUR, GBP
- Adyen: strong in EU and Asia, 150+ currencies
- Local PSPs and payment methods: Razorpay (INR), iDEAL (NL), Alipay (CNY)
Smart routing:
- INR payments: route to Razorpay (lower fees, higher approval rates)
- EUR payments: route to Adyen (native EUR processing)
- USD payments: route to Stripe (lowest processing fee)
Benefits: higher approval rates, lower fees, fewer chargebacks
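Smart routing reduces to an ordered candidate list per currency plus a health check. The primary choices below follow the text; the fallback entries and the Adyen default (chosen because of its broad currency coverage) are assumptions, and `choose_psp` is a hypothetical name.

```python
# Buyer currency -> PSPs in preference order. Primaries follow the text;
# fallbacks and the default route are ASSUMPTIONS for illustration.
ROUTES: dict[str, list[str]] = {
    "INR": ["razorpay", "adyen"],
    "EUR": ["adyen", "stripe"],
    "USD": ["stripe", "adyen"],
}
DEFAULT_ROUTE = ["adyen"]


def choose_psp(ccy: str, healthy: set[str]) -> str:
    """Return the first healthy PSP for the currency, else the first healthy default."""
    for psp in ROUTES.get(ccy, []) + DEFAULT_ROUTE:
        if psp in healthy:
            return psp
    raise RuntimeError(f"no healthy PSP for {ccy}")


all_up = {"stripe", "adyen", "razorpay"}
print(choose_psp("INR", all_up))     # razorpay
print(choose_psp("INR", {"adyen"}))  # adyen (Razorpay down)
print(choose_psp("JPY", all_up))     # adyen (no explicit route)
```

In practice the healthy set would come from per-PSP error-rate and latency monitoring, and the ordering could be tuned by observed approval rates; both are outside this sketch.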
FX Risk Management: Platform Exposure
Problem: the platform locks EUR/USD at 1.0850 for 15 min for the buyer. If EUR/USD moves to 1.0750 (about -0.9%), the platform loses on the conversion.
At 10M cross-currency txns/day x avg $50 = $500M daily volume.
Mitigation:
1. Markup covers most risk: 2% markup >> average 15-min FX movement (0.05%)
2. Auto-hedging: if the net position in any currency exceeds $1M, execute an FX hedge
3. Dynamic lock TTL: normal = 15 min, high volatility = 5 min, extreme = 2 min + larger markup
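The risk arithmetic and the two automated mitigations can be sketched together. The TTLs and the $1M trigger come from the text; the per-regime markups (the text only says "larger markup" for extreme) and how a volatility regime is chosen are assumptions, and all names are hypothetical.

```python
from decimal import Decimal

DAILY_VOLUME_USD = Decimal("500000000")   # 10M txns x $50
MARKUP = Decimal("0.02")
AVG_15MIN_MOVE = Decimal("0.0005")        # 0.05%
HEDGE_THRESHOLD_USD = Decimal("1000000")  # auto-hedge trigger from the text

# Markup income if the 2% applies to the full volume.
markup_income = DAILY_VOLUME_USD * MARKUP
# Pessimistic cost if every locked amount moved 0.05% against the platform.
typical_move_cost = DAILY_VOLUME_USD * AVG_15MIN_MOVE

# Volatility regime -> (lock TTL in seconds, markup).
# TTLs follow the text; the 3% extreme markup is an ASSUMPTION.
LOCK_POLICY = {
    "normal": (15 * 60, Decimal("0.02")),
    "high": (5 * 60, Decimal("0.02")),
    "extreme": (2 * 60, Decimal("0.03")),
}


def positions_to_hedge(net_usd_by_ccy: dict[str, Decimal]) -> list[str]:
    """Currencies whose absolute net position (in USD terms) exceeds the trigger."""
    return sorted(c for c, v in net_usd_by_ccy.items() if abs(v) > HEDGE_THRESHOLD_USD)


print(positions_to_hedge({"EUR": Decimal("1200000"),
                          "GBP": Decimal("-300000"),
                          "JPY": Decimal("-1500000")}))  # ['EUR', 'JPY']
```

Under these assumptions, $10M of daily markup income against a $250K pessimistic move cost is what "markup >> movement" means in dollars; short positions trigger a hedge just like long ones, hence the `abs`.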
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1: MVP (0 to 100K users)
Monolith or minimal services proving core multi-currency payment flows. Optimize for shipping speed and correctness over scale.
Key components: Single region · Primary DB + Redis cache · Synchronous core path · Basic monitoring
Move to next phase when: p99 latency exceeds SLO or DB CPU sustained above 70%
Phase 2: Growth (100K to 10M users)
Split read/write paths, introduce async processing for non-critical work, add caching layers and horizontal scaling.
Key components: Read replicas or CQRS · Message queue for async work · CDN / edge caching · Service-level SLOs
Move to next phase when: Hot keys, fan-out bottlenecks, or ops toil from manual scaling
Phase 3: Scale (10M+ users)
Shard data plane, multi-region active-active or active-passive, formal DR runbooks, cost optimization.
Key components: Database sharding / partitioning · Multi-region replication · Auto-scaling + chaos testing · Dedicated platform/SRE ownership
Move to next phase when: Regional failure domain risk, compliance data residency, or linear cost growth unsustainable
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
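| Core user-facing availability (authorize/commit path) | 99.99% | Matches the stated requirement; about 4.3 min/month of error budget for planned maintenance and unplanned failures without user-visible outage. |
| p99 latency (critical path) | < 50 ms for the currency-conversion decision (from the requirements); state other targets early and tie them to capacity math | Interview credibility comes from connecting SLO to architecture choices. |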
| Error rate (5xx) | < 0.1% | Distinguishes transient blips from systemic failure requiring rollback. |
| Data durability | 99.999999999% (11 nines) for committed writes | Define which operations require fsync/quorum vs async replication. |
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Primary database unavailable | Health check failures, connection pool exhaustion alerts, elevated 5xx | Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists |
| Traffic spike (10× normal) | RPS anomaly alert, autoscaling lag, latency SLO burn rate | Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations |
| Bad deploy causing elevated errors | Canary metric regression, error budget burn, deployment correlation | Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility |
Cost Drivers (Staff lens)
- Egress bandwidth and CDN (often dominates media/data-heavy systems)
- Database storage + IOPS at scale (plan compaction, TTL, tiering)
- Compute for async pipelines (right-size workers, spot instances for batch)
- Managed service premiums vs operational headcount trade-off
Multi-Region & DR
Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.