This problem appears in multiple sheets. Depth expectations increase as you progress:
Interview Prompt
Design ETA Calculation Service.
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Which of these is highest priority: Real-time traffic data ingestion, Graph-based routing with live weights, ML-based prediction? | Forces scope negotiation — senior candidates trim before drawing boxes. |
| What scale should we design for — DAU, QPS, data volume? | Drives every capacity decision; shows structured thinking. |
| What are the read vs write patterns on the critical path? | Determines caching, DB choice, and replication topology. |
| What consistency and durability guarantees are required? | Separates strong-consistency paths from eventual ones — a senior differentiator. |
Scope
In scope
- Real-time traffic data ingestion
- Graph-based routing with live weights
- ML-based prediction
- Caching route segments
- Capacity estimation with shown math
Out of scope (state explicitly)
- Full payment processing (#24)
- Turn-by-turn map rendering (#54)
- Driver/rider identity verification and background checks
Assumptions
- Single metro / region unless interviewer asks for multi-city
- Mobile clients with intermittent connectivity — server is source of truth
- Managed geo + messaging infra (Kafka, Redis, RDS) is acceptable
These foundational concepts underpin the patterns used in this problem. Review them before deep-diving into component-level trade-offs.
- Point-to-point ETA: Estimate travel time with current traffic
- Multi-stop ETA: Routes with multiple waypoints
- ETA for pickup: How long until a driver reaches a pickup point
- Historical ETA: "How long does this trip usually take on Tuesday at 8 AM?"
- Batch ETA: Compute for N origin-destination pairs efficiently
- ETA updates: Continuously update as trip progresses
- Multi-modal: Driving, walking, cycling, transit
- Accuracy: ETA within ±10% of actual travel time
- Low Latency: Single ETA in < 100 ms; batch of 20 in < 500 ms
- High Throughput: 500K ETA requests/sec peak
- Freshness: Traffic conditions reflected within 60 seconds
- Availability: 99.99%
- Graceful degradation: Fall back to historical patterns
- Global: Support all major cities
| Metric | Calculation | Value |
|---|---|---|
| ETA requests / sec | From ETA requests / day ÷ 86400 (+ peak factor in value) | 500K peak |
| Single ETA latency target | Given (assumption documented in value) | < 100 ms |
| Batch ETA (20 pairs) target | Given (assumption documented in value) | < 500 ms |
| Active road segments tracked | Given (assumption documented in value) | 50M globally |
| Traffic updates / sec | From Traffic updates / day ÷ 86400 (+ peak factor in value) | 10M GPS data points |
| Road graph size (in-memory) | Given | ~300 GB globally |
| ETA model features | Given | ~50 per query |
| ETA cache hit rate | Given | ~30% |
ETA Calculation Pipeline: Three-Layer Architecture
Layer 1: Route Engine ETA (physics-based)
Find route via Contraction Hierarchies (< 1 ms)
For each segment: time = distance / speed (min of speed_limit, traffic_speed)
Layer 2: Traffic-Adjusted ETA
traffic_speed = Redis GET traffic:{segment_id} → current avg speed
If no traffic data: use historical avg speed
Add turn penalties (15-60s per major turn) and signal delays
Layer 3: ML Correction Model ⭐ (secret sauce)
Gradient Boosted Trees (XGBoost/LightGBM)
Features: route, traffic, temporal, weather, historical, spatial
Output: correction_factor
Final ETA = route_engine_ETA × correction_factorTraffic Aggregation Pipeline
1. GPS Ingestion (10M points/sec) → Kafka
2. Map Matching (Flink): GPS point → road segment
3. Segment Speed Aggregation (Flink, 60-sec window):
avg_speed = median(speeds), confidence = min(1.0, sample_count / 10)
4. Store in Redis: HSET traffic:{segment_id} speed {s} confidence {c}
TTL: 120 seconds
5. Fallback hierarchy: real-time → recent → historical → speed_limit × 0.7Batch ETA: Matching System Integration
Many-to-One Routing: Reverse Dijkstra from destination → find all 20 origins ~5 ms for 20 origins (vs 20 ms for separate queries) ETA Matrix (pre-computed): H3 cells (resolution 9 ≈ 175m), pre-compute cell-to-cell ETAs within 30 km ~500M pairs × 4 bytes = 2 GB per city → O(1) lookup Accuracy ±2-3 min, good enough for initial matching Refinement: After narrowing to top 3, compute exact ETA with full route + traffic
Event Bus Design (Kafka)
Topic: eta_calculation_service-events Partitions: 64 (scale consumers horizontally) Partition key: entity_id (user_id / order_id — preserves per-entity ordering) Retention: 7 days (compliance) or 24h (high-volume telemetry) Replication factor: 3, min.insync.replicas: 2 Producer: idempotent producer enabled (enable.idempotence=true) Consumer: consumer group "eta_calculation_service-processors" - At-least-once delivery + idempotent handlers (dedup by event_id) - DLQ topic: eta_calculation_service-events-dlq (poison messages after 3 retries) - Lag alert: consumer lag > 60s → scale workers Design an ETA Calculation Service: async side effects MUST NOT block the synchronous API response. Sync path: validate → persist source of truth → publish event → return 201 Async path: consumers update caches, indexes, notifications, aggregates
Single ETA
GET /api/v1/eta?origin_lat=37.7749&origin_lng=-122.4194&dest_lat=37.3382&dest_lng=-121.8863&mode=driving&departure_time=now
Response: 200 OK
{
"eta_seconds": 1920,
"eta_display": "32 min",
"distance_meters": 72400,
"confidence": 0.85,
"traffic_level": "moderate",
"breakdown": {
"route_engine_eta": 1680,
"traffic_adjustment": 180,
"ml_correction": 60
}
}Batch ETA
POST /api/v1/eta/batch
{
"origins": [
{"lat": 37.78, "lng": -122.41, "id": "driver-1"},
{"lat": 37.77, "lng": -122.43, "id": "driver-2"}
],
"destination": {"lat": 37.7749, "lng": -122.4194},
"mode": "driving"
}
Response: 200 OK
{
"etas": [
{"id": "driver-1", "eta_seconds": 240, "distance_meters": 1200},
{"id": "driver-2", "eta_seconds": 480, "distance_meters": 2100}
]
}ETA Matrix
GET /api/v1/eta/matrix?origin_h3=892830926cfffff&dest_h3=89283092e3fffffLive ETA Update (WebSocket)
// Server pushes every 60 seconds:
{
"type": "eta_update",
"session_id": "nav-uuid",
"remaining_eta_seconds": 1080,
"remaining_distance_meters": 35200,
"traffic_ahead": "moderate",
"route_changed": false
}Common Error Responses
400 Bad Request: invalid input, missing fields, or malformed JSON 401 Unauthorized: missing or invalid auth token or API key 403 Forbidden: authenticated but insufficient permissions 404 Not Found: resource ID does not exist 409 Conflict: duplicate write or version conflict; retry with idempotency key 422 Unprocessable Entity: valid syntax but invalid business logic 429 Too Many Requests: rate limit exceeded; honor Retry-After header 500 Internal Error: unexpected server fault; retry with idempotency key 503 Service Unavailable: dependency down or overloaded; use exponential backoff 440 Login Timeout: WebSocket session expired; reconnect required
Redis: Live Traffic + ETA Cache
traffic:{segment_id} → Hash { speed, confidence, updated_at } (TTL: 120s)
eta_matrix:{origin_h3}:{dest_h3} → INT (eta_seconds) (TTL: 300s)
eta_cache:{origin_geo6}:{dest_geo6}:{mode}:{hour} → INT (TTL: 300s)
hist_speed:{segment_id}:{dow}:{hour} → FLOAT (no TTL, updated weekly)ClickHouse — Historical Trip Data
CREATE TABLE completed_trips (
trip_id UUID,
origin_lat Float64, origin_lng Float64,
dest_lat Float64, dest_lng Float64,
origin_h3 UInt64, dest_h3 UInt64,
distance_meters UInt32,
predicted_eta UInt32, actual_duration UInt32,
departure_hour UInt8, day_of_week UInt8,
is_holiday UInt8,
weather Enum8('clear'=0,'rain'=1,'snow'=2,'fog'=3),
route_segments Array(UInt64),
mode Enum8('driving'=0,'walking'=1,'cycling'=2),
created_at DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (origin_h3, dest_h3, departure_hour, day_of_week);ML Model Store
Model: gradient_boosted_trees_v23.pkl 50M trips trained, 50 features, ~50 MB in memory Inference time: < 1 ms per prediction Retrained weekly, A/B tested before deployment
| Concern | Solution |
|---|---|
| Traffic service down | Fall back to historical speed patterns |
| ML model error | Circuit breaker: bypass ML if latency > 50ms or error > 5% |
| Route engine crash | Multiple replicas; fallback to pre-computed ETA matrix |
| Stale traffic data | TTL on Redis keys; use historical + confidence=LOW flag |
| GPS data pipeline lag | Flink checkpointing; if lag > 5 min, switch to historical |
| ETA cache thundering herd | Singleflight: only compute once for concurrent identical requests |
Interview Walkthrough
- Open with the latency budget: sub-100 ms single ETA and sub-500 ms batch — every design choice must justify its place in that envelope.
- Describe a three-tier pipeline: Haversine filter → Contraction Hierarchies road distance → full routing with live traffic and ML correction.
- Explain the traffic ingestion path: GPS probes → Flink aggregation → Redis segment speeds with TTL-based staleness fallbacks.
- Cover batch and matrix APIs for driver matching — pre-compute H3-cell ETA matrices for hot corridors to avoid redundant routing work.
- Mention caching with singleflight to prevent thundering herd on identical origin-destination pairs during rush hour.
- Discuss display smoothing and percentile ETAs (P50 for riders, P90 for delivery promises) to manage user trust when raw estimates jitter.
- Common pitfall: computing full routing for every candidate driver in a pool of 50 without the Haversine/CH pre-filter — latency blows past the SLA.
Live ETA Updates: Smoothing
displayed_eta = 0.7 × previous + 0.3 × raw Only show change when delta > 2 min or > 10% Why: GPS jitter → ETA fluctuates ±1 min Without smoothing: "17 min" → "18 min" → "17 min" → confusing
Time-Dependent Routing
Trip starts at 5:30 PM, arrives at 6:30 PM (rush gets worse) For each segment: estimated_arrival = departure + sum(previous_times) traffic_speed = predicted_speed(segment_id, estimated_arrival) predicted_speed: within 60 min → real-time + trend; 1-3h → blend 30/70; 3h+ → historical only
ETA Percentiles
Single ETA = 28 min doesn't capture uncertainty. Better: "28 min (25-35 min range)" Quantile regression: p10, p50, p90 models Rider display: P50 (median) Delivery promise: P90 (90% confidence) Matching: P50 for ranking, flag if P90 too high
Haversine vs Road Distance vs Routing ETA
| Level | Method | Time | Accuracy | Use Case |
|---|---|---|---|---|
| 1 | Haversine | < 1 μs | 50-200% off | Initial filter (5 km) |
| 2 | Road Distance (CH) | < 1 ms | Distance correct, time rough | Rank top 20 |
| 3 | Routing ETA + Traffic + ML | 10-50 ms | ±10% | Final ETA to user |
XGBoost vs Deep Learning for ETA
XGBoost ⭐: - Tabular features → GBT excels (often beats DL on structured data) - Inference < 1 ms (vs DL: 5-50 ms) - Interpretable feature importance - Trains in minutes on 50M samples When DL wins: sequence modeling (LSTM on GPS trace), GNN for spatial dependencies Google Maps: GBT for base ETA + GNN for spatial traffic prediction
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1: MVP (0 to 100K users)
Monolith or minimal services proving core eta calculation service flows. Optimize for shipping speed and correctness over scale.
Key components: Single region · Primary DB + Redis cache · Synchronous core path · Basic monitoring
Move to next phase when: p99 latency exceeds SLO or DB CPU sustained above 70%
Phase 2: Growth (100K to 10M users)
Split read/write paths, introduce async processing for non-critical work, add caching layers and horizontal scaling.
Key components: Read replicas or CQRS · Message queue for async work · CDN / edge caching · Service-level SLOs
Move to next phase when: Hot keys, fan-out bottlenecks, or ops toil from manual scaling
Phase 3: Scale (10M+ users)
Shard data plane, multi-region active-active or active-passive, formal DR runbooks, cost optimization.
Key components: Database sharding / partitioning · Multi-region replication · Auto-scaling + chaos testing · Dedicated platform/SRE ownership
Move to next phase when: Regional failure domain risk, compliance data residency, or linear cost growth unsustainable
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Core user-facing availability | 99.95% | Budget for planned maintenance + unplanned failures without user-visible outage. |
| p99 latency (critical path) | Problem-specific — state target early and tie to capacity math | Interview credibility comes from connecting SLO to architecture choices. |
| Error rate (5xx) | < 0.1% | Distinguishes transient blips from systemic failure requiring rollback. |
| Data durability | 99.999999999% (11 nines) for committed writes | Define which operations require fsync/quorum vs async replication. |
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Primary database unavailable | Health check failures, connection pool exhaustion alerts, elevated 5xx | Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists |
| Traffic spike (10× normal) | RPS anomaly alert, autoscaling lag, latency SLO burn rate | Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations |
| Bad deploy causing elevated errors | Canary metric regression, error budget burn, deployment correlation | Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility |
Cost Drivers (Staff lens)
- Egress bandwidth and CDN (often dominates media/data-heavy systems)
- Database storage + IOPS at scale (plan compaction, TTL, tiering)
- Compute for async pipelines (right-size workers, spot instances for batch)
- Managed service premiums vs operational headcount trade-off
Multi-Region & DR
Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.