Interview Prompt
Design Tinder (Matching System).
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Which of these is highest priority: Recommendation + geo filtering, Swipe queue pre-computation, Mutual match detection? | Forces scope negotiation — senior candidates trim before drawing boxes. |
| What scale should we design for — DAU, QPS, data volume? | Drives every capacity decision; shows structured thinking. |
| What are the read vs write patterns on the critical path? | Determines caching, DB choice, and replication topology. |
| What consistency and durability guarantees are required? | Separates strong-consistency paths from eventual ones — a senior differentiator. |
Scope
In scope
- Recommendation + geo filtering
- Swipe queue pre-computation
- Mutual match detection
- Elo/Glicko scoring
- Capacity estimation with shown math
Out of scope (state explicitly)
- Full ads auction and monetization stack
- Content moderation at scale (#81)
- Direct messaging (#07)
Assumptions
- Clarify scale (DAU, QPS, data volume) for Tinder in the first 5 minutes
- Standard reliability target 99.9%–99.99% unless problem implies higher (payments, booking)
- Managed cloud services (RDS, S3, Kafka, Redis) are acceptable building blocks
Functional Requirements
- Profile creation: Photos, bio, age, gender, preferences (age range, distance, gender)
- Discovery (Swiping): Show nearby profiles one at a time; user swipes right (like) or left (pass)
- Matching: When both users swipe right on each other → MATCH → enable chat
- Geo-based discovery: Only show users within configured radius (e.g., 50 km)
- Chat: Matched users can text message each other
- Super Like: Special like that notifies the other user immediately
- Undo: Undo last swipe (premium feature)
- Boost: Temporarily increase visibility (premium)
- Block/Report: Safety features
Non-Functional Requirements
- Low Latency: Swipe deck loads in < 200 ms
- Geo-Accuracy: Distance calculation accurate within 1 km
- Consistency: Mutual match must be EXACTLY consistent (no one-sided matches)
- Scalability: 75M+ MAU, 2B+ swipes/day
- Privacy: Location never exposed to other users (only distance shown)
- Freshness: New users/profile updates reflected in discovery within minutes
Capacity Estimation
| Metric | Calculation | Value |
|---|---|---|
| DAU | Given | 25M |
| Swipes / day | Given | 2B |
| Swipes / sec | 2B ÷ 86,400 s ≈ 23,148 average; peak ≈ 4.3× average | ~23K (peak 100K) |
| Matches / day | Given | 30M |
| Avg profiles in deck | Given | 200/session |
| Profile size | Given | 5 KB (metadata) + 5 MB (photos) |
| Geo-query fan-out | Given | ~10K profiles per 50 km radius in dense city |
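The derived rows can be reproduced with a few lines of arithmetic. The sketch below uses only the figures given in the table; the variable names are illustrative, and the peak factor is simply the ratio implied by the table's 100K peak figure rather than a measured value.

```python
# Back-of-envelope capacity math using the figures given in the table.
SECONDS_PER_DAY = 86_400

dau = 25_000_000
swipes_per_day = 2_000_000_000
matches_per_day = 30_000_000
peak_swipes_per_sec = 100_000  # peak figure taken from the table

avg_swipes_per_sec = swipes_per_day / SECONDS_PER_DAY
implied_peak_factor = peak_swipes_per_sec / avg_swipes_per_sec
swipes_per_user_per_day = swipes_per_day / dau
swipes_per_match = swipes_per_day / matches_per_day

print(f"average swipes/sec:   {avg_swipes_per_sec:,.0f}")
print(f"implied peak factor:  {implied_peak_factor:.1f}x")
print(f"swipes per DAU per day: {swipes_per_user_per_day:.0f}")
print(f"swipes per match:     {swipes_per_match:.1f}")
```

Stating the per-user figure (80 swipes per active user per day) is a quick sanity check that the daily totals are mutually plausible.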
Mutual Match — Atomic Lua Script
Problem: Simultaneous mutual swipe → Double Match
A swipes right on B at T=0.000, B swipes right on A at T=0.001
Without protection: Both check → both see no prior right-swipe →
both record → both detect match → TWO match records.
Solution: Redis Lua script (atomic, single-threaded):
-- KEYS[1] = swiped_right:{A}, KEYS[2] = swiped_right:{B}
-- ARGV[1] = A (swiper), ARGV[2] = B (target)
local already_liked = redis.call('SISMEMBER', KEYS[2], ARGV[1])
redis.call('SADD', KEYS[1], ARGV[2])
if already_liked == 1 then
  return 1 -- MATCH
end
return 0 -- no match yet
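To see why the atomic check-and-set yields exactly one match, the logic can be modeled in memory. This is a minimal sketch, not the production path: `SwipeStore` is a hypothetical stand-in for the Redis sets, and the lock plays the role of Redis's single-threaded script execution.

```python
import threading

class SwipeStore:
    """In-memory stand-in for the swiped_right sets (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._swiped_right = {}  # user_id -> set of user_ids they liked
        self.matches = set()     # canonical (smaller_id, larger_id) pairs

    def swipe_right(self, swiper, target):
        """Record swiper -> target; return True if this completes a mutual like."""
        with self._lock:  # check and write happen as one indivisible step
            already_liked = swiper in self._swiped_right.get(target, set())
            self._swiped_right.setdefault(swiper, set()).add(target)
            if already_liked:
                # Canonical ordering makes a repeated insert a no-op.
                self.matches.add((min(swiper, target), max(swiper, target)))
            return already_liked

store = SwipeStore()
first = store.swipe_right(101, 202)   # A likes B: no match yet
second = store.swipe_right(202, 101)  # B likes A: match detected
retry = store.swipe_right(202, 101)   # client retry: still one match record
print(first, second, retry, store.matches)
```

Whichever swipe is serialized second observes the other's like, so exactly one caller reports the match, and the canonical pair keeps retries idempotent.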
Match record written to MySQL idempotently:
INSERT IGNORE INTO matches (user_id_1, user_id_2) VALUES (LEAST(A, B), GREATEST(A, B))
Always store (smaller_id, larger_id) → the unique key on the pair then prevents duplicate match records.
Already-Swiped Tracking — Bloom Filter
Problem: User has swiped on 50K profiles over 6 months.
swiped:{user_id} SET in Redis = 50K × 16 bytes = 800 KB per user
200M users × 800 KB = 160 TB just for swipe dedup → TOO EXPENSIVE
Bloom filter approach ⭐:
BF per user: 50K entries, 0.1% false positive rate → ~14.4 bits per entry → ~90 KB per user
200M users × 90 KB = 18 TB → ~9× reduction
False positive impact: BF says "already swiped" but actually not →
User never sees that profile → missed opportunity, but harmless
At 0.1% rate → 1 in 1000 profiles incorrectly filtered → acceptable
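The per-user filter size follows from the standard Bloom filter sizing formula, m = -n·ln(p) / (ln 2)². The sketch below applies it to the 50K-entry, 0.1% case; the function name is illustrative, and a real RedisBloom filter may allocate somewhat differently than this textbook optimum.

```python
import math

def bloom_parameters(n_items, fp_rate):
    """Return (bits, bytes, hash_count) for an optimally sized Bloom filter."""
    bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    hash_count = max(1, round(bits / n_items * math.log(2)))
    return bits, math.ceil(bits / 8), hash_count

bits, size_bytes, hashes = bloom_parameters(50_000, 0.001)
set_bytes = 50_000 * 16  # exact SET estimate used in the text

print(f"bits per entry: {bits / 50_000:.1f}")
print(f"filter size:    {size_bytes / 1000:.0f} KB with {hashes} hash functions")
print(f"reduction vs exact set: {set_bytes / size_bytes:.1f}x")
```

For these parameters the formula gives roughly 90 KB per user, so the saving over an exact 800 KB set is a little under 9×.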
Implementation:
BF.ADD swiped_bf:{user_id} {target_user_id}
BF.EXISTS swiped_bf:{user_id} {candidate_id}
→ If exists → filter out from deck
→ If not exists → show in deck (guaranteed correct)
Cassandra stores exact swipe history for auditing / undo feature.
Bloom filter is a READ optimization, not the source of truth.
Profile Boost — Temporary Visibility Increase
Premium feature: "Boost" places your profile at the top of nearby users' decks
Implementation:
On boost activation:
SET boost:{user_id} {expiry_timestamp} EX 1800 (30-minute boost)
ZADD boosted_users:{geohash_prefix} {score=999} {user_id}
During deck generation for nearby users:
1. First: pull boosted users in this geo area (ZREVRANGE boosted_users:...)
2. Then: normal ranked candidates
3. Mix: 1 boosted profile per 5 normal profiles
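The mixing step can be sketched as a simple interleave. This is one possible reading of the 1-per-5 rule, assuming each boosted profile leads a group of five normal ones and that surplus boosted profiles are dropped once normal candidates run out; `mix_deck` and its parameter names are hypothetical.

```python
from itertools import islice

def mix_deck(boosted, normal, normal_per_boosted=5):
    """Place one boosted profile ahead of every group of normal profiles."""
    deck = []
    boosted_iter = iter(boosted)
    normal_iter = iter(normal)
    while True:
        chunk = list(islice(normal_iter, normal_per_boosted))
        if not chunk:
            break  # no normal candidates left: never emit a boost-only tail
        boosted_id = next(boosted_iter, None)
        if boosted_id is not None:
            deck.append(boosted_id)
        deck.extend(chunk)
    return deck

deck = mix_deck(["b1", "b2"], ["n1", "n2", "n3", "n4", "n5", "n6", "n7"])
print(deck)
```

Stopping when the normal pool is exhausted is what dilutes boosts in crowded areas: extra boosted users simply wait for a later deck.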
Revenue model: ~$5 per boost → at 10M boosts/month = $50M/month
Anti-abuse: Max 1 boost per 12 hours, no stacking.
Fairness: If too many boosts in one area → dilute effect (each boost gets
fewer guaranteed views → prevents boost-only decks)
Get Discovery Deck
GET /api/v1/discovery?count=20
→ 200 OK
{ "profiles": [
{ "user_id": "u-uuid", "name": "Alice", "age": 28,
"photos": ["url1", "url2"], "bio": "Love hiking...",
"distance_km": 5.2, "common_interests": ["hiking", "photography"] },
...
]}
Swipe
POST /api/v1/swipes
{ "target_user_id": "u-uuid", "action": "right" }
→ 200 OK { "match": true, "match_id": "m-uuid" } // or { "match": false }
Get Matches
GET /api/v1/matches?cursor={last_match_id}&limit=20
→ 200 OK { "matches": [{ "match_id": "m-uuid", "user": {...}, "matched_at": "..." }] }
Common Error Responses
- 400 Bad Request: invalid input, missing fields, or malformed JSON
- 401 Unauthorized: missing or invalid auth token or API key
- 403 Forbidden: authenticated but insufficient permissions
- 404 Not Found: resource ID does not exist
- 409 Conflict: duplicate write or version conflict; retry with idempotency key
- 422 Unprocessable Entity: valid syntax but invalid business logic
- 429 Too Many Requests: rate limit exceeded; honor Retry-After header
- 500 Internal Error: unexpected server fault; retry with idempotency key
- 503 Service Unavailable: dependency down or overloaded; use exponential backoff
MySQL: Core Data
CREATE TABLE users (
user_id BIGINT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100),
birth_date DATE,
gender ENUM('M','F','NB'),
bio TEXT,
photo_urls JSON,
last_active TIMESTAMP,
latitude DECIMAL(10,7),
longitude DECIMAL(10,7),
elo_score FLOAT DEFAULT 1000,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE preferences (
user_id BIGINT PRIMARY KEY,
gender_pref SET('M','F','NB'),
age_min TINYINT,
age_max TINYINT,
distance_km SMALLINT DEFAULT 50,
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
CREATE TABLE matches (
match_id BIGINT PRIMARY KEY AUTO_INCREMENT,
user_id_1 BIGINT,
user_id_2 BIGINT,
matched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE KEY (user_id_1, user_id_2),
INDEX idx_user1 (user_id_1, matched_at DESC),
INDEX idx_user2 (user_id_2, matched_at DESC)
);
Redis: Location & Swipe State
# Geospatial index for proximity queries
Key: users:geo
Type: GeoSet
Ops: GEOADD users:geo {lng} {lat} {user_id}
Query: GEORADIUS users:geo {lng} {lat} 50 km COUNT 10000 (on Redis 6.2+, GEOSEARCH replaces the deprecated GEORADIUS)
# Swipe history (who has this user swiped on?)
Key: swiped:{user_id}
Type: SET (all swiped user_ids — left or right)
Purpose: Filter already-seen profiles from discovery
# Right-swipe tracking (for match detection)
Key: swiped_right:{user_id}
Type: SET (user_ids this user swiped right on)
# Pre-computed discovery deck
Key: deck:{user_id}
Type: LIST (ordered user_ids for next swipe session)
TTL: 1 hour
| Concern | Solution |
|---|---|
| Match consistency | Redis Lua script ensures atomic check-and-set for mutual match |
| Location staleness | Update location only when app is in foreground; TTL of 24 hours |
| Swipe history too large | Bloom filter for 'already swiped' check (false positive = an unseen profile is occasionally hidden, which is harmless; already-swiped profiles are never re-shown) |
| Redis GeoSet loss | Rebuild from MySQL user locations on startup |
| Unfair ranking | Elo score decay for inactive users; reset periodically |
Stale Location: User Moves to New City
Alice was in New York → Tinder shows NY profiles
Alice flies to London → still showing NY profiles!
Solution: Update location on app open + every 30 minutes while active
GEOADD users:geo {new_lng} {new_lat} {user_id}
Invalidate cached deck: DEL deck:{user_id}
→ Next swipe request regenerates deck with London profiles
Interview Walkthrough
- Scope the core loop first: fetch deck → swipe → check mutual match — each step has different read/write characteristics.
- Center geospatial discovery on Redis GEO commands (GEOADD, GEORADIUS) with a configurable radius filter before ranking.
- Precompute and cache each user's swipe deck in Redis; regenerating candidates on every swipe request is too slow at scale.
- Detect mutual matches with atomic Redis operations, or handle contention with a unique constraint on the ordered user pair, to prevent duplicate match records.
- Discuss ranking evolution: simple Elo (deprecated) vs ML-based P(mutual match) — mention cold-start handling for new users.
- Store photos in object storage behind a CDN with edge delivery; profile metadata stays in a relational DB.
- Invalidate the cached deck when location changes — stale geo data shows profiles from the wrong city.
- Common pitfall: querying all users within radius on every swipe without a pre-built candidate pool and exclusion set.
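The deck-building steps in this walkthrough (radius filter, exclusion set, then ranking) can be sketched in a few lines. This is an in-memory illustration under stated assumptions: `build_deck`, the candidate dictionaries, and the `score` field are hypothetical, and the haversine formula stands in for whatever distance calculation the geo index provides.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def build_deck(viewer, candidates, already_swiped, radius_km=50, limit=200):
    """Filter by exclusion set and radius, rank by score, expose distance only."""
    deck = []
    for c in candidates:
        if c["user_id"] in already_swiped:
            continue  # exclusion set: never re-show a swiped profile
        distance = haversine_km(viewer[0], viewer[1], c["lat"], c["lon"])
        if distance <= radius_km:
            # Only the rounded distance leaves this function, not coordinates.
            deck.append({"user_id": c["user_id"],
                         "distance_km": round(distance, 1),
                         "score": c["score"]})
    deck.sort(key=lambda p: p["score"], reverse=True)
    return deck[:limit]

new_york = (40.7128, -74.0060)
candidates = [
    {"user_id": "u1", "lat": 40.80, "lon": -74.0060, "score": 0.6},
    {"user_id": "u2", "lat": 51.5074, "lon": -0.1278, "score": 0.99},  # London
    {"user_id": "u3", "lat": 40.75, "lon": -74.0, "score": 0.95},      # already swiped
    {"user_id": "u4", "lat": 40.72, "lon": -74.0, "score": 0.9},
]
deck = build_deck(new_york, candidates, already_swiped={"u3"})
print(deck)
```

Running this once per session and caching the result is the point of the pre-built pool: each swipe then pops from the cached list instead of repeating the geo query.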
Elo Score vs ML Ranking
Elo Score (Tinder's original approach):
Each user has an Elo rating (like chess)
If a high-Elo user swipes right on you → your Elo increases more
If a low-Elo user swipes left → less impact
Show users with similar Elo scores to each other
✓ Simple, computationally cheap
✗ Reduces to "attractiveness ranking" → ethical concerns
✗ New users with no swipe data → cold start
✗ Doesn't account for individual preferences
Tinder deprecated Elo in 2019.
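The "a like from a higher-rated user counts for more" behavior falls out of the standard chess Elo update. The sketch below uses that textbook formula with K = 32; it is an illustration of the idea, not Tinder's actual scoring, and it models right-swipes only because the treatment of left-swipes is not specified here.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expectation of `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))

def rating_after_like(profile_rating, swiper_rating, k=32.0):
    """New rating of a profile after receiving a right-swipe (outcome = 1)."""
    return profile_rating + k * (1.0 - expected_score(profile_rating, swiper_rating))

base = 1000.0
gain_from_high = rating_after_like(base, 1400.0) - base
gain_from_equal = rating_after_like(base, 1000.0) - base
gain_from_low = rating_after_like(base, 600.0) - base

print(f"like from 1400-rated user: +{gain_from_high:.1f}")
print(f"like from 1000-rated user: +{gain_from_equal:.1f}")
print(f"like from  600-rated user: +{gain_from_low:.1f}")
```

The gap between the three gains is what turns the score into a popularity ranking, which is the ethical concern listed above.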
ML-based scoring ⭐ (current approach):
Features: profile completeness, photo quality, bio length,
shared interests, response rate in chat, activity level
Model: predicts P(mutual match) for each pair
✓ Considers compatibility, not just attractiveness
✓ Handles cold start (content-based features for new users)
✗ More compute-intensive (inference per candidate pair)
✗ Feedback loop: model trained on past matches → may reinforce biases
Geohash vs H3 vs R-tree for Proximity
Redis GeoSet (Geohash internally) ⭐:
GEORADIUS: find all users within radius
✓ Built into Redis, zero additional infrastructure
✓ Fast: O(N) where N = users in the area
✗ Accuracy decreases near poles (Geohash distortion)
✗ No polygon queries (only radius)
H3 (Uber's hexagonal grid):
✓ Uniform cell sizes worldwide (no pole distortion)
✓ Hierarchical (zoom levels for different granularities)
✗ Not built into Redis (application-level conversion needed)
PostGIS / R-tree:
✓ Arbitrary polygon queries
✓ Most accurate spatial operations
✗ Slower than Redis for simple proximity
✗ Harder to scale horizontally
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1: MVP (0 to 100K users)
Monolith or minimal services proving core Tinder flows. Optimize for shipping speed and correctness over scale.
Key components: Single region · Primary DB + Redis cache · Synchronous core path · Basic monitoring
Move to next phase when: p99 latency exceeds SLO or DB CPU sustained above 70%
Phase 2: Growth (100K to 10M users)
Split read/write paths, introduce async processing for non-critical work, add caching layers and horizontal scaling.
Key components: Read replicas or CQRS · Message queue for async work · CDN / edge caching · Service-level SLOs
Move to next phase when: Hot keys, fan-out bottlenecks, or ops toil from manual scaling
Phase 3: Scale (10M+ users)
Shard data plane, multi-region active-active or active-passive, formal DR runbooks, cost optimization.
Key components: Database sharding / partitioning · Multi-region replication · Auto-scaling + chaos testing · Dedicated platform/SRE ownership
Move to next phase when: Regional failure domain risk, compliance data residency, or linear cost growth unsustainable
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Core user-facing availability | 99.95% | Budget for planned maintenance + unplanned failures without user-visible outage. |
| p99 latency (critical path) | Problem-specific — state target early and tie to capacity math | Interview credibility comes from connecting SLO to architecture choices. |
| Error rate (5xx) | < 0.1% | Distinguishes transient blips from systemic failure requiring rollback. |
| Data durability | 99.999999999% (11 nines) for committed writes | Define which operations require fsync/quorum vs async replication. |
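An availability target becomes concrete once it is converted into an error budget. The sketch below does that conversion for a 30-day window; the window length and function name are assumptions chosen for illustration.

```python
def error_budget_minutes(slo, window_days=30):
    """Minutes of allowed unavailability for an availability SLO over a window."""
    return (1.0 - slo) * window_days * 24 * 60

for slo in (0.999, 0.9995, 0.9999):
    print(f"{slo:.2%} availability -> {error_budget_minutes(slo):.1f} min per 30 days")
```

At 99.95% the budget is about 21.6 minutes per 30 days, which is why an automated rollback measured in minutes matters: a single slow manual rollback can consume most of the month's budget.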
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Primary database unavailable | Health check failures, connection pool exhaustion alerts, elevated 5xx | Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists |
| Traffic spike (10× normal) | RPS anomaly alert, autoscaling lag, latency SLO burn rate | Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations |
| Bad deploy causing elevated errors | Canary metric regression, error budget burn, deployment correlation | Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility |
Cost Drivers (Staff lens)
- Egress bandwidth and CDN (often dominates media/data-heavy systems)
- Database storage + IOPS at scale (plan compaction, TTL, tiering)
- Compute for async pipelines (right-size workers, spot instances for batch)
- Managed service premiums vs operational headcount trade-off
Multi-Region & DR
Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.