This problem appears in multiple sheets. Depth expectations increase as you progress:
| Track | What to demonstrate |
|---|---|
| Arch 25 | Concurrency under contention. Nail seat map model, lock TTL semantics, pessimistic vs optimistic locking trade-offs, double-booking prevention, and fair waiting queue (virtual waiting room). |
| Arch 50 | Add dynamic pricing tiers, season ticket holder priority, and resale marketplace with transfer locks. |
| Arch 75 | Staff: prove correctness under partition (can you double-book if DB splits?), and how Ticketmaster-scale on-sales differ from normal checkout. |
Interview Prompt
Design an online ticketing system like Ticketmaster. Users browse events, select seats from a venue map, and purchase tickets. Prevent double-booking when thousands of users compete for the same seats during a popular on-sale.
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Reserved seating or general admission? | Reserved = seat-level locking with map UI. GA = inventory counter only (simpler, see #68 flash sale pattern). |
| On-sale spike — how many concurrent users for hot events? | 500K users × 20K seats = extreme contention. Drives waiting room, lock TTL, and lock granularity. |
| Lock duration — how long can user hold seats before payment? | 7–10 min industry standard. Too short = abandoned carts; too long = seats locked during sellout frenzy. |
| Optimistic or pessimistic locking acceptable? | Pessimistic (row lock) is safe but serializes; optimistic works for low contention, fails at on-sale scale. |
Scope
In scope
- Venue seat map and availability model
- Seat locking with TTL during checkout
- Double-booking prevention under concurrency
- Waiting queue / virtual waiting room for hot on-sales
- Order completion and ticket issuance
Out of scope (state explicitly)
- Payment processing details (#24)
- Secondary resale marketplace full design
- Venue access control / barcode scanning at gate
- Dynamic pricing / surge (#73)
Assumptions
- Reserved seating, venues up to 80K seats
- Hot on-sale: 500K concurrent users, 20K seats
- 10-min seat lock TTL during checkout
- Zero double-booking tolerance (hard requirement)
These foundational concepts underpin the patterns used in this problem. Review them before deep-diving into component-level trade-offs.
- Browse events: Search and browse movies, concerts, sports events by location, date, genre
- View available seats: Display seat map with real-time availability (available, reserved, booked)
- Select seats: User selects specific seats (for assigned seating) or quantity (for general admission)
- Temporary hold: Selected seats are temporarily held (5-10 minutes) while user completes checkout
- Book & Pay: Complete booking with payment integration
- Booking confirmation: E-ticket (QR code) via email/app
- Cancel/Refund: Cancel booking and process refund based on cancellation policy
- Waiting list: If show is sold out, join a waiting list
- Strong Consistency: Two users must NOT book the same seat (the #1 constraint)
- High Availability: 99.99% availability
- Low Latency: Seat availability loads in < 200 ms
- Handle Traffic Spikes: Popular events (Taylor Swift, Avengers premiere) → 1M+ users hitting the system simultaneously
- Fair Booking: First-come-first-served; prevent bots from grabbing all tickets
- Idempotent: No double-booking, no double-charging
- Scalability: 1000+ concurrent events, 10M+ bookings/day during peak
| Metric | Calculation | Value |
|---|---|---|
| Active events at any time | Given (assumption documented in value) | 50K |
| Seats per event (avg) | Given (typical workload assumption) | 500 |
| Total bookable seats | Given (assumption documented in value) | 25M |
| Bookings / day | Given | 5M |
| Peak bookings / sec (popular event on-sale) | Given (peak on-sale spike, not avg) | 50K |
| Seat checks / sec (availability) | Given (peak load assumption) | 500K |
| Temporary holds active at peak | Given (peak load assumption) | 1M |
Scale challenge: 1M users hitting 'Buy' at 10:00:00 AM. Needs Virtual Queue to protect the transactional database.
The system uses Redis for fast seat locking and PostgreSQL as the source of truth for confirmed bookings. A Virtual Queue service acts as a buffer for high-traffic spikes.
1. Complete Booking Flow: Step-by-Step Sequence
2. Seat Lifecycle & States
3. API Gateway: First Line of Defense
- Bot detection: Fingerprint client (TLS fingerprint, JS challenge, device fingerprint). Block known headless browsers (Puppeteer, Selenium)
- Rate limiting: Token bucket per IP (100 req/min), per user (50 req/min for booking endpoints)
- Virtual queue gate: For hot events, redirect all booking attempts through Virtual Queue Service instead of directly to Booking Service
- JWT validation: Verify booking token issued by Virtual Queue (contains event_id, user_id, expiry)
4. Event / Search Service
- Catalog management: CRUD for events, venues, shows, pricing tiers
- Search: Elasticsearch index on event title, category, city, venue, artist name, tags
- Caching: Redis caches hot event metadata (poster, description, show times) with TTL=5min
- Why MySQL for catalog: Relational data with clear schema. Read replicas for search traffic
5. Booking Service: Multi-Seat Atomic Hold (Lua Script)
When a user selects multiple seats (e.g., 4 tickets), ALL must be held atomically. If any one fails, none should be held:
-- Redis Lua script: atomic multi-seat hold
-- KEYS = list of seat lock keys
-- ARGV[1] = user_id, ARGV[2] = TTL in seconds
-- Phase 1: Check all seats are available
for i, key in ipairs(KEYS) do
if redis.call('EXISTS', key) == 1 then
return {0, key} -- FAIL: seat already held, return which seat
end
end
-- Phase 2: All available → lock all atomically
for i, key in ipairs(KEYS) do
redis.call('SET', key, ARGV[1], 'EX', ARGV[2])
end
return {1, 'OK'} -- SUCCESS: all seats heldWhy Lua script? Redis executes Lua scripts atomically (single-threaded). No other command can interleave between the EXISTS checks and SET commands. This guarantees either ALL seats are held or NONE are: critical for group bookings.
6. Booking Confirmation: Database Transaction
BEGIN TRANSACTION;
-- 1. Verify hold is still valid and belongs to this user
SELECT hold_id, user_id, seat_ids, hold_expires_at
FROM seat_holds
WHERE hold_id = $1 AND user_id = $2 AND hold_expires_at > NOW()
FOR UPDATE;
-- 2. Verify payment was authorized
-- 3. Update all seats to BOOKED
UPDATE seats
SET status = 'booked', booked_by = $2, booking_id = $3
WHERE show_id = $4 AND seat_id = ANY($5) AND status = 'held' AND held_by = $2;
-- 4. Verify all seats were updated (row count = expected seat count)
-- 5. Insert booking record
INSERT INTO bookings (booking_id, user_id, show_id, seat_ids, total_amount, payment_id, status, booked_at)
VALUES ($3, $2, $4, $5, $6, $7, 'confirmed', NOW());
-- 6. Insert audit log
INSERT INTO booking_audit_log (booking_id, action, actor, details, created_at)
VALUES ($3, 'CONFIRMED', $2, '{"seats": ...}', NOW());
-- 7. Decrement available seat count
UPDATE shows SET available_seats = available_seats - $8 WHERE show_id = $4;
COMMIT;7. Virtual Queue Service: Handling 1M Simultaneous Users
For hot events, users join a queue implemented in Redis. A worker drains the queue at a rate the DB can handle (e.g., 100 users/sec), issuing a signed JWT token for booking.
# Queue drainer worker loop
while True:
# Pop next batch of users whose turn it is
users = redis.zpopmin(f"queue:{event_id}", count=BATCH_SIZE)
for user_id, score in users:
# Issue a signed booking token (JWT)
token = jwt.encode({
"user_id": user_id,
"event_id": event_id,
"issued_at": now(),
"expires_at": now() + timedelta(minutes=10),
"max_tickets": 4
}, SECRET_KEY)
# Notify user their turn has arrived
push_notification(user_id, {"status": "your_turn", "booking_token": token})
sleep(BATCH_SIZE / DRAIN_RATE)8. Seat Map Service: Real-Time Availability Push
Booking Service publishes status changes to Kafka, fanned out to WebSocket servers, streaming updates directly to the client UI:
{
"type": "seat_update",
"show_id": "show-uuid",
"updates": [
{"seat_id": "P-A1", "status": "held"},
{"seat_id": "P-A2", "status": "held"}
]
}9. Payment Service: Two-Phase Payment with Idempotency
- Why two-phase? Hold the funds first (authorize), confirm the charge after booking is persisted (capture). If booking fails, release the authorization: no money moves.
- Idempotency: Every payment attempt has a unique
idempotency_key(reusedhold_id) → retry-safe.
10. Hold Expiry Worker: Background Cleanup
# Runs every 30 seconds
def cleanup_expired_holds():
# Find expired holds in PostgreSQL
expired = db.query("""
SELECT hold_id, show_id, seat_ids
FROM seat_holds
WHERE status = 'active' AND hold_expires_at < NOW()
""")
for hold in expired:
db.begin_transaction()
db.execute("UPDATE seat_holds SET status = 'expired' WHERE hold_id = %s", hold.hold_id)
db.execute("""
UPDATE seats SET status = 'available', held_by = NULL, held_until = NULL
WHERE show_id = %s AND seat_id = ANY(%s) AND status = 'held'
""", hold.show_id, hold.seat_ids)
db.execute("""
UPDATE shows SET available_seats = available_seats + %s
WHERE show_id = %s
""", len(hold.seat_ids), hold.show_id)
db.commit()
# Publish seat-available event
kafka.produce('seat-status-changes', {
'show_id': hold.show_id,
'seats': hold.seat_ids,
'status': 'available',
'reason': 'hold_expired'
})Search Events
GET /api/v1/events?city=mumbai&category=movie&date=2026-03-13&q=avengers&sort=popularity&page=1&limit=20Get Event Details + Show Times
GET /api/v1/events/{event_id}
Response: 200 OK
{
"event_id": "evt-uuid",
"title": "Avengers: Secret Wars",
"shows": [
{
"show_id": "show-uuid-1",
"show_time": "2026-03-13T14:00:00+05:30",
"available_seats": 125,
"sections": [
{"name": "Premium", "price": 800, "available": 20, "total": 50}
]
}
]
}Get Seat Map (Real-Time)
GET /api/v1/events/{event_id}/shows/{show_id}/seatsJoin Virtual Queue
POST /api/v1/queue/join
{
"event_id": "evt-uuid",
"show_id": "show-uuid"
}
Response: 200 OK
{
"queue_id": "q-uuid",
"position": 4523,
"estimated_wait_seconds": 180
}Poll Queue Status
GET /api/v1/queue/status/{queue_id}
Response (your turn): 200 OK
{
"status": "your_turn",
"booking_token": "eyJhbGciOiJIUzI1NiIs...",
"max_tickets": 4
}Hold Seats
POST /api/v1/bookings/hold
{
"show_id": "show-uuid",
"seat_ids": ["P-A1", "P-A4"],
"booking_token": "eyJhbGciOiJIUzI1NiIs..."
}Confirm Booking (Pay & Book)
POST /api/v1/bookings/confirm
Idempotency-Key: idem-uuid-12345
{
"hold_id": "hold-uuid",
"payment_method_id": "pm-uuid"
}Cancel Booking
POST /api/v1/bookings/{booking_id}/cancelJoin Waitlist
POST /api/v1/waitlist
{
"show_id": "show-uuid",
"requested_seats": 2,
"preferred_section": "Premium"
}Get User's Bookings
GET /api/v1/users/me/bookings?status=confirmed&page=1Payment Service APIs (Internal & Gateway)
POST /api/v1/internal/payments/authorize
{
"booking_id": "booking-uuid",
"amount": 299.99,
"payment_method": "credit_card",
"idempotency_key": "hold-uuid"
}
Response: 200 OK
{ "status": "authorized", "authorization_id": "auth-123" }
POST /api/v1/internal/payments/capture
{ "payment_id": "pay-uuid" }
Response: 200 OK
{ "status": "captured" }
POST /api/v1/internal/payments/void
{ "payment_id": "pay-uuid" }
Response: 200 OK
{ "status": "voided" }
POST /api/v1/payments/{payment_id}/refund
{ "reason": "user_cancelled" }
Response: 200 OK
{ "status": "refunded" }Common Error Responses
400 Bad Request: invalid input, missing fields, or malformed JSON 401 Unauthorized: missing or invalid auth token or API key 403 Forbidden: authenticated but insufficient permissions 404 Not Found: resource ID does not exist 409 Conflict: duplicate write or version conflict; retry with idempotency key 422 Unprocessable Entity: valid syntax but invalid business logic 429 Too Many Requests: rate limit exceeded; honor Retry-After header 500 Internal Error: unexpected server fault; retry with idempotency key 503 Service Unavailable: dependency down or overloaded; use exponential backoff 402 Payment Required: insufficient funds 502 Bad Gateway: payment provider timeout; poll status endpoint
PostgreSQL: Venues & Sections
CREATE TABLE venues (
venue_id UUID PRIMARY KEY,
name VARCHAR(256) NOT NULL,
capacity INT,
layout_svg_url TEXT
);
CREATE TABLE venue_sections (
section_id UUID PRIMARY KEY,
venue_id UUID REFERENCES venues,
name VARCHAR(64),
total_seats INT,
UNIQUE (venue_id, name)
);MySQL: Events & Shows
CREATE TABLE events (
event_id UUID PRIMARY KEY,
title VARCHAR(256) NOT NULL,
category VARCHAR(64),
venue_id UUID NOT NULL,
is_hot BOOLEAN DEFAULT FALSE,
status VARCHAR(20) DEFAULT 'upcoming',
FULLTEXT idx_search (title, artist_name, description)
);
CREATE TABLE shows (
show_id UUID PRIMARY KEY,
event_id UUID NOT NULL,
venue_id UUID NOT NULL,
show_time TIMESTAMP NOT NULL,
available_seats INT NOT NULL
);PostgreSQL: Seats & Holds
CREATE TABLE seats (
seat_id VARCHAR(16),
show_id UUID NOT NULL,
section_name VARCHAR(64),
status VARCHAR(12) NOT NULL DEFAULT 'available',
held_by UUID,
held_until TIMESTAMP,
booked_by UUID,
booking_id UUID,
version INT DEFAULT 0,
PRIMARY KEY (show_id, seat_id),
CONSTRAINT chk_status CHECK (status IN ('available', 'held', 'booked')),
CONSTRAINT uq_booked UNIQUE (show_id, seat_id, booking_id)
);
CREATE TABLE seat_holds (
hold_id UUID PRIMARY KEY,
user_id UUID NOT NULL,
show_id UUID NOT NULL,
seat_ids TEXT[] NOT NULL,
status VARCHAR(20) DEFAULT 'active',
hold_expires_at TIMESTAMP NOT NULL
);PostgreSQL: Bookings & Tickets
CREATE TABLE bookings (
booking_id UUID PRIMARY KEY,
user_id UUID NOT NULL,
show_id UUID NOT NULL,
seat_ids TEXT[] NOT NULL,
total_amount DECIMAL(10,2) NOT NULL,
payment_id UUID,
status VARCHAR(20) DEFAULT 'confirmed',
idempotency_key VARCHAR(128) UNIQUE
);
CREATE TABLE tickets (
ticket_id UUID PRIMARY KEY,
booking_id UUID NOT NULL REFERENCES bookings,
show_id UUID NOT NULL,
seat_id VARCHAR(16) NOT NULL,
qr_code_url TEXT,
status VARCHAR(20) DEFAULT 'active',
UNIQUE (show_id, seat_id, status) WHERE status = 'active'
);PostgreSQL: Payments
CREATE TABLE payments (
payment_id UUID PRIMARY KEY,
booking_id UUID REFERENCES bookings,
user_id UUID NOT NULL,
amount DECIMAL(10,2) NOT NULL,
currency VARCHAR(3) DEFAULT 'INR',
payment_method VARCHAR(32), -- 'credit_card', 'debit_card', 'upi', 'wallet'
gateway VARCHAR(32), -- 'stripe', 'razorpay'
gateway_txn_id VARCHAR(128), -- PSP's transaction reference
authorization_id VARCHAR(128),
status VARCHAR(20), -- 'authorized', 'captured', 'voided', 'refunded'
idempotency_key VARCHAR(128) UNIQUE,
created_at TIMESTAMP DEFAULT NOW(),
captured_at TIMESTAMP
);Redis Structure
# Seat holds
Key: seat:{show_id}:{seat_id}:lock
Value: {user_id}:{hold_id}
TTL: 600 (10 minutes)
# Virtual Queue Sorted Set
Key: queue:{event_id}:{show_id}
Members: user_id
Scores: join_timestamp_microseconds
# Caches
Key: seats:avail:set:{show_id} (Set of available seats)
Key: event:meta:{event_id} (JSON string metadata)Kafka Topics
Topic: seat-status-changes (partition key: show_id) Topic: booking-events (partition key: booking_id) Topic: payment-events (partition key: payment_id) Topic: notification-events (partition key: user_id)
Elasticsearch Event Index Schema
{
"event_id": "evt-uuid",
"title": "Avengers: Secret Wars",
"category": "movie",
"city": "Mumbai",
"venue_name": "PVR IMAX",
"status": "on_sale",
"popularity_score": 9500,
"location": { "lat": 18.9944, "lon": 72.8265 }
}| Concern | Solution |
|---|---|
| DB replication | PostgreSQL synchronous standby → zero booking data loss on primary failure |
| Redis Cluster | 3 masters + 3 replicas for seat locks; auto-failover if a master dies |
| Kafka durability | RF=3, min.insync.replicas=2; no event lost |
| Idempotency keys | On booking confirmation API → prevents double-charge on retry |
| Circuit breaker | Between Booking Service → Payment Gateway; if PSP is down, don't hold seats forever |
| Health checks | LB actively monitors all service instances every 5 seconds |
| Multi-AZ deployment | All critical services deployed across 3 availability zones |
1. Double-Booking Prevention
Four distinct layers of defense protect seat integrity:
- Redis SET NX: Pre-locks the seat in-memory before reaching the database.
- DB optimistic lock: The version column is verified in the update query.
- DB unique constraint: A conditional unique index on
(show_id, seat_id, booking_id)prevents concurrent booking writes. - Application check: Ensures the booking actor matches the hold owner.
2. Hold Not Released (Cleanup)
- Primary: Redis keys automatically expire using TTL (10 minutes).
- Secondary: Background Hold Expiry Worker continuously runs to release PostgreSQL rows that passed their expiration time.
3. Payment Succeeds but Booking DB Write Fails
Compensating Saga: If the database write times out or errors, the Booking Service captures this failure and triggers an immediate asynchronous refund command to the Payment Service, while voiding the card authorization and releasing Redis locks.
4. Payment Gateway Timeout
The transaction is marked as payment_pending rather than immediately failed. A background polling worker contacts Stripe/PSP API with the idempotency key to confirm status before releasing or capturing funds.
1. Contiguous Seat Selection Algorithm
Finds group seating while prioritizing center placement and lower row numbers:
def find_best_contiguous_seats(show_id: str, section: str, count: int) -> List[Seat]:
rows = db.query("""
SELECT row, seat_id, number, status
FROM seats
WHERE show_id = %s AND section_name = %s
ORDER BY row ASC, number ASC
""", show_id, section)
candidates = []
for row_label, seats_in_row in group_by_row(rows):
current_run = []
for seat in seats_in_row:
if seat.status == 'available':
current_run.append(seat)
else:
if len(current_run) >= count:
candidates.append(current_run[:])
current_run = []
if len(current_run) >= count:
candidates.append(current_run[:])
if not candidates:
return []
def score(run):
group = run[:count]
row_ord = ord(group[0].row)
center = max(s.number for s in seats_in_row) / 2
avg_pos = sum(s.number for s in group) / count
return (row_ord * 100) + abs(center - avg_pos)
best_run = min(candidates, key=score)
return best_run[:count]2. Dynamic Pricing Engine
def calculate_dynamic_price(show_id: str, section: str, base_price: float) -> float:
show = get_show(show_id)
section_info = get_section_info(show_id, section)
# Factor 1: Demand (fill rate)
fill_rate = 1 - (section_info.available / section_info.total)
demand_multiplier = 1.0
if fill_rate > 0.9:
demand_multiplier = 1.5
elif fill_rate > 0.7:
demand_multiplier = 1.2
# Factor 2: Time to show
hours_to_show = (show.show_time - now()).total_seconds() / 3600
time_multiplier = 1.0
if hours_to_show < 2:
time_multiplier = 0.7
price = base_price * demand_multiplier * time_multiplier
return round(max(min(price, base_price * 2.0), base_price * 0.6), 2)3. QR Ticket offline Verification (JWT Asymmetric RS256)
# Ticket verification at venue gates
def verify_ticket_qr(qr_data: str) -> VerificationResult:
try:
# Verified using public RSA key offline
payload = jwt.decode(qr_data, QR_PUBLIC_KEY, algorithms=["RS256"])
except jwt.ExpiredSignatureError:
return VerificationResult(valid=False, reason="Ticket expired")
except jwt.InvalidTokenError:
return VerificationResult(valid=False, reason="Invalid ticket")
if is_already_scanned(payload["ticket_id"]):
return VerificationResult(valid=False, reason="Ticket already used")
mark_scanned(payload["ticket_id"])
return VerificationResult(valid=True, seat=payload["seat_id"], ticket_id=payload["ticket_id"])4. Priority-Based Waitlist
# Score weight by user tier
WAITLIST_PRIORITY_WEIGHTS = {
'premium_subscriber': 3.0,
'loyalty_gold': 2.0,
'regular': 1.0
}
def waitlist_score(user, created_at):
priority = WAITLIST_PRIORITY_WEIGHTS.get(user.tier, 1.0)
return priority * 1000000 - created_at.timestamp()5. Scalper Bot Prevention
To prevent ticket hoarding by automated scalping scripts during hot-on-sales, a multi-layered bot mitigation strategy is employed:
| Defense Mechanism | How It Works | Effectiveness |
|---|---|---|
| CAPTCHA (reCAPTCHA v3) | Score-based risk assessment before allowing entry into the virtual queue. | High |
| Strict Rate Limiting | Enforces max 4 tickets per authenticated user/credit card/phone number. | Medium |
| Browser Fingerprinting | Canvas fingerprint, WebGL hashes, and font enumerations to detect VM/headless browsers. | High |
| TLS Fingerprinting (JA3) | Inspects TLS Client Hello signatures to distinguish real browsers from scripts. | High |
| Verified Fan Program | Pre-registering accounts days before the on-sale, backed by phone/ID verification. | Very High |
| Phone OTP verification | Requires an SMS/OTP check at checkout to authenticate human buyers. | High |
Interview Walkthrough
- Open with the on-sale spike scenario — millions of users, finite seats — and frame it as a Contention pattern problem from the start.
- Propose a virtual waiting room that admission-controls traffic before it reaches the booking service; tune drain rate with Back-of-the-Envelope Estimation.
- Use Redis atomic holds (
SET NX EX) for seat reservations with a TTL, then confirm asynchronously in PostgreSQL — hybrid over pure DB locking. - Contrast pessimistic (
SELECT FOR UPDATE), optimistic (CAS), and Redis holds; explain why pure pessimistic locking collapses under concert demand. - Make payment confirmation idempotent — the same hold token must not charge twice if the client retries after a timeout.
- Cover QR ticket verification: signed payloads, one-time scan marking, and anti-replay to block duplicate entry.
- Mention bot mitigation layers (CAPTCHA, rate limits, verified fan programs) as operational requirements, not afterthoughts.
- Common pitfall: locking rows in PostgreSQL for every browse request instead of only at checkout confirmation.
Pessimistic vs. Optimistic Locking vs. Redis Atomic Holds
Pessimistic Locking (SELECT FOR UPDATE):
✓ Simple and guaranteed correct in the DB.
✗ Poor performance under massive concurrent demand; blocks connection pools.
Optimistic Locking (CAS / Version check):
✓ High throughput under moderate read-heavy load; no long DB locks.
✗ Under high contention (concert on-sale) → extreme abort/retry rates, degrading user experience.
Redis Atomic Holds (SET NX EX):
✓ Ultra-fast (sub-ms), handles 100K+ ops/sec, auto-expires unused slots.
✗ More complex multi-resource state management; requires async reconciliation.
Recommendation: Hybrid model using Redis for hold locks, and PostgreSQL row updates on confirmation.
Virtual waiting room drain rate
Tuning the virtual waiting room drain rate is an operational balancing act. A high drain rate results in a fast queue experience but runs the risk of overloading downstream booking microservices and database connection pools. Conversely, a conservative drain rate keeps the system stable but increases customer queue wait times, encouraging abandonment.
Back-of-the-Envelope Admission Rate Estimation:
To determine the ideal queue admission rate, we can apply this capacity planning formula:
Formula: Admit Rate (users/sec) = Available Seats / Avg Checkout Duration (seconds) Example: - Event capacity: 5,000 available seats - Average checkout transaction duration: 3 minutes (180 seconds) - Target hold safety factor: 1.0 (fill all seats in one checkout wave) Calculation: - 5,000 seats / 180 seconds = ~27.7 users/second Operational Setting: - Set queue gate release rate to 28 users per second. - This ensures PostgreSQL connections never experience more than 5,000 concurrent active holds, completely eliminating connection pool exhaustion.
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1 — Simple reserved seating
Monolith + PostgreSQL. Pessimistic row locks on seat hold. 10-min TTL via scheduled job. No waiting room — all users hit seat map directly. Works for venues < 5K seats, normal sales.
Key components: Monolith · PostgreSQL row locks · TTL cron · Static seat map
Move to next phase when: On-sale for 20K-seat arena: DB connection pool exhausted in 30 sec
Phase 2 — Redis gate + waiting room
Virtual waiting room admits rate-limited batches. Redis SETNX pre-check before DB CAS. WebSocket for live availability map. Hold extension API. Separate read replicas for seat map browsing; writes to primary only.
Key components: Waiting room · Redis SETNX gate · WebSocket availability · CAS seat TX · Read replicas
Move to next phase when: Double-booking incident from race between two app servers (before Redis gate)
Phase 3 — On-sale platform
Event-sharded seat databases (hot event isolated). CockroachDB for multi-region venues. Pre-load seat inventory into Redis for hold path; async confirm to DB. Lottery mode for ultra-hot events. Real-time queue position with estimated wait.
Key components: Event sharding · Redis-primary hold path · CockroachDB · Lottery mode · Queue ETA
Move to next phase when: International tour — seats must sell simultaneously in US + EU without split-brain
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Double-booking rate | 0 | Hard invariant — legal and trust catastrophe |
| Hold acquisition p99 | < 500ms | User selecting seats — snappy feedback |
| Waiting room admission accuracy | 100% | Queue token must map to admitted user |
| Availability broadcast lag | < 1 sec | Map must reflect releases in near-real-time |
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| TTL reaper stuck — thousands of seats locked as HELD | Available seat count near zero despite low sales; lock_expires_at in past but status still HELD | Emergency reaper script: release all HELD where expires_at < now() - 1min; fix cron; extend on-sale pause if needed |
| Waiting room bypass — bots hit /hold directly | Hold rate 10× admission rate; queue depth stable but DB melting | Enforce signed queue token on /hold middleware; rotate signing key; block offending IP ranges; CAPTCHA escalation |
| Split-brain double-book during DB failover | Two confirmation emails for same seat; customer support alert | Use CockroachDB/Patroni with synchronous replication for seat partition; favor unavailability over inconsistency; manual refund + alternate seat for affected buyer |
Cost Drivers (Staff lens)
- On-sale DB TPS: 5K holds/min × $0.001/query on provisioned IOPS — spike provisioning for 2-hour window
- WebSocket connections: 500K concurrent × connection memory on gateway cluster
- Waiting room infra: mostly stateless — queue state in Redis is cheap vs seat DB contention cost
Multi-Region & DR
Seat inventory for a venue lives in one region (single writer). Global fans route to that region's API during on-sale. CDN for static seat maps only — never cache availability. Multi-region failover = pause sales, not active-active on seat rows.