Design a Hotel Booking System

Interview Prompt

Design a hotel booking system.

Clarifying Questions (ask before designing)

Question	Why it matters
Which of these is highest priority: the core reservation flow (a variant of ticketing), the room availability calendar, or the overbooking policy?	Forces scope negotiation: senior candidates trim scope before drawing boxes.
What scale should we design for — DAU, QPS, data volume?	Drives every capacity decision; shows structured thinking.
What are the read vs write patterns on the critical path?	Determines caching, DB choice, and replication topology.
What consistency and durability guarantees are required?	Separates strong-consistency paths from eventual ones — a senior differentiator.

Scope

In scope

Reservation flow (a variant of the ticketing problem)
Room availability calendar
Overbooking policy
Cancellation
Price optimization
Capacity estimation with shown math

Out of scope (state explicitly)

Flight and travel package bundling
Property management / housekeeping systems
Revenue-management ML model internals

Assumptions

Clarify scale (DAU, QPS, data volume) for the hotel booking system in the first 5 minutes
Standard reliability target of 99.9%–99.99% unless the problem implies higher (payments, booking)
Managed cloud services (RDS, S3, Kafka, Redis) are acceptable building blocks

Functional Requirements

Search hotels by location, dates, guests, price range, amenities, star rating
View room types, photos, reviews, availability calendar
Book room(s): select dates → reserve → pay → confirm
Overbooking management: controlled overbooking with walking policy
Cancellation with policy enforcement (free cancellation up to X days before check-in)
Price management: dynamic pricing based on demand, season, events
Loyalty program: points accrual and redemption
Calendar-based inventory (each room-night is a separate unit)
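Treating each room-night as a separate unit means a stay is the half-open date range [checkin, checkout). The guest occupies every night from check-in up to, but not including, the checkout date. A minimal Python sketch; the function name `room_nights` is illustrative and not part of the design:

```python
from datetime import date, timedelta


def room_nights(checkin: date, checkout: date) -> list[date]:
    """Return every night a stay occupies: [checkin, checkout), checkout excluded."""
    if checkout <= checkin:
        raise ValueError("checkout must be after checkin")
    nights = (checkout - checkin).days
    return [checkin + timedelta(days=i) for i in range(nights)]


# A Dec 20 -> Dec 25 stay consumes 5 room-nights (Dec 20..24);
# the room can be sold again for the night of Dec 25.
stay = room_nights(date(2026, 12, 20), date(2026, 12, 25))
print(len(stay), stay[0], stay[-1])  # 5 2026-12-20 2026-12-24
```

The booking transaction must use the same half-open range when it locks and updates inventory rows. Otherwise the checkout night is booked by mistake.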

Capacity Estimation

Metric	Calculation	Value
Hotels	Given	1M
Rooms (total)	Given (avg. 50 rooms per hotel)	50M
Bookable room-nights (next 365 days)	50M rooms × 365 days	18.25B
Searches / sec	Daily search volume ÷ 86,400 s (+ peak factor)	50K
Bookings / sec	Daily booking volume ÷ 86,400 s (+ peak factor)	5K
Peak (holiday season)	Given	5× normal
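The derived rows follow from the given ones, and the arithmetic can be written out explicitly. This sketch assumes the 5× holiday factor is applied on top of the 50K searches/s and 5K bookings/s figures; the table does not state this outright.

```python
# Inputs copied from the capacity table above.
HOTELS = 1_000_000
ROOMS_TOTAL = 50_000_000
DAYS_AHEAD = 365
SEARCHES_PER_SEC = 50_000
BOOKINGS_PER_SEC = 5_000
PEAK_FACTOR = 5  # ASSUMPTION: multiplies the per-second rates below

rooms_per_hotel = ROOMS_TOTAL // HOTELS          # average hotel size
room_nights = ROOMS_TOTAL * DAYS_AHEAD           # bookable room-nights in the window
peak_searches = SEARCHES_PER_SEC * PEAK_FACTOR   # holiday-season search load
peak_bookings = BOOKINGS_PER_SEC * PEAK_FACTOR   # holiday-season booking load

print(f"{rooms_per_hotel} rooms/hotel, {room_nights / 1e9:.2f}B room-nights")
print(f"peak: {peak_searches:,} searches/s, {peak_bookings:,} bookings/s")
```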

API Design

HTTP

GET  /api/hotels/search?location=NYC&checkin=2026-12-20&checkout=2026-12-25&guests=2
POST /api/bookings          → {hotel_id, room_type, checkin, checkout, guest_info, payment}
GET  /api/bookings/{id}     → Booking details
DELETE /api/bookings/{id}   → Cancel
GET  /api/hotels/{id}/availability?start=...&end=...
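The 400 vs 422 split in the error responses below maps naturally onto the POST /api/bookings body. The sketch assumes the body arrives as a decoded JSON dict with the field names listed above and ISO-8601 date strings. `validate_booking` and `REQUIRED_FIELDS` are hypothetical names.

```python
from datetime import date
from typing import Optional

REQUIRED_FIELDS = ("hotel_id", "room_type", "checkin", "checkout", "guest_info", "payment")


def validate_booking(body: dict) -> Optional[tuple[int, str]]:
    """Return None if the body is acceptable, else (status_code, reason)."""
    missing = [f for f in REQUIRED_FIELDS if f not in body]
    if missing:
        return 400, "missing fields: " + ", ".join(missing)
    try:
        checkin = date.fromisoformat(body["checkin"])
        checkout = date.fromisoformat(body["checkout"])
    except (TypeError, ValueError):
        # Malformed input -> 400 Bad Request
        return 400, "checkin/checkout must be YYYY-MM-DD"
    if checkout <= checkin:
        # Well-formed but violates a business rule -> 422 Unprocessable Entity
        return 422, "checkout must be after checkin"
    return None
```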

Common Error Responses

400 Bad Request: invalid input, missing fields, or malformed JSON
401 Unauthorized: missing or invalid auth token or API key
402 Payment Required: payment declined (e.g., insufficient funds)
403 Forbidden: authenticated but insufficient permissions
404 Not Found: resource ID does not exist
409 Conflict: duplicate write or version conflict; retry with idempotency key
422 Unprocessable Entity: valid syntax but invalid business logic
429 Too Many Requests: rate limit exceeded; honor Retry-After header
500 Internal Server Error: unexpected server fault; retry with idempotency key
502 Bad Gateway: payment provider timeout; poll GET /api/bookings/{id} for the final status
503 Service Unavailable: dependency down or overloaded; retry with exponential backoff
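Several of these responses tell the client to retry with an idempotency key or to back off exponentially. Below is a minimal in-memory sketch of both. It assumes the client generates one key per logical booking and reuses it on every retry. `BookingService`, `create_booking`, and `backoff_delays` are hypothetical names. Full jitter is one common backoff variant; the design does not prescribe it.

```python
import random
import uuid


class BookingService:
    """Hypothetical in-memory stand-in for POST /api/bookings."""

    def __init__(self) -> None:
        self.responses: dict[str, dict] = {}  # idempotency key -> first response
        self.bookings_created = 0

    def create_booking(self, idempotency_key: str, body: dict) -> dict:
        # A repeated key replays the stored response instead of booking twice.
        if idempotency_key in self.responses:
            return self.responses[idempotency_key]
        self.bookings_created += 1
        response = {"booking_id": str(uuid.uuid4()), "hotel_id": body["hotel_id"],
                    "status": "CONFIRMED"}
        self.responses[idempotency_key] = response
        return response


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 5.0) -> list[float]:
    """Full jitter: retry n sleeps a random U(0, min(cap, base * 2**n)) seconds."""
    return [random.uniform(0, min(cap, base * 2**n)) for n in range(attempts)]


service = BookingService()
key = str(uuid.uuid4())  # generated once per logical booking, reused on every retry
first = service.create_booking(key, {"hotel_id": "h1"})
retry = service.create_booking(key, {"hotel_id": "h1"})  # e.g. resent after a 500
print(first == retry, service.bookings_created)  # True 1
print([round(d, 2) for d in backoff_delays(5)])
```

In a real deployment, the key lookup and the reservation insert need to be atomic. One option is a unique constraint on the key, written in the same transaction as the reservation; the dictionary here only illustrates the replay semantics. When a 429 carries Retry-After, that value should take precedence over the computed delay.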

Data Model (PostgreSQL)

SQL

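CREATE TABLE room_inventory (
    hotel_id          UUID NOT NULL,
    room_type         TEXT NOT NULL,
    date              DATE NOT NULL,
    total_rooms       INT  NOT NULL,
    booked_rooms      INT  NOT NULL DEFAULT 0,
    overbooking_limit INT  NOT NULL,  -- cap on booked_rooms; booking fails at this value
    price_cents       INT  NOT NULL,
    PRIMARY KEY (hotel_id, room_type, date),
    CHECK (booked_rooms BETWEEN 0 AND overbooking_limit)  -- DB-level guard against overselling
);

CREATE TABLE reservations (
    reservation_id      UUID PRIMARY KEY,
    hotel_id            UUID NOT NULL,
    room_type           TEXT NOT NULL,
    guest_id            UUID NOT NULL,
    checkin             DATE NOT NULL,
    checkout            DATE NOT NULL,  -- exclusive: the checkout night is not booked
    status              TEXT NOT NULL
        CHECK (status IN ('PENDING', 'CONFIRMED', 'CANCELLED', 'CHECKED_IN', 'COMPLETED')),
    total_price_cents   INT,
    payment_id          UUID,
    cancellation_policy TEXT,
    created_at          TIMESTAMPTZ NOT NULL DEFAULT now(),
    CHECK (checkout > checkin)
);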

Booking Transaction: Race Condition Prevention

SQL

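BEGIN;
-- Lock every room-night row in the stay. The checkout night is not booked,
-- so the range is half-open: [checkin, checkout).
-- ORDER BY date makes overlapping bookings lock rows in the same order,
-- which avoids deadlocks between them.
SELECT date, booked_rooms, total_rooms, overbooking_limit
FROM room_inventory
WHERE hotel_id = $1 AND room_type = $2 AND date >= $3 AND date < $4
ORDER BY date
FOR UPDATE;

-- Application check: expect exactly ($4 - $3) rows (one per night; a missing
-- row means no inventory exists for that night), and every row must satisfy
-- booked_rooms < overbooking_limit. Otherwise → ROLLBACK.

UPDATE room_inventory SET booked_rooms = booked_rooms + 1
WHERE hotel_id = $1 AND room_type = $2 AND date >= $3 AND date < $4;

INSERT INTO reservations (...) VALUES (...);
COMMIT;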
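The application-side check between the SELECT and the UPDATE has two parts. Every night in [checkin, checkout) must have an inventory row, and every row must still be below its overbooking cap. This minimal Python sketch assumes the locked rows come back as plain tuples. `InventoryRow` and `can_book` are hypothetical names.

```python
from datetime import date, timedelta
from typing import NamedTuple


class InventoryRow(NamedTuple):
    night: date
    booked_rooms: int
    total_rooms: int
    overbooking_limit: int


def can_book(rows: list[InventoryRow], checkin: date, checkout: date) -> bool:
    """Decide COMMIT vs ROLLBACK from the rows locked by SELECT ... FOR UPDATE."""
    expected = {checkin + timedelta(days=i) for i in range((checkout - checkin).days)}
    # A missing night means no inventory was published for it: reject.
    if {r.night for r in rows} != expected:
        return False
    # Every night must still be below the overbooking cap.
    return all(r.booked_rooms < r.overbooking_limit for r in rows)
```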

Scalability: Sharding Strategy

Shard by hotel_id:
  room_inventory: Shard key = hotel_id
  Booking transaction: hits ONE shard (single-shard ACID transaction)
  1M hotels / 16 shards = ~62,500 hotels per shard
  
  Within each shard: partition room_inventory by date range
    PARTITION BY RANGE (date), 1 month per partition

Search (cross-shard):
  Step 1: Elasticsearch returns matching hotel IDs (the search index is partitioned independently of the hotel_id DB shards)
  Step 2: Group hotels by shard → fan-out availability queries
  Step 3: Merge results → return to client with prices
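Steps 1–2 need a stable hotel_id → shard mapping. The source says only "shard by hotel_id", so the hash-mod scheme below is an assumption. With hash-mod, changing the shard count remaps most keys; a lookup directory or consistent hashing makes resharding cheaper. `shard_for` and `group_by_shard` are hypothetical names.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 16


def shard_for(hotel_id: str) -> int:
    # Python's built-in hash() is salted per process, so use a stable digest instead.
    digest = hashlib.sha256(hotel_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def group_by_shard(hotel_ids: list[str]) -> dict[int, list[str]]:
    """Step 2: bucket search hits so each shard receives one batched availability query."""
    groups: dict[int, list[str]] = defaultdict(list)
    for hotel_id in hotel_ids:
        groups[shard_for(hotel_id)].append(hotel_id)
    return dict(groups)


hits = [f"hotel-{i}" for i in range(50)]  # one page of Elasticsearch results
groups = group_by_shard(hits)
print(f"{len(hits)} hotels -> {len(groups)} shard queries (at most {NUM_SHARDS})")
```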

Optimization: Redis cache per hotel with availability bitmap
  Key: avail:{hotel_id}:{room_type}:{month}
  Value: bitmask (1 bit per day; 1 = at least one room of this type sellable, 0 = sold out)
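A stay is sellable from the cache only if every night's bit is set. Redis's SETBIT/GETBIT commands work on the same idea. This sketch models one `avail:{hotel_id}:{room_type}:{month}` value as a Python int so it runs without a Redis server. `month_mask` and `nights_available` are hypothetical names.

```python
import calendar


def month_mask(year: int, month: int, sold_out_days: set[int]) -> int:
    """Bit (day - 1) is 1 when the room type still has a sellable room that night."""
    days_in_month = calendar.monthrange(year, month)[1]
    mask = 0
    for day in range(1, days_in_month + 1):
        if day not in sold_out_days:
            mask |= 1 << (day - 1)
    return mask


def nights_available(mask: int, first_night: int, last_night: int) -> bool:
    """True if every night from first_night to last_night (1-based, inclusive) is set."""
    width = last_night - first_night + 1
    wanted = ((1 << width) - 1) << (first_night - 1)
    return (mask & wanted) == wanted


dec = month_mask(2026, 12, sold_out_days={22})
print(nights_available(dec, 20, 21))  # True: Dec 20-21 are open
print(nights_available(dec, 20, 24))  # False: Dec 22 is sold out
```

A stay that crosses a month boundary reads two keys. The bitmap is only a search-path hint; the authoritative check stays in the booking transaction.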

Race Condition: Isolation Levels

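READ COMMITTED + FOR UPDATE is sufficient because room_inventory rows are pre-created. Booking only UPDATEs existing rows and never INSERTs into room_inventory, so there are no phantom rows for SERIALIZABLE to guard against. The row locks taken by FOR UPDATE serialize every booking that touches the same room-night.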

Pooled Inventory vs Named-Room Assignment

Pooled: inventory is a count per room type ("5 Deluxe King rooms available" means any of those 5 rooms). Booking logic is simpler and more flexible.
Named: each room (e.g., Room 401) is tracked individually.
Industry practice: pooled inventory for booking, named room assignment at check-in. Exception: luxury hotels that sell specific rooms.
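The pooled-then-named pattern can be sketched as a counter at booking time and a room pick at check-in. This is a hypothetical model of one room type on one night (`RoomTypePool` is not from the design), and it leaves overbooking out by capping bookings at the number of physical rooms.

```python
from dataclasses import dataclass, field


@dataclass
class RoomTypePool:
    """One room type on one night: sold as a count, assigned by number later."""
    rooms: list[str]                                         # physical rooms, e.g. ["401", "402"]
    booked: int = 0                                          # reservations sold against the pool
    assigned: dict[str, str] = field(default_factory=dict)  # reservation_id -> room number

    def book(self) -> bool:
        # Booking only checks the count; no specific room is picked yet.
        # (No overbooking here: the cap is simply len(rooms).)
        if self.booked >= len(self.rooms):
            return False
        self.booked += 1
        return True

    def check_in(self, reservation_id: str) -> str:
        # Named assignment happens at the front desk: take any room not yet assigned.
        taken = set(self.assigned.values())
        room = next((r for r in self.rooms if r not in taken), None)
        if room is None:
            raise LookupError("no free room of this type")
        self.assigned[reservation_id] = room
        return room


pool = RoomTypePool(rooms=["401", "402"])
print(pool.book(), pool.book(), pool.book())  # True True False
print(pool.check_in("res-1"))                 # 401
```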

Eager Payment vs Lazy Payment

Hybrid (industry standard): place an authorization hold at booking → validates the card without capturing funds. No-show fee: charge 1 night if the guest neither cancels nor arrives. Non-refundable rate: charge immediately (lower price, no cancellation).
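Combining this payment model with the "free cancellation up to X days" rule gives the amount finally captured per reservation. The sketch below is a hypothetical policy helper (`Rate`, `final_charge_cents`). The source does not say what a late cancellation costs, so charging one night, like a no-show, is a labeled assumption.

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Rate(Enum):
    REFUNDABLE = "refundable"          # authorization hold at booking
    NON_REFUNDABLE = "non_refundable"  # charged in full at booking


def final_charge_cents(rate: Rate, nightly_cents: int, nights: int, checkin: date,
                       free_cancel_days: int, cancelled_on: Optional[date] = None,
                       showed_up: bool = False) -> int:
    """Amount finally captured for one reservation."""
    full = nightly_cents * nights
    if rate is Rate.NON_REFUNDABLE:
        return full                  # lower price, no cancellation: already captured
    if showed_up:
        return full                  # normal stay: capture against the hold
    if cancelled_on is None:
        return nightly_cents         # no-show fee: one night
    if cancelled_on <= checkin - timedelta(days=free_cancel_days):
        return 0                     # inside the free-cancellation window: void the hold
    return nightly_cents             # ASSUMPTION: late cancellation charged like a no-show
```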

Search Staleness vs Real-Time Availability

Problem: Search shows "5 rooms available" → user clicks → booking fails

Option 1: Real-time availability on every search result
  → 50K searches/sec × 50 hotels per result page = 2.5M DB queries/sec → the DB cannot keep up

Option 2: Cached availability with bounded staleness
  Redis bitmap per hotel, refreshed every 30 seconds
  Final check at booking time (authoritative DB with FOR UPDATE)
  Acceptable UX: < 1% of booking attempts fail due to staleness

Option 3: Pessimistic display (show fewer rooms than available)
  Cache shows "available" only if real availability > 2
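Options 2 and 3 combine into a single display rule on the search page. `CachedAvailability` and `show_available` are hypothetical names. Hiding entries older than the refresh interval is an added assumption, intended to cover a lagging refresher. The booking path still re-checks with SELECT ... FOR UPDATE regardless of what search showed.

```python
from dataclasses import dataclass

REFRESH_SECONDS = 30  # Option 2: cache refresh interval
SAFETY_MARGIN = 2     # Option 3: advertise "available" only above this many rooms


@dataclass
class CachedAvailability:
    rooms_left: int       # rooms of this type left, as of the last refresh
    refreshed_at: float   # epoch seconds of that refresh


def show_available(entry: CachedAvailability, now: float) -> bool:
    """Search-page hint only; never used to admit a booking."""
    fresh = now - entry.refreshed_at <= REFRESH_SECONDS  # ASSUMPTION: hide missed refreshes
    return fresh and entry.rooms_left > SAFETY_MARGIN


print(show_available(CachedAvailability(rooms_left=5, refreshed_at=100.0), now=110.0))  # True
print(show_available(CachedAvailability(rooms_left=2, refreshed_at=100.0), now=110.0))  # False
```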

SLOs & Error Budgets

Metric	Target	Rationale
Core user-facing availability	99.95%	Budget for planned maintenance + unplanned failures without user-visible outage.
p99 latency (critical path)	Problem-specific — state target early and tie to capacity math	Interview credibility comes from connecting SLO to architecture choices.
Error rate (5xx)	< 0.1%	Distinguishes transient blips from systemic failure requiring rollback.
Data durability	99.999999999% (11 nines) for committed writes	Define which operations require fsync/quorum vs async replication.
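An error budget is the complement of the SLO. The table does not fix a measurement window, so this quick computation assumes 30 days. Under that assumption, the 99.95% availability target allows about 21.6 minutes of full outage per window. The 0.1% 5xx target at the 50K searches/s from the capacity table allows about 50 failed requests per second.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # ASSUMPTION: 30-day SLO window (43,200 min)


def downtime_budget_minutes(slo: float, window_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Minutes of full unavailability an availability SLO tolerates per window."""
    return (1 - slo) * window_minutes


for slo in (0.999, 0.9995, 0.9999):
    print(f"{slo:.2%}: {downtime_budget_minutes(slo):.1f} min / 30 days")

# Error-rate target (< 0.1% 5xx) applied to the search load from the capacity table.
print(f"5xx budget at 50K searches/s: {50_000 * 0.001:.0f} errors/s")
```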

Incident Scenarios (2am reality)

Scenario	How you detect	Mitigation
Primary database unavailable	Health check failures, connection pool exhaustion alerts, elevated 5xx	Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists
Traffic spike (10× normal)	RPS anomaly alert, autoscaling lag, latency SLO burn rate	Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations
Bad deploy causing elevated errors	Canary metric regression, error budget burn, deployment correlation	Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility

Cost Drivers (Staff lens)

Egress bandwidth and CDN (often dominates media/data-heavy systems)
Database storage + IOPS at scale (plan compaction, TTL, tiering)
Compute for async pipelines (right-size workers, spot instances for batch)
Managed service premiums vs operational headcount trade-off

Multi-Region & DR

Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.
