This problem appears in multiple sheets. Depth expectations increase as you progress:
| Track | What to demonstrate |
|---|---|
| Arch 75 | Staff level: multi-region, cost at scale, migration path, and production metrics. |
Interview Prompt
Design Email Service (like Gmail).
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Which of these is highest priority: SMTP relay, Mailbox storage, Search indexing? | Forces scope negotiation — senior candidates trim before drawing boxes. |
| What scale should we design for — DAU, QPS, data volume? | Drives every capacity decision; shows structured thinking. |
| What are the read vs write patterns on the critical path? | Determines caching, DB choice, and replication topology. |
| What consistency and durability guarantees are required? | Separates strong-consistency paths from eventual ones — a senior differentiator. |
Scope
In scope
- SMTP relay
- Mailbox storage
- Search indexing
- Spam filtering (Bayesian + ML)
- Threading
- Attachment storage
Out of scope (state explicitly)
- Detailed frontend/UI pixel implementation
- Org structure, staffing, and hiring plan
Assumptions
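The design assumes the functional requirements listed first (send/receive through calendar integration) and the non-functional targets that follow (availability through security). State them before deep-diving into component-level trade-offs.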
- Send and receive emails (SMTP) with attachments (up to 25 MB)
- Inbox, Sent, Drafts, Spam, Trash folders + custom labels/folders
- Full-text search across all emails (subject, body, sender, attachments)
- Conversation threading: group related emails into threads
- Spam filtering using ML + rule-based system
- Push notifications for new emails
- Rich text compose (HTML email) with inline images
- Contact management and autocomplete
- Filters and rules: auto-label, auto-archive, auto-forward
- Calendar integration (event invitations, RSVP)
- High Availability: 99.99% (email is mission-critical)
- Durability: Zero email loss (once accepted for delivery)
- Low Latency: Email delivery < 5 seconds (within same system)
- Search Performance: Full-text search < 500ms across millions of emails
- Scalability: 1B+ users, 100B+ stored emails
- Security: TLS in transit, encryption at rest, phishing detection
| Metric | Calculation | Value |
|---|---|---|
| Users | Given (assumption documented in value) | 1B |
| Emails sent / day | Given (assumption documented in value) | 300B (50% spam) |
| Emails received per user / day | Given (typical workload assumption) | 50 (after spam filtering) |
| Avg email size | Given (typical workload assumption) | 50 KB (body) + 200 KB avg attachment |
| Storage per user | Given (assumption documented in value) | 15 GB |
| Total storage | 1B × 15 GB | 15 EB |
| Emails / sec (inbound) | 300B emails / day ÷ 86,400 s / day | ≈3.5M (average, before any peak factor) |
| Search queries / sec | Given (assumption; no daily query volume is stated) | 100K |
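The two derived rows in the table can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures given above; the variable and function names are illustrative, and decimal units (1 GB = 10^9 bytes) are an assumption.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the estimation table.
users = 1_000_000_000
emails_per_day = 300_000_000_000
quota_bytes_per_user = 15 * 10**9  # 15 GB, decimal units (assumption)


def per_second(daily_count: int) -> float:
    """Convert a daily total into an average per-second rate."""
    return daily_count / SECONDS_PER_DAY


inbound_eps = per_second(emails_per_day)
total_storage_eb = users * quota_bytes_per_user / 10**18

print(f"{inbound_eps / 1e6:.2f}M emails/sec")  # 3.47M emails/sec
print(f"{total_storage_eb:.0f} EB")            # 15 EB
```

Both results are averages over a full day, so any peak factor has to be applied on top of them.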
Email Send Flow
1. User clicks "Send" → POST /api/messages/send
2. Validate: recipients exist, attachment size < 25MB, rate limit check
3. Store email body + attachments to Blob Store (S3)
4. Store email metadata to Bigtable/Cassandra
5. Enqueue to send queue (Kafka topic: outgoing-emails)
6. SMTP Sender Worker picks up from queue:
   a. DNS MX lookup: recipient domain → find receiving mail server
   b. Open TLS connection to receiving server (STARTTLS)
   c. Authenticate: sign with DKIM key for sender's domain
   d. Transmit email via SMTP protocol
   e. Receiving server ACKs → mark as delivered
   f. If rejected → generate bounce (DSN) → deliver to sender's inbox
7. If temporary failure: retry with exponential backoff (1min → 72h)
   After 72h of retries → permanent failure → bounce to sender
Optimization for internal emails (sender@gmail → recipient@gmail): Skip SMTP entirely → directly store in recipient's mailbox → 100ms delivery
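The retry rule in the send flow (first retry after 1 minute, give up after 72 hours) could be sketched as below. The doubling factor, the 6-hour cap on a single delay, and the jitter range are assumptions made for illustration; the flow itself only fixes the 1-minute start and the 72-hour limit.

```python
import random


def backoff_schedule(base_s: float = 60, cap_s: float = 6 * 3600,
                     give_up_s: float = 72 * 3600, factor: float = 2.0) -> list:
    """Return retry delays in seconds whose cumulative sum stays within give_up_s."""
    delays = []
    elapsed = 0.0
    delay = base_s
    while elapsed + delay <= give_up_s:
        delays.append(delay)
        elapsed += delay
        # Grow the delay geometrically, but never beyond the per-retry cap.
        delay = min(delay * factor, cap_s)
    return delays


def with_jitter(delay_s: float, rng: random.Random) -> float:
    """Spread retries between 50% and 100% of the nominal delay."""
    return rng.uniform(delay_s / 2, delay_s)


schedule = backoff_schedule()
print(schedule[:4])  # [60, 120.0, 240.0, 480.0]
```

Once the schedule is exhausted, the worker would stop retrying and generate the bounce to the sender.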
Incoming Email Flow (SMTP Receive)
1. External sender's MTA connects to our SMTP gateway (MX record)
2. SMTP handshake: EHLO, MAIL FROM, RCPT TO
3. Before accepting DATA:
   a. SPF check: is sender's IP authorized for their domain?
   b. Rate limiting: too many emails from this IP? → 421 temporary reject
   c. Recipient exists? → 550 user unknown if not
4. Accept DATA → email content streamed
5. DKIM verification: check cryptographic signature
6. DMARC evaluation: combine SPF + DKIM → pass/fail policy
7. Spam classification: ML model scores email (0-1)
   Score > 0.7 → spam folder; 0.3-0.7 → show warning; < 0.3 → inbox
8. Virus scan: check attachments for malware (ClamAV)
9. Store body to Blob Store, metadata to Bigtable
10. Index in Elasticsearch for search
11. Push notification to recipient (if enabled)
12. Return 250 OK to sender's MTA
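The checks that run before DATA is accepted (step 3) map naturally onto SMTP reply codes. The sketch below covers only the rate-limit and recipient checks, since the flow does not say which reply an SPF failure should produce. `Envelope`, `pre_data_decision`, and the 1,000-per-hour default are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class Envelope:
    client_ip: str
    mail_from: str
    rcpt_to: str


def pre_data_decision(env: Envelope, known_recipients: Set[str],
                      recent_count_by_ip: Dict[str, int],
                      max_per_hour: int = 1000) -> Tuple[int, str]:
    """Decide the SMTP reply before any message content is accepted."""
    # Too many messages from this IP: temporary reject so the sender retries later.
    if recent_count_by_ip.get(env.client_ip, 0) > max_per_hour:
        return (421, "rate limited, try again later")
    # Unknown mailbox: permanent reject so the sender bounces immediately.
    if env.rcpt_to.lower() not in known_recipients:
        return (550, "user unknown")
    return (250, "OK")


env = Envelope("203.0.113.7", "alice@example.org", "bob@example.com")
print(pre_data_decision(env, {"bob@example.com"}, {}))  # (250, 'OK')
```

Rejecting at this stage is cheap because no message body has been transferred or stored yet.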
Spam Filtering Architecture
Layer 1: Connection-level filters (< 1ms per connection)
- IP reputation: is this IP in known spammer lists? (Spamhaus ZEN)
- Volume limits: > 1000 emails from this IP in last hour → rate limit
- DNS checks: does IP reverse-resolve to a legitimate hostname?
Catches: ~70% of spam before content inspection
Layer 2: Content analysis (10-50ms per email)
- SPF/DKIM/DMARC: cryptographic proof of sender legitimacy
- Spam rules: regex patterns, known spam phrases, HTML structure
- URL analysis: links in email → check against phishing databases
- Attachment scanning: ClamAV for malware
Catches: additional ~20% of spam
Layer 3: ML classification (100-300ms per email)
- NLP model: BERT fine-tuned on spam/ham corpus
- Features: text, sender reputation, social graph, historical interaction
- Score > 0.85 → spam folder; 0.5-0.85 → warning; < 0.5 → inbox
Catches: additional ~9% of spam
Result: < 0.1% false positive rate (legitimate email marked as spam)
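The point of layering is that cheap checks run first and the expensive model only sees what they could not decide. A minimal sketch of that short-circuit follows. The message is a plain dict whose keys (`ip_blocklisted`, `phishing_url`, `ml_score`) are hypothetical stand-ins for the real lookups and model call; the 0.5 and 0.85 thresholds are the ones quoted for Layer 3.

```python
from typing import Callable, Optional, Sequence

Verdict = Optional[str]  # "spam", "warning", "inbox", or None to defer to the next layer


def connection_layer(msg: dict) -> Verdict:
    # Layer 1: IP reputation lookup (stubbed as a boolean flag).
    return "spam" if msg.get("ip_blocklisted") else None


def content_layer(msg: dict) -> Verdict:
    # Layer 2: rule and URL checks (stubbed as a boolean flag).
    return "spam" if msg.get("phishing_url") else None


def ml_layer(msg: dict) -> Verdict:
    # Layer 3: model score in [0, 1]; this layer always returns a final verdict.
    score = msg["ml_score"]
    if score > 0.85:
        return "spam"
    if score >= 0.5:
        return "warning"
    return "inbox"


def classify(msg: dict,
             layers: Sequence[Callable[[dict], Verdict]] = (connection_layer, content_layer, ml_layer)) -> str:
    """Run layers cheapest-first and stop at the first one that decides."""
    for layer in layers:
        verdict = layer(msg)
        if verdict is not None:
            return verdict
    return "inbox"


print(classify({"ip_blocklisted": True}))  # spam
print(classify({"ml_score": 0.6}))         # warning
```

Note that a blocklisted message never needs an `ml_score`, which mirrors how most spam is rejected before content inspection.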
Push vs Pull (IMAP IDLE vs Polling)
- IMAP IDLE: server holds connection open, pushes EXISTS notification.
- Google Sync: FCM (Android) / APNs (iOS) for battery-efficient push.
- Web client: WebSocket for real-time inbox updates.
SPF, DKIM, DMARC
- SPF: DNS record listing authorized IPs for a domain.
- DKIM: Cryptographic signature in email header.
- DMARC: Policy telling receivers what to do if SPF/DKIM fails (reject/quarantine).
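How the three mechanisms combine can be sketched as a small decision function. Under the DMARC specification (RFC 7489), a message passes when either SPF or DKIM passes and the authenticated domain aligns with the From domain; the published policy only matters on failure. The function name and the returned strings are illustrative.

```python
def dmarc_disposition(spf_pass: bool, spf_aligned: bool,
                      dkim_pass: bool, dkim_aligned: bool,
                      policy: str) -> str:
    """Return 'deliver', 'quarantine', or 'reject' for one message."""
    # DMARC passes if at least one mechanism both passes and aligns with the From domain.
    passed = (spf_pass and spf_aligned) or (dkim_pass and dkim_aligned)
    if passed:
        return "deliver"
    # On failure, apply the domain owner's published policy (p=none means monitor only).
    return {"none": "deliver", "quarantine": "quarantine", "reject": "reject"}[policy]


# SPF passes for a different domain than the From header, DKIM is missing: DMARC fails.
print(dmarc_disposition(True, False, False, False, "quarantine"))  # quarantine
```

In this design the disposition would feed into the spam pipeline as one signal rather than being the only gate.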
# Email operations
GET /api/messages?label=INBOX&page_token=... → List emails
GET /api/messages/{id} → Get email
POST /api/messages/send → Send email
PUT /api/messages/{id}/labels → Add/remove labels
PUT /api/messages/{id}/read → Mark read/unread
DELETE /api/messages/{id} → Move to trash
POST /api/messages/{id}/reply → Reply
POST /api/messages/{id}/forward → Forward
GET /api/threads/{thread_id} → Get thread
GET /api/messages/search?q=from:alice+subject:meeting → Full-text search
Common Error Responses
400 Bad Request: invalid input, missing fields, or malformed JSON
401 Unauthorized: missing or invalid auth token or API key
403 Forbidden: authenticated but insufficient permissions
404 Not Found: resource ID does not exist
409 Conflict: duplicate write or version conflict; retry with idempotency key
422 Unprocessable Entity: valid syntax but invalid business logic
429 Too Many Requests: rate limit exceeded; honor Retry-After header
500 Internal Error: unexpected server fault; retry with idempotency key
503 Service Unavailable: dependency down or overloaded; use exponential backoff
202 Accepted: job queued; poll GET /jobs/{id} for status
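408 Request Timeout: server timed out waiting for the request; retry the request (a job that is still processing keeps reporting its state via GET /jobs/{id})
Email Metadata (Bigtable / Cassandra)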
Row key: user_id#reverse_timestamp#email_id (reverse timestamp: newest emails first)
Columns: subject, from, to[], cc[], bcc[], snippet, thread_id, labels[], is_read, is_starred, has_attachment, body_blob_ref, attachment_refs[]
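The reverse timestamp works because row keys are sorted lexicographically: subtracting the receive time from a fixed maximum and zero-padding the result makes newer emails sort first within a user's prefix. A minimal sketch, assuming millisecond timestamps and a 13-digit maximum (both assumptions, not part of the schema above):

```python
MAX_TS_MS = 10**13 - 1  # largest 13-digit millisecond timestamp (assumed bound)


def row_key(user_id: str, received_ms: int, email_id: str) -> str:
    """Build user_id#reverse_timestamp#email_id with a fixed-width reverse timestamp."""
    reverse_ts = MAX_TS_MS - received_ms
    # Zero-padding keeps string order identical to numeric order.
    return f"{user_id}#{reverse_ts:013d}#{email_id}"


older = row_key("u42", 1_700_000_000_000, "a1")
newer = row_key("u42", 1_700_000_500_000, "b2")
print(sorted([older, newer])[0] == newer)  # True
```

A prefix scan on `u42#` then returns the inbox newest-first without a sort step.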
Conversation Threading
Option A: thread ID = hash of normalized subject + participants. Option B: follow the In-Reply-To and References headers (defined in RFC 2822, since obsoleted by RFC 5322). Thread view: fetch all emails with the same thread_id, sorted by date.
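The two threading approaches can be combined: prefer the reply headers, and fall back to the subject hash only when no referenced message is known. The sketch below assumes an in-memory dict from Message-ID to thread ID; the function names, the prefix regex, and the 16-character truncation are illustrative choices.

```python
import hashlib
import re
from typing import Dict, Iterable, List

# Strips any run of leading "Re:", "Fw:", or "Fwd:" prefixes.
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    return _PREFIX.sub("", subject).strip().lower()


def fallback_thread_id(subject: str, participants: Iterable[str]) -> str:
    """Hash of normalized subject plus sorted participants."""
    key = normalize_subject(subject) + "|" + ",".join(sorted(p.lower() for p in participants))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def assign_thread(message_id: str, subject: str, participants: Iterable[str],
                  references: List[str], thread_by_message_id: Dict[str, str]) -> str:
    """Reuse the thread of the most recent known reference, else start from the hash."""
    for ref in reversed(references):
        if ref in thread_by_message_id:
            thread_id = thread_by_message_id[ref]
            break
    else:
        thread_id = fallback_thread_id(subject, participants)
    thread_by_message_id[message_id] = thread_id
    return thread_id


index: Dict[str, str] = {}
t1 = assign_thread("<m1@x>", "Lunch?", ["a@x", "b@x"], [], index)
t2 = assign_thread("<m2@x>", "Re: Lunch?", ["a@x", "b@x", "c@x"], ["<m1@x>"], index)
print(t1 == t2)  # True
```

The header path keeps the thread intact even when a reply adds a participant, which would change the hash in the subject-only approach.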
Email Delivery Guarantees
SMTP (RFC 5321): once a server ACKs receipt, it MUST deliver or bounce.
- Accepted → persisted to durable queue (Kafka, RF=3) before ACK.
- If downstream fails → retry with exponential backoff (up to 72h).
- Never silently drop an email.
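The ordering "persist first, ACK second" is the whole guarantee, so it is worth seeing in code. In this sketch `durable_append` is a hypothetical callable standing in for the replicated queue write; if it raises, the gateway answers with the temporary SMTP failure code 451 so the sending MTA keeps the message and retries.

```python
from typing import Callable, Tuple


def handle_data(raw_message: bytes,
                durable_append: Callable[[bytes], None]) -> Tuple[int, str]:
    """Return the SMTP reply for a received message body."""
    try:
        # The write must be durable before responsibility for the message is accepted.
        durable_append(raw_message)
    except Exception:
        # Nothing was accepted, so the sender still owns the message and will retry.
        return (451, "local error in processing, try again later")
    return (250, "OK")


log = []
print(handle_data(b"Subject: hi\r\n\r\nhello", log.append))  # (250, 'OK')
```

Swapping the two steps (ACK, then persist) would create a window in which a crash silently loses an accepted email.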
Data Loss Prevention
- Blob Store: 3× replication across AZs (11-nines durability)
- Metadata: Bigtable/Cassandra RF=3, quorum writes
- Search index: rebuilt from metadata + blob store (source of truth)
- Backup: incremental snapshots to cold storage (weekly)
Email Deduplication
Content-addressable storage (dedup):
1. Compute SHA-256 of attachment
2. Check blob store: exists(sha256) → return existing blob_id
3. If not exists → upload new blob, store with sha256 as key
4. Email metadata references blob_id
Result: 50K emails referencing same PDF → 1 S3 object (10 MB)
Dedup ratio for corporate email: typically 60-80%
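The steps above can be sketched with an in-memory store keyed by SHA-256 digest. `BlobStore` is a hypothetical class for illustration, with a dict standing in for the object store; it also keeps a reference count so a blob is deleted only when no email points at it, and verifies the digest on read.

```python
import hashlib
from typing import Dict


class BlobStore:
    """Content-addressable store: the blob ID is the SHA-256 of the content."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self._refs: Dict[str, int] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data  # first upload of this content
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def get(self, blob_id: str) -> bytes:
        data = self._blobs[blob_id]
        # Verify on read so corruption is detected rather than served.
        if hashlib.sha256(data).hexdigest() != blob_id:
            raise ValueError("blob content does not match its digest")
        return data

    def release(self, blob_id: str) -> None:
        """Drop one reference and delete the blob when none remain."""
        self._refs[blob_id] -= 1
        if self._refs[blob_id] == 0:
            del self._refs[blob_id]
            del self._blobs[blob_id]

    def object_count(self) -> int:
        return len(self._blobs)


store = BlobStore()
first = store.put(b"%PDF-1.7 quarterly report")
second = store.put(b"%PDF-1.7 quarterly report")
print(first == second, store.object_count())  # True 1
```

In a distributed deployment the check-then-upload and the counter updates would need to be atomic; that concern is left out of this sketch.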
Challenges:
- Reference counting: INCR/DECR blob_refs:{sha256}
- Cross-tenant privacy: encrypt each blob with tenant-specific key
- Corruption: verify SHA-256 on read, auto-heal from replica
Interview Walkthrough
- Explain the federated SMTP model: anyone can run an MTA — your system is one node in a global store-and-forward network, not a closed platform.
- Separate ingest (MTA receives → spam filter → blob store + metadata) from delivery (MDA pushes to user mailbox) — different scaling profiles.
- Store email bodies in blob storage; metadata (headers, labels, thread ID) in a wide-column store optimized for user-scoped queries.
- Spam filtering is critical infrastructure — 50%+ of all email is spam; multi-layer scoring (DNSBL, Bayesian, ML) runs before inbox delivery.
- Attachment dedup via content hash — identical files across millions of emails stored once, saving 30–50% blob storage.
- Search index rebuilt from metadata + blob pointers — the blob store is source of truth, not the index.
- Common pitfall: storing full MIME bodies in the search database — bloated indexes and slow queries; bodies belong in object storage.
Storage Architecture: Bigtable vs Cassandra vs Sharded MySQL
Gmail stores 100B+ emails for 1B+ users.
Option 1: Bigtable (Google's actual choice) ⭐
Row key: {user_id}#{reverse_timestamp}#{email_id}
✓ Single-key prefix scan gives inbox in order
✓ Horizontally scalable to exabytes
✓ No sorting needed
✗ No secondary indexes → search requires Elasticsearch sidecar
Option 2: Cassandra
Partition key: user_id, Clustering: received_at DESC, email_id
✓ Handles mailbox queries well
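✗ Very large partitions degrade reads, compaction, and repair (the hard cap is 2 billion cells per partition; practical guidance is to stay around 100 MB or less) → need bucket partitioning by (user_id, year_month)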
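Bucket partitioning by (user_id, year_month) for the Cassandra option might look like the sketch below: writes derive the bucket from the receive time, and an inbox read walks buckets from newest to oldest. The function names are hypothetical, and timezone-aware UTC datetimes are assumed.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def partition_key(user_id: str, received_at: datetime) -> Tuple[str, str]:
    """Compound partition key: one partition per user per calendar month (UTC)."""
    ts = received_at.astimezone(timezone.utc)
    return (user_id, ts.strftime("%Y-%m"))


def buckets_newest_first(oldest: datetime, newest: datetime) -> List[str]:
    """Month buckets an inbox read would visit, most recent first."""
    year, month = newest.year, newest.month
    out = []
    while (year, month) >= (oldest.year, oldest.month):
        out.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return out


start = datetime(2023, 11, 5, tzinfo=timezone.utc)
end = datetime(2024, 2, 20, tzinfo=timezone.utc)
print(partition_key("u42", end))         # ('u42', '2024-02')
print(buckets_newest_first(start, end))  # ['2024-02', '2024-01', '2023-12', '2023-11']
```

The cost of bucketing is that a read spanning several months becomes several partition queries instead of one.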
Option 3: Sharded MySQL (Hotmail/Outlook approach)
✓ Familiar, ACID
✗ Cross-shard queries expensive
✗ Schema migrations painful
Decision: Cassandra for mailbox + Elasticsearch for search
Why Email Architecture Is Unique
- Federated protocol: anyone can run an SMTP server (unlike WhatsApp)
- Store-and-forward: emails may bounce through multiple MTAs
- Spam: 50%+ of all email is spam → filtering is critical infrastructure
- Standards: 40+ years of RFCs → backward compatibility constraints
- No real-time: unlike chat, email is async (seconds to hours delivery)
- Delivery guarantee: RFC mandates no silent drops → bounce on failure
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1: MVP (0 to 100K users)
Monolith or minimal services proving the core email service flows. Optimize for shipping speed and correctness over scale.
Key components: Single region · Primary DB + Redis cache · Synchronous core path · Basic monitoring
Move to next phase when: p99 latency exceeds SLO or DB CPU sustained above 70%
Phase 2: Growth (100K to 10M users)
Split read/write paths, introduce async processing for non-critical work, add caching layers and horizontal scaling.
Key components: Read replicas or CQRS · Message queue for async work · CDN / edge caching · Service-level SLOs
Move to next phase when: Hot keys, fan-out bottlenecks, or ops toil from manual scaling
Phase 3: Scale (10M+ users)
Shard data plane, multi-region active-active or active-passive, formal DR runbooks, cost optimization.
Key components: Database sharding / partitioning · Multi-region replication · Auto-scaling + chaos testing · Dedicated platform/SRE ownership
Move to next phase when: Regional failure domain risk, compliance data residency, or linear cost growth unsustainable
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Core user-facing availability | 99.95% | Budget for planned maintenance + unplanned failures without user-visible outage. |
| p99 latency (critical path) | Problem-specific — state target early and tie to capacity math | Interview credibility comes from connecting SLO to architecture choices. |
| Error rate (5xx) | < 0.1% | Distinguishes transient blips from systemic failure requiring rollback. |
| Data durability | 99.999999999% (11 nines) for committed writes | Define which operations require fsync/quorum vs async replication. |
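An availability target becomes concrete once it is converted into allowed downtime. The sketch below does that conversion and computes a simple burn rate; the 30-day window and the function names are assumptions for illustration.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full unavailability the SLO permits within the window."""
    return (1 - slo) * window_days * 24 * 60


def burn_rate(observed_error_rate: float, slo: float) -> float:
    """How many times faster than sustainable the budget is being consumed."""
    return observed_error_rate / (1 - slo)


print(f"{error_budget_minutes(0.9995):.1f} min")  # 21.6 min
print(f"{error_budget_minutes(0.9999):.2f} min")  # 4.32 min
print(f"{burn_rate(0.001, 0.9995):.1f}x")         # 2.0x
```

A burn rate above 1 means the budget will run out before the window ends if the error rate persists.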
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Primary database unavailable | Health check failures, connection pool exhaustion alerts, elevated 5xx | Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists |
| Traffic spike (10× normal) | RPS anomaly alert, autoscaling lag, latency SLO burn rate | Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations |
| Bad deploy causing elevated errors | Canary metric regression, error budget burn, deployment correlation | Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility |
Cost Drivers (Staff lens)
- Egress bandwidth and CDN (often dominates media/data-heavy systems)
- Database storage + IOPS at scale (plan compaction, TTL, tiering)
- Compute for async pipelines (right-size workers, spot instances for batch)
- Managed service premiums vs operational headcount trade-off
Multi-Region & DR
Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.