What:
The set of conventions that define how callers interact with your system beyond transport choice — request/response shapes, failure semantics, auth boundaries, and evolution rules.
Primary purpose:
Make integrations predictable for humans and machines: clients know how to retry, paginate, authenticate, and handle partial failures without tribal knowledge.
Usually used for:
Public REST/GraphQL products, B2B partner APIs, payment and booking flows, async job platforms, and webhook-driven event notifications.
Separate three layers in every API discussion:
📡 Transport
HTTP/2, gRPC, WebSocket — how bytes move. Covered in Network Protocols.
📜 Contract
Resources, fields, error codes, pagination tokens, idempotency rules — what callers may depend on.
🔌 Integration
Webhooks, batch imports, OAuth scopes, SLA headers — how external systems stay in sync over time.
Needed When:
Multiple clients (web, mobile, partners) consume the same backend, or money-moving operations require exactly-once business semantics.
Avoids:
Ambiguous 500 errors, duplicate charges on retry, and breaking mobile apps silently when response fields disappear.
Optimizes For:
Operability — on-call engineers and partner developers can debug from logs and documented error codes alone.
Public clients hit a gateway that enforces auth, rate limits, and contract validation. Internal services speak stricter schemas; integration events exit via signed webhooks:
- Structured errors — machine-readable codes plus human messages; never expose stack traces publicly.
- Safe retries — idempotency keys on POST; GET/PUT/DELETE documented as naturally idempotent where applicable.
- Stable pagination — cursor tokens over deep offset scans; document sort order guarantees.
- Explicit auth scopes — least-privilege tokens; separate read vs write vs admin capabilities.
- Versioned evolution — deprecation headers, sunset dates, parallel /v1 and /v2 routes during migration.
| Benefit | Cost |
|---|---|
| Explicit contracts — clients and services agree on error shapes, pagination tokens, and webhook signatures before code ships | Documentation debt — contracts drift unless OpenAPI/schema checks run in CI |
| Integration resilience — idempotency keys and signed webhooks survive network retries without duplicate side effects | Storage overhead — idempotency records and webhook delivery logs consume Redis/DB capacity |
Problem: A mobile client retries a failed payment POST three times. Without an idempotency key, the gateway creates three charges.
Mitigation: Require Idempotency-Key on mutating endpoints; store key → response mapping in Redis with 24-hour TTL; return cached response on duplicate key.
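The mitigation above can be sketched in a few lines. This is a minimal illustration, not a production design: `IdempotencyStore`, `create_charge`, and the in-memory dict are hypothetical stand-ins for the Redis mapping the text describes, and the sketch ignores concurrent requests that arrive with the same key.

```python
import time
from typing import Callable, Dict, Tuple


class IdempotencyStore:
    """In-memory stand-in for a Redis key -> response cache with a TTL."""

    def __init__(self, ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expires_at, cached response)
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def execute(self, key: str, handler: Callable[[], dict]) -> dict:
        """Run handler once per key; replay the stored response on duplicates."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # duplicate key inside the TTL: no new side effect
        response = handler()
        self._entries[key] = (now + self._ttl, response)
        return response


charges = []  # hypothetical ledger, only here to count side effects


def create_charge() -> dict:
    charges.append({"amount_cents": 500})
    return {"status": 201, "charge_id": f"ch_{len(charges)}"}


store = IdempotencyStore()
first = store.execute("key-123", create_charge)
retry = store.execute("key-123", create_charge)  # same key: cached response
```

A real deployment would need an atomic "set if absent" step (for example Redis `SET` with the `NX` and `EX` options) so that two simultaneous retries cannot both run the handler.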
Problem: Partner endpoint is down for 10 minutes; order-shipped events are lost with fire-and-forget POST.
Mitigation: Persist outbound events in a queue; exponential backoff retries; expose GET /events for manual replay; sign payloads with HMAC so partners verify authenticity.
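A rough sketch of the retry half of that mitigation, assuming a hypothetical `send` callable that returns True when the partner acknowledges the event. The function name, parameters, and the jitter strategy are illustrative choices, not something the original prescribes.

```python
import random
import time
from typing import Callable


def deliver_with_backoff(event: dict,
                         send: Callable[[dict], bool],
                         max_attempts: int = 5,
                         base: float = 1.0,
                         cap: float = 60.0,
                         sleep: Callable[[float], None] = time.sleep,
                         rand: Callable[[], float] = random.random) -> bool:
    """Try to deliver one event, waiting longer after each failure.

    The wait before retry i is a random fraction of min(cap, base * 2**i),
    which spreads out retries from many senders after a shared outage.
    """
    for attempt in range(max_attempts):
        if send(event):
            return True
        if attempt < max_attempts - 1:
            ceiling = min(cap, base * 2 ** attempt)
            sleep(rand() * ceiling)
    # Still failing: the caller keeps the event persisted so it can be
    # retried later or fetched through the replay endpoint.
    return False
```

Because the event stays in the queue until `deliver_with_backoff` returns True, a 10-minute partner outage delays delivery instead of losing the event.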
Problem: Admin dashboard requests page 5000 with offset=100000; database scans and discards 100k rows per request.
Mitigation: Cursor pagination keyed on indexed columns; document that arbitrary page jumps are unsupported for large datasets.
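To make the cursor idea concrete, here is a small sketch that pages over an in-memory list sorted by `id`. The cursor format (base64-encoded JSON with an `after_id` field) and the function names are assumptions for illustration; against a database the filter would become a `WHERE id > :after_id ORDER BY id LIMIT :limit` query on an indexed column.

```python
import base64
import json
from typing import List, Optional, Tuple


def encode_cursor(last_id: int) -> str:
    """Wrap the last seen id in an opaque token clients pass back unchanged."""
    payload = json.dumps({"after_id": last_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after_id"]


def list_page(rows: List[dict], limit: int,
              cursor: Optional[str] = None) -> Tuple[List[dict], Optional[str]]:
    """Return one page plus the cursor for the next page (None at the end).

    rows must already be sorted by id ascending.
    """
    after_id = decode_cursor(cursor) if cursor else None
    matching = [r for r in rows if after_id is None or r["id"] > after_id]
    window = matching[:limit + 1]  # one extra row tells us if more exist
    page = window[:limit]
    next_cursor = encode_cursor(page[-1]["id"]) if len(window) > limit else None
    return page, next_cursor
```

The cost of each page depends on `limit`, not on how deep into the dataset the client has paged, which is the property the offset query lacks.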
| Scenario | Usage |
|---|---|
| Payment checkout API | Idempotency-Key header, structured 402/409 errors, webhook on settlement |
| Partner B2B integration | OAuth2 client-credentials scopes, rate-limit response headers, versioned base path |
| Video transcoding job | 202 Accepted + job_id, GET /jobs/{id} polling or callback URL on completion |
| Bulk user import | Batch POST with per-row error array; partial success without failing entire upload |
- Money or inventory moves — idempotency and conflict codes are non-negotiable.
- Third parties integrate — publish OpenAPI, webhook schemas, and sandbox environments.
- Jobs exceed 2 seconds — return 202 + job resource instead of blocking HTTP connection.
- Lists exceed 10k rows — cursor pagination and optional export-to-file async endpoints.
- Breaking changes ship — version bump, deprecation header, minimum 90-day sunset for public APIs.
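As one illustration of the last rule, the sketch below computes a sunset date at least 90 days after the announcement and formats it as an HTTP date for a `Sunset` response header. The helper name and the validation are assumptions; the sketch covers only the sunset date, not the separate deprecation header.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

MIN_SUNSET_DAYS = 90  # minimum notice for public APIs, per the rule above


def sunset_header(announced: datetime, sunset_days: int = MIN_SUNSET_DAYS) -> dict:
    """Build a Sunset header for a deprecated route.

    announced must be a timezone-aware datetime in UTC.
    """
    if sunset_days < MIN_SUNSET_DAYS:
        raise ValueError(f"public APIs need at least {MIN_SUNSET_DAYS} days notice")
    sunset = announced + timedelta(days=sunset_days)
    return {"Sunset": format_datetime(sunset, usegmt=True)}


headers = sunset_header(datetime(2024, 1, 1, tzinfo=timezone.utc))
```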
- Network Protocols (HTTP, gRPC, WebSocket, DNS) — transport and wire-format choices beneath the contract.
- Proxy & Reverse Proxy Patterns — gateway terminates TLS, routes /v1 vs /v2, enforces rate limits.
- Redis Patterns for Interview Systems — idempotency key storage and rate-limit counters.
- Circuit Breaker, Retries & Bulkheads — client-side retry policy aligned with server error semantics.
- Distributed Transactions: 2PC vs Saga — multi-step flows that surface as composite API operations.
Structured Error Envelopes
Avoid returning plain text or generic {"error": "something went wrong"}. Production APIs expose a stable machine code, a safe human message, optional field-level validation details, and a correlation ID for support:
{
  "error": {
    "code": "INSUFFICIENT_FUNDS",
    "message": "Account balance too low for this transfer.",
    "request_id": "req_8f3a2b",
    "details": [{ "field": "amount", "issue": "exceeds_available_balance" }]
  }
}

Map HTTP status to retry semantics explicitly in docs: 4xx is generally non-retryable except 429; 5xx (503 in particular) is generally retryable with backoff. Interviewers notice when you distinguish 401 (who are you?) from 403 (I know you, but you cannot do this).
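A small sketch of how a service might build that envelope and how a client might apply the retry rule. Both function names are hypothetical, and the retry check encodes only the coarse rule stated above (429 and 5xx), so a real client would refine it per endpoint.

```python
from typing import List, Optional


def error_envelope(code: str, message: str, request_id: str,
                   details: Optional[List[dict]] = None) -> dict:
    """Build the structured error body; details is omitted when empty."""
    error = {"code": code, "message": message, "request_id": request_id}
    if details:
        error["details"] = details
    return {"error": error}


def is_retryable(status: int) -> bool:
    """Coarse rule from the text: retry 429 and 5xx, nothing else."""
    return status == 429 or 500 <= status <= 599


body = error_envelope(
    "INSUFFICIENT_FUNDS",
    "Account balance too low for this transfer.",
    "req_8f3a2b",
    [{"field": "amount", "issue": "exceeds_available_balance"}],
)
```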
Long-Running Operations: 202 Accepted Pattern
When work exceeds HTTP timeout budgets (video transcode, report generation), accept the request synchronously and process asynchronously:
- POST /exports → 202 Accepted with Location: /jobs/abc123 and body {"status":"queued"}.
- Client polls GET /jobs/abc123 until status is completed or failed.
- Alternatively, client supplies callback_url; server POSTs signed webhook on terminal state.
Never hold a TCP connection open for minutes — load balancers and mobile networks will kill it.
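The accept-then-poll flow can be sketched without a web framework. Everything here is a simplified stand-in: `JOBS` replaces a job table, the functions model the two endpoints as plain calls returning a status code and body, and `finish_job` plays the role of a background worker.

```python
import time
import uuid
from typing import Callable, Dict, Tuple

JOBS: Dict[str, dict] = {}  # in-memory stand-in for a job table


def post_exports() -> Tuple[int, dict, dict]:
    """POST /exports: record the job and return immediately."""
    job_id = uuid.uuid4().hex[:8]
    JOBS[job_id] = {"status": "queued"}
    return 202, {"Location": f"/jobs/{job_id}"}, {"status": "queued"}


def get_job(job_id: str) -> Tuple[int, dict]:
    """GET /jobs/{id}: report the current state of the job."""
    job = JOBS.get(job_id)
    if job is None:
        return 404, {"error": {"code": "JOB_NOT_FOUND"}}
    return 200, dict(job)


def finish_job(job_id: str, ok: bool = True) -> None:
    """What a background worker would do once the work ends."""
    JOBS[job_id]["status"] = "completed" if ok else "failed"


def poll_until_terminal(job_id: str, interval: float = 1.0, max_polls: int = 30,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Client side: poll until the job reaches a terminal state."""
    for _ in range(max_polls):
        status, body = get_job(job_id)
        if status != 200 or body["status"] in ("completed", "failed"):
            return body
        sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

Each poll is a short request, so no single connection has to outlive the work itself.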
Batch APIs with Partial Success
Importing 500 users in one request should not fail entirely because row 237 has an invalid email. Return 207 Multi-Status or 200 with a per-item result array:
{
  "succeeded": 498,
  "failed": 2,
  "results": [
    { "index": 236, "status": "ok", "id": "usr_991" },
    { "index": 237, "status": "error", "code": "INVALID_EMAIL" }
  ]
}

Cap batch size server-side (e.g., max 100 items) and document throughput limits so clients chunk large imports themselves.
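A sketch of a handler that produces this kind of per-item result. The email check is deliberately loose, and the `BATCH_TOO_LARGE` code, the generated ids, and the choice of 207 over 200 are assumptions made for the example.

```python
import re
from typing import List, Tuple

MAX_BATCH_SIZE = 100
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose sanity check only


def import_users(rows: List[dict]) -> Tuple[int, dict]:
    """Validate each row independently and report per-item outcomes."""
    if len(rows) > MAX_BATCH_SIZE:
        return 400, {"error": {
            "code": "BATCH_TOO_LARGE",
            "message": f"Send at most {MAX_BATCH_SIZE} items per request.",
        }}
    results = []
    for index, row in enumerate(rows):
        email = row.get("email")
        if isinstance(email, str) and EMAIL_RE.match(email):
            # A real handler would insert the user here and return its id.
            results.append({"index": index, "status": "ok", "id": f"usr_{index}"})
        else:
            results.append({"index": index, "status": "error", "code": "INVALID_EMAIL"})
    failed = sum(1 for r in results if r["status"] == "error")
    return 207, {"succeeded": len(results) - failed, "failed": failed, "results": results}
```

Returning the original `index` with every result lets the client resubmit only the rows that failed.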
Webhook Security & Delivery Guarantees
- HMAC signature: Include X-Signature: sha256=... over raw body; partners verify with shared secret.
- Idempotent delivery: Each event carries unique event_id; receivers dedupe in a 72-hour window.
- At-least-once delivery: Retry with backoff; document that partners must handle duplicates.
- Replay endpoint: GET /webhooks/events?since=timestamp for partners to backfill after outage.
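The signing and dedupe rules above can be sketched with the standard library. The `sha256=<hex digest>` header layout follows the example in the list; the `WebhookReceiver` class and its in-memory record of seen ids are hypothetical stand-ins for a partner's own storage.

```python
import hashlib
import hmac
import json
import time
from typing import Callable, Dict

DEDUPE_WINDOW_SECONDS = 72 * 3600


def sign(secret: bytes, raw_body: bytes) -> str:
    """Sender side: HMAC-SHA256 over the exact bytes that go on the wire."""
    return "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify(secret: bytes, raw_body: bytes, signature_header: str) -> bool:
    """Receiver side: recompute and compare in constant time."""
    expected = sign(secret, raw_body)
    return hmac.compare_digest(expected.encode(), signature_header.encode())


class WebhookReceiver:
    def __init__(self, secret: bytes, clock: Callable[[], float] = time.time) -> None:
        self._secret = secret
        self._clock = clock
        self._seen: Dict[str, float] = {}  # event_id -> first seen timestamp

    def handle(self, raw_body: bytes, signature_header: str) -> str:
        if not verify(self._secret, raw_body, signature_header):
            return "rejected"
        now = self._clock()
        # Drop ids that have aged out of the dedupe window.
        self._seen = {eid: ts for eid, ts in self._seen.items()
                      if now - ts < DEDUPE_WINDOW_SECONDS}
        event_id = json.loads(raw_body)["event_id"]
        if event_id in self._seen:
            return "duplicate"  # at-least-once delivery: acknowledge, skip work
        self._seen[event_id] = now
        return "processed"
```

Verification runs on the raw bytes before any JSON parsing, since re-serializing the payload can change whitespace or key order and break the signature.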
Rate Limit Transparency
When returning 429, include headers so clients self-throttle without guessing:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1717084800
Retry-After: 42
Differentiate per-API-key quotas from per-IP throttling — B2B partners expect contractual limits, not surprise blocks.
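On the client side, those headers can drive a simple wait calculation. This sketch assumes `Retry-After` carries whole seconds and `X-RateLimit-Reset` carries a Unix timestamp, as in the example above; `Retry-After` may also be an HTTP date, which this sketch does not handle. The function name and fallback value are illustrative.

```python
from typing import Mapping


def seconds_to_wait(headers: Mapping[str, str], now: float,
                    fallback: float = 1.0) -> float:
    """Decide how long to pause after a 429 response."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.isdigit():
        return float(retry_after)
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None and reset.isdigit():
        # Reset is treated as epoch seconds; never return a negative wait.
        return max(0.0, float(reset) - now)
    return fallback
```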
HTTP Status Quick Reference
| Status | When | Client Action |
|---|---|---|
| 400 Bad Request | Client sent malformed JSON or violated field constraints | Fix payload; do not retry blindly |
| 401 / 403 | Missing token vs authenticated but forbidden | Refresh credentials or request elevated scope |
| 409 Conflict | Optimistic lock version mismatch or duplicate resource | Fetch latest state and merge or surface to user |
| 429 Too Many Requests | Rate or quota limit exceeded | Honor Retry-After header; exponential backoff |
| 503 Service Unavailable | Overload or dependency outage; may be transient | Retry with jitter; circuit-break after N failures |