What:
An index is a secondary database access path built on top of primary tables.
Primary purpose:
Accelerating read query speeds by replacing slow full table disk scans with binary tree traversals.
Usually used for:
Filtering rows (WHERE), sorting datasets (ORDER BY), joining tables, and resolving point lookups.
How should I think about this inside system architectures?
🔍 Secondary Access Path
Instead of reading millions of database rows sequentially from disk, follow a tree branch directly to the row pointer in O(log N) steps.
⚖️ Write & Memory Tax
Indexes are not free. Every added index slows database mutations because the engine must update both the raw table and the index tree.
⛓️ Left-to-Right Sort Order
Composite (multi-column) indexes are sorted hierarchically. They only support queries matching from left to right (leftmost prefix rule).
Needed When:
Query latencies spike, reads heavily outnumber writes, or database queries scan more than a small fraction of rows.
Avoids:
Full table scans, database disk I/O exhaustion, high query response variance, and CPU starvation.
Optimizes For:
Time to First Byte (TTFB), index-only reads (covering index), and stable response times under heavy concurrent loads.
The default indexing structure for standard SQL and NoSQL databases is the B+Tree. It places navigational keys in internal branches, and links leaf nodes sequentially to support quick range scans:
Index Classification Matrix
| Type | Lookup | Range | Use Case |
|---|---|---|---|
| B+Tree (General purpose) | O(log N) | Excellent (linked leaves) | Standard SQL/NoSQL primary/secondary indexes |
| Hash (Point lookup) | O(1) | None (unordered keys) | Redis, exact match filters in SQL databases |
| Covering (Index-only) | O(log N) | Excellent | Queries reading ONLY the indexed columns (zero disk row seek) |
| Inverted (Full-text) | Token-based | N/A | Search engines like Elasticsearch, PostgreSQL GIN/GiST |
| Geospatial (R-Tree / S2) | O(log N) | Proximity | Location queries near coords (PostGIS, Uber, MongoDB) |
- Clustered Index: Organizes the raw table rows on disk physically sorted by the primary key (maximum one per table).
- Non-Clustered Index: A separate tree structures storing secondary keys pointing to primary key addresses.
- Leftmost Prefix Rule: A composite index on
(A, B, C)only works if your query includes columnA. - Covering Index: An index containing 100% of the columns requested in the
SELECTclause, skipping main table access entirely. - Selectivity Ratio: The ratio of unique values to total rows. High selectivity (like email) makes B+Tree indexes highly efficient.
| Benefit | Cost |
|---|---|
| Drastically Faster Reads (swaps slow full table scans for O(log N) lookups) | Slower Writes (every INSERT/UPDATE/DELETE must update all secondary indexes) |
| Efficient Range Queries & ORDER BY (linked leaf nodes keep data physically sorted) | Memory & Disk Overhead (large indexes consume RAM buffer space and storage) |
| Covering Index Optimization (serves select queries entirely from index metadata) | Maintenance Overhead (indexes require periodic defragmentation/rebuilding) |
Problem: A developer builds a composite index on (first_name, last_name) but queries with WHERE last_name = 'Smith'. The database optimizer ignores the index and falls back to a slow full table scan.
Mitigation: Ensure query patterns align from left to right with indexed columns, or add separate indexes tailored to individual filters.
Problem: Standard pagination query LIMIT 20 OFFSET 50000 gets progressively slower. The database must traverse and discard 50,000 index entries to read the target 20.
Mitigation: Switch to cursor-based pagination (keyset pagination): WHERE id > :last_seen_id ORDER BY id LIMIT 20 to jump directly to the starting offset.
Problem: Querying WHERE LOWER(email) = 'user@example.com' on an indexed email column bypasses the index entirely because database runtime calculations modify keys during search.
Mitigation: Query with exact casing matching the index, or construct a specialized functional index: CREATE INDEX ... ON users (LOWER(email)).
| Problem | Usage |
|---|---|
| WhatsApp Message Timeline | Composite index on (chat_thread_id, timestamp) for rapid chat histories |
| Uber Driver Dispatch Proximity | Geospatial index (S2 / Geohash cells) to find nearby drivers in milliseconds |
| E-commerce Faceted Search | Inverted index (Elasticsearch) to filter product catalogs by color, size, price |
| Instagram Feed Timeline Generation | Index on (user_id, created_at DESC) for cursor-based user home timelines |
| API Idempotency Key Validation | Unique B+Tree or Hash index on (idempotency_key) to intercept duplicate requests |
- Slow queries show high
rows_examinedin comparison torows_sent. - Your database reads far outnumber mutations (e.g. system read-heavy metadata lookup).
- A high volume of queries perform filters, aggregations, or strict sorting.
- You face relational joins (foreign keys must always be indexed to prevent slow join scans).
- You must guarantee domain integrity (e.g. enforcing email uniqueness).
- LSM-Tree vs B+Tree (comparing append-heavy storage vs read-optimized databases)
- Database Partitioning & Sharding (routing queries across multi-node databases)
- Query Execution Plans (compiling, optimizing, and execution maps of SQL engines)
- Full-Text Inverted Indexing (tokenization and search indexing internals)
- Geospatial S2 & H3 Hierarchies (geospatial coordinates representation)
Clustered vs Non-Clustered Storage Mechanics
In clustered indexes (e.g., MySQL's InnoDB primary key), leaf nodes store the actual physical data row on disk. In secondary non-clustered indexes, the leaf nodes store the secondary search key and the primary key as a row pointer. To retrieve non-indexed columns, the database must traverse the secondary index tree, find the primary key, and then traverse the primary clustered index tree (double lookup or key lookup).
Composite Index Rule Mathematics
When planning multi-column composite indexes, you must adhere to the Equality-first, Sort-next, Range-last optimization math:
-- For Query:
WHERE status = 'ACTIVE' AND user_id = 100 AND created_at > '2024-01-01'
ORDER BY created_at DESC;
-- Best Composite Index columns:
(user_id, status, created_at)- Columns filtered with exact match (
=) must come first. - Columns utilized in sorting (
ORDER BY) come second. - Columns queried with range comparisons (
>, <, BETWEEN) must come last. Range columns break B+Tree index traversal chain for subsequent columns.
Covering Indexes & Index-Only Scans
A covering index contains all columns requested in the query. For example, if you index (user_id, status) and query: SELECT status FROM users WHERE user_id = 10, the database reads the index leaf node directly and returns the response. It completely skips accessing raw table data on disk, reducing I/O operations to near zero.
Query Execution Analysis (EXPLAIN)
To verify database optimizer decisions, prefix queries with EXPLAIN or EXPLAIN ANALYZE. High-performing queries will list:
type: reforeq_ref(using index lookup).Extra: Using index(confirming covering index).- Low
rowsscanned ratio (ideally close to returned count). - Avoid
Extra: Using filesortorExtra: Using temporary(indicates slow non-indexed in-memory sorting).
LSM-Tree Write Path (Cassandra, RocksDB, LevelDB)
B+Tree indexes optimize for read latency — every INSERT updates random pages on disk, which hurts write-heavy workloads. Log-Structured Merge (LSM) trees flip the model: writes append sequentially to an in-memory memtable and a write-ahead log; when the memtable fills, it flushes to immutable SST files on disk. Reads check memtable → bloom filter → SST layers (newest to oldest).
- Write amplification: Background compaction merges SST files — disk I/O continues after writes "complete."
- Read amplification: A key may exist in multiple SST layers until compaction runs; bloom filters skip absent keys cheaply.
- Interview fit: Choose LSM-backed stores for append-heavy telemetry, messaging metadata, and write-heavy KV — choose B+Tree OLTP when complex indexed reads dominate.