What:
Kafka is a distributed, partitioned, replicated, append-only event log.
Primary purpose:
High-throughput, durable asynchronous event streaming and ingestion.
Usually used for:
Event-driven pipelines, real-time analytics, decoupled microservices, and log aggregation.
How should I think about this inside system architectures?
⚡ Durable Event Pipeline
Unlike transient queues that delete items on consume, Kafka keeps messages on disk after they are read, for as long as the topic's retention policy allows.
📥 Pull-Based Consumption
Consumers poll the broker for data at their own pace, so a slow consumer falls behind in the log instead of being overwhelmed. This gives natural backpressure.
📊 Partitioned Ordered Log
Concurrency is scaled by breaking topics into partitions. Ordering is strictly guaranteed only within a partition.
🛡️ Shock-Absorbing Async Layer
Acts as a massive buffer that isolates fast upstream systems from slower database updates downstream.
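The durable-log and pull-based ideas above can be sketched with a tiny in-memory model. This is an illustration only, not Kafka client code: `PartitionLog` and `PullConsumer` are hypothetical names, and a real broker persists records to disk and tracks committed offsets per consumer group.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Hypothetical in-memory stand-in for one partition: an append-only list."""
    records: list = field(default_factory=list)

    def append(self, value):
        """Append a record and return its offset (its position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    def fetch(self, offset, max_records):
        """Return up to max_records starting at offset; nothing is deleted."""
        return self.records[offset:offset + max_records]


@dataclass
class PullConsumer:
    """Tracks its own read position, so reading never mutates the log."""
    log: PartitionLog
    position: int = 0

    def poll(self, max_records=2):
        """Pull the next batch at the consumer's own pace."""
        batch = self.log.fetch(self.position, max_records)
        self.position += len(batch)
        return batch

    def seek(self, offset):
        """Move the read position, for example back to 0 to replay history."""
        self.position = offset


log = PartitionLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

consumer = PullConsumer(log)
first = consumer.poll()                    # first batch of two records
second = consumer.poll()                   # the remaining record
consumer.seek(0)                           # rewind to the start
replayed = consumer.poll(max_records=10)   # the same records, read again
```

Because consumption only advances a position, the same records can be read again after `seek(0)`, which is the property that makes replay possible.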
Needed When:
You require high-volume data ingestion, async event decoupling, or replayable streams.
Avoids:
Tight synchronous coupling, cascading outages, and overload of downstream write paths.
Optimizes For:
Write scalability, event ordering per entity, high-throughput durability, and reliable retries.
Here is where Kafka sits in the system, how data gets routed, and how consumers coordinate:
The Partition Model
Each topic consists of one or more partitions distributed across the cluster. Partitions allow Kafka to scale horizontally:
- Partition-Level Ordering: Messages are ordered strictly within a single partition, not globally across the topic.
- High Throughput via Sequential Writes: Kafka appends to the end of each log segment (O(1) per append) and relies on the OS page cache rather than random disk I/O.
- Zero-Copy Message Transfer: Data moves from the OS page cache to the network socket without being copied into user-space memory.
- Durable & Replayable Consumers: Messages remain in the log even after consumption, allowing consumers to replay history.
- Decoupled Pull Protocol: Consumers request batches when they are ready, which keeps a slow consumer from being flooded with more data than it can hold in memory.
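A minimal sketch of how key-based routing yields per-key ordering. The function names are hypothetical, and CRC32 is used only as a convenient stand-in hash: Kafka's default partitioner hashes the key bytes with murmur2, so real partition numbers will differ.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a deterministic hash.

    CRC32 is an illustrative stand-in, not the hash Kafka uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Append each (key, value) event to the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


events = [
    ("acct-1", "open"),
    ("acct-2", "open"),
    ("acct-1", "deposit"),
    ("acct-2", "withdraw"),
    ("acct-1", "close"),
]
partitions = route(events, num_partitions=3)

# All events for one key land in one partition, in the order they were produced.
acct1_partition = partitions[partition_for("acct-1", 3)]
acct1_history = [value for key, value in acct1_partition if key == "acct-1"]
```

Events for different keys may interleave or sit in different partitions, which is why ordering holds per partition and not across the whole topic.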
| Benefit | Cost |
|---|---|
| High Throughput (millions of events/sec via sequential I/O and zero-copy) | Operational Complexity (managing partitions, ZooKeeper/KRaft metadata, and client configs) |
| Durable Retries & Message Replay (allows historic data replay and easy recovery) | Eventual Consistency (readers might experience slight lag or delay across partitions) |
| Asynchronous Decoupling (isolates producers from consumer downtime/spikes) | Harder Debugging (tracing requests across async boundaries is complex) |
Problem: Key-based partitioning (e.g. partitioning by country code) can send a large share of traffic to a single partition (such as key 'US'), overloading the broker that leads it while others sit idle.
Mitigation: Introduce composite partition keys (e.g. user_id + '_' + event_type) or add random salt suffixes to distribute heavy workloads evenly. Salting gives up ordering for the original key, so use it only where consumers can tolerate that.
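The effect of salting a hot key can be sketched as follows. This is a simulation under stated assumptions, not producer code: the partition count, the number of salt buckets, and the CRC32 hash (a stand-in for Kafka's murmur2-based default partitioner) are all illustrative choices.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8   # illustrative partition count
SALT_BUCKETS = 16    # hypothetical: number of sub-keys a hot key is spread over


def partition_for(key: str, num_partitions: int) -> int:
    # CRC32 is a stand-in for illustration; it is not the hash Kafka uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def salted_key(key: str, rng: random.Random) -> str:
    """Append a random bucket number so one hot key becomes several keys."""
    return f"{key}_{rng.randrange(SALT_BUCKETS)}"


rng = random.Random(42)        # seeded only to make the sketch repeatable
hot_traffic = ["US"] * 1000    # every event carries the same hot key

plain = Counter(partition_for(k, NUM_PARTITIONS) for k in hot_traffic)
salted = Counter(
    partition_for(salted_key(k, rng), NUM_PARTITIONS) for k in hot_traffic
)

print("partitions used without salt:", len(plain))
print("partitions used with salt:   ", len(salted))
```

Without salt, every event lands on one partition. With salt, the same traffic spreads over several, at the cost that consumers must recombine the `US_*` sub-keys if they need a per-country view.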
Problem: Downstream consumers process events slower than producers publish them, so the consumer falls further behind (lag grows). This delays business workflows, and if lag outgrows the retention window, unread messages are deleted before they are processed.
Mitigation: Add more partitions to the topic and increase consumer instances within the consumer group (at most one active consumer per partition), or speed up each consumer with batching or multi-threading. Note that adding partitions changes which partition existing keys map to.
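The partition limit on consumer parallelism, and how lag is measured, can be sketched like this. The round-robin assignment is a simplification with hypothetical names (real assignors differ in detail), and the offsets are made-up numbers.

```python
def assign_round_robin(partitions, consumers):
    """Hypothetical assignor: deal partitions out to consumers in turn."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


def total_lag(log_end_offsets, committed_offsets):
    """Lag is the log end offset minus the committed offset, summed over partitions."""
    return sum(
        log_end_offsets[p] - committed_offsets.get(p, 0) for p in log_end_offsets
    )


# Three partitions, four consumers: one consumer gets nothing to do.
assignment = assign_round_robin([0, 1, 2], ["c1", "c2", "c3", "c4"])
idle = [consumer for consumer, owned in assignment.items() if not owned]

# Made-up offsets: partition 0 is 20 behind, partition 1 is caught up, partition 2 is 50 behind.
lag = total_lag(
    log_end_offsets={0: 120, 1: 95, 2: 200},
    committed_offsets={0: 100, 1: 95, 2: 150},
)
```

Here `idle` contains the fourth consumer, which shows why adding consumers beyond the partition count does not reduce lag.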
Problem: When a consumer joins or leaves the group (or crashes or fails to heartbeat), the group rebalances. With the eager rebalance protocol, every consumer in the group stops processing while partitions are reassigned among the surviving members, causing periodic latency spikes.
Mitigation: Tune session.timeout.ms and heartbeat.interval.ms, and use Static Group Membership (Kafka 2.3+) to prevent rebalances during short networking blips or rolling deployments.
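A sketch of what such a consumer configuration might look like, written as a plain Python dictionary. The keys are standard Kafka consumer property names. The group name, instance id, and timeout values are illustrative assumptions, not recommendations for any particular workload.

```python
consumer_config = {
    "group.id": "orders-processor",             # hypothetical group name
    "group.instance.id": "orders-processor-0",  # hypothetical; unique and stable per instance,
                                                # which is what enables static membership
    "session.timeout.ms": 30000,                # illustrative value
    "heartbeat.interval.ms": 10000,             # illustrative value
}


def check_timeouts(config: dict) -> bool:
    """Check the commonly documented guideline that the heartbeat interval
    should be no more than one third of the session timeout."""
    return config["heartbeat.interval.ms"] * 3 <= config["session.timeout.ms"]


timeouts_ok = check_timeouts(consumer_config)
```

With a stable `group.instance.id`, a consumer that restarts within the session timeout can rejoin under the same identity without triggering a rebalance.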
| Problem | Usage |
|---|---|
| WhatsApp Message Queueing | Broker-style message delivery queue |
| Real-time Activity Feeds (e.g. LinkedIn, Twitter) | Fanout event pipelines to processing workers |
| Uber Proximity / Dispatch matching | Geo-partitioned streaming pipeline to match riders and drivers |
| Application Metrics & Clickstream (Netflix, YouTube) | High-volume telemetry ingestion and log aggregation |
| Database Audit Logging / Change Data Capture (CDC) | Compacted state capture events streamed to search engines (Elasticsearch) |
- You need async, fire-and-forget workflows with bulletproof durability.
- You must replay and re-process historic messages (e.g. rebuilding read-model caches).
- You require event ordering strictly per entity (e.g. tracking banking transaction ledger events).
- You face high-volume streaming ingest (clickstreams, metrics, IoT sensors).
- Multiple independent microservices need to consume the exact same stream of events.
- Replication & Quorums (how leaders replicate writes to followers safely)
- Event Sourcing & CQRS (using an append-only log to build projection engines)
- Stream Processing (real-time stream manipulations via Apache Flink / Spark)
- Change Data Capture (CDC) (turning DB updates into clean event streams)
- Write-Ahead Log (WAL) (durable sequential logging used in databases)
ISR & Durability Mathematics
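Kafka achieves partition reliability via replication. By default the leader replica handles all client reads and writes (since Kafka 2.4, consumers can optionally fetch from followers). Followers that are caught up with the leader, together with the leader itself, form the ISR (In-Sync Replicas) set.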
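To balance write speed against durability risk, tune the producer's acks configuration:
- acks=0: Fire-and-forget. The producer does not wait for any broker acknowledgment, so there is no durability guarantee.
- acks=1: Returns success as soon as the leader writes the record to its local log. Risk: the record is lost if the leader dies before followers replicate it.
- acks=all: The leader waits until every replica currently in the ISR has the record. This gives the strongest durability at the cost of higher write latency. Pair it with min.insync.replicas of at least 2, because an ISR that has shrunk to the leader alone would otherwise still acknowledge writes.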
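The durability trade-off between the acks settings can be sketched as a small model. The function below is hypothetical and simplifies heavily: it only reports how many replicas are guaranteed to hold a record at the moment the producer sees success. `min.insync.replicas` is a real topic and broker setting that makes the leader reject acks=all writes when the ISR is too small.

```python
def write_outcome(acks: str, isr_size: int, min_insync_replicas: int = 1):
    """Return (accepted, guaranteed_copies) for one produce request.

    isr_size counts the leader plus all in-sync followers.
    guaranteed_copies is the number of replicas known to hold the record
    when the producer treats the write as successful.
    """
    if acks == "0":
        # The producer does not wait for the broker, so nothing is guaranteed.
        return True, 0
    if acks == "1":
        # Only the leader's local write is confirmed.
        return True, 1
    if acks == "all":
        if isr_size < min_insync_replicas:
            # The leader refuses the write instead of accepting it under-replicated.
            return False, 0
        return True, isr_size
    raise ValueError(f"unknown acks value: {acks}")


healthy = write_outcome("all", isr_size=3, min_insync_replicas=2)
shrunk = write_outcome("all", isr_size=1, min_insync_replicas=2)
leader_only = write_outcome("1", isr_size=3)
fire_and_forget = write_outcome("0", isr_size=3)
```

With three in-sync replicas, an acknowledged acks=all write survives the loss of any two brokers. With acks=1 it survives none if the leader fails before replication completes.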
Log Compaction Internals
Normally, Kafka deletes old log segments once they exceed a configured age or the partition exceeds a size limit. But for key-value streams (like database records captured via CDC), we only care about the latest state of a key. Log compaction periodically garbage-collects superseded records, keeping only the most recent value for each key.
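The outcome of a compaction pass can be sketched in a few lines. This is a simplified model with a hypothetical `compact` function: a real log cleaner works segment by segment in the background, and keeps delete markers (records with a null value) for a configurable period before removing them, whereas this sketch drops them immediately.

```python
def compact(log):
    """Keep only the latest record per key, preserving original offsets.

    log is a list of (key, value) pairs; the list index is the offset.
    A value of None marks the key as deleted.
    """
    latest_offset = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset  # later records overwrite earlier ones

    return [
        (offset, key, value)
        for offset, (key, value) in enumerate(log)
        if latest_offset[key] == offset and value is not None
    ]


changelog = [
    ("user-1", "alice@old.example"),
    ("user-2", "bob@example"),
    ("user-1", "alice@new.example"),   # supersedes offset 0
    ("user-3", "carol@example"),
    ("user-2", None),                  # delete marker for user-2
]
compacted = compact(changelog)
```

Surviving records keep their original offsets, so the compacted log has gaps in its offset sequence while the relative order of the survivors is unchanged.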
Zero-Copy Reads & OS Page Cache
A traditional read path copies file contents into a kernel buffer, then into user-space application memory, then back into a kernel socket buffer, and finally to the network card. Kafka avoids the user-space round trip by using the sendfile system call: the OS transfers data from the kernel page cache to the NIC without passing it through the application, which cuts redundant copies and context switches. The benefit applies to unencrypted transfers; with TLS the broker must bring data into user space to encrypt it.
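The same system call is reachable from Python, which makes for a small demonstration. This sketch is not how Kafka is implemented (the broker is JVM code). It uses the standard library's `socket.sendfile`, which calls `os.sendfile` where the platform supports it and falls back to ordinary `send` otherwise. The temporary file stands in for a log segment, and `send_segment` is a hypothetical helper name.

```python
import os
import socket
import tempfile


def send_segment(sock: socket.socket, path: str) -> int:
    """Stream a file to a socket without reading it into a Python buffer.

    Returns the number of bytes sent.
    """
    with open(path, "rb") as segment:
        return sock.sendfile(segment)


# A small file standing in for a log segment on disk.
payload = b"offset-0\noffset-1\noffset-2\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    segment_path = tmp.name

# A connected socket pair stands in for the broker-to-consumer connection.
sender, receiver = socket.socketpair()
with sender, receiver:
    sent = send_segment(sender, segment_path)
    sender.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
    received = b""
    while chunk := receiver.recv(4096):
        received += chunk

os.unlink(segment_path)
```

The application code never holds the file contents on the sending side; it only passes a file handle and a socket to the operating system.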
ZooKeeper vs KRaft Consensus
Historically, Kafka relied on ZooKeeper to manage cluster metadata, partition leadership, and broker membership. Under KRaft (production-ready since Kafka 3.3 and the only mode from 4.0 onward), consensus is built into Kafka itself: a quorum of controller nodes maintains a Raft-based metadata log. This removes the external coordination dependency, shortens controller failover and startup, and is designed to scale to millions of partitions per cluster.
Retention vs Compaction: Know Your Stream Type
Kafka offers two log lifecycle modes, set per topic with cleanup.policy. Choosing the wrong one breaks downstream consumers:
- Time/size retention (delete): Old segments drop after N days or N GB. Correct for clickstreams, metrics, and audit logs where history beyond the window is worthless.
- Log compaction: Keeps the latest record per key forever; tombstones with null values delete keys. Correct for CDC changelog topics where consumers rebuild current state from the log.
Event sourcing pitfall: Never enable compaction on an append-only event-sourcing topic where every state transition must be preserved. Compaction garbage-collects intermediate events and destroys the audit trail. Pair event sourcing with infinite retention (cleanup.policy=delete with retention.ms=-1) or an external snapshot plus archive, not key compaction.