Core Concept

Stream Processing Basics

Stream processing extracts real-time insights from continuous message flows, using bounded time windows and watermarks to reconcile out-of-order logs safely.


What:

Real-time processing engines (Apache Flink, Spark Streaming) that aggregate and transform infinite message streams continuously.

Primary purpose:

Extracting analytical insights, metric counters, or fraud signals from event logs instantly as they occur.

Usually used for:

Real-time GPS coordinate ETAs, live payment fraud alerts, and sliding-window system health trackers.

How should I think about this inside system architectures?

🕰️ Event vs Processing Time

Event Time is when the update occurred on the client. Processing Time is when it reaches the server. Always process using Event Time.

🌊 Watermark advancement

A temporal anchor signal: 'We assume all events with Event Time < T have arrived.' Advance watermarks to close windows and emit states.

🗄️ Stateful RocksDB Cache

Store running aggregate counters (e.g. click counts) inside local RocksDB key-value stores to enable micro-second local state updates.