Core Concept

Probabilistic Data Structures

Space-efficient probabilistic structures — Bloom filters for membership tests, Count-Min Sketches for frequency estimation, and HyperLogLog for approximate cardinality — trade exactness for RAM and latency wins at scale.


What:

A highly space-efficient, probabilistic data structure consisting of a bit array initialized to 0, coupled with multiple independent hash functions.

Primary purpose:

Filtering out non-existent keys at the memory layer, preventing expensive database and disk I/O seek operations.

Usually used for:

LSM-Tree SSTable seek optimizations, web crawler URL deduplication, and database query DDoS shields.

How should I think about this inside system architectures?

🎯 Absolute Definite No

If any hash index bit evaluates to 0, the element is **definitively not** in the set. Zero false negatives guaranteed.

🎲 Probabilistic Maybe Yes

If all checked bits are 1, the element **might** be in the set. A slight chance of false positives exists due to hash index collisions.

🧹 Zero Bit Deletes

Because multiple elements share overlapping bits, you cannot 'delete' an item by setting its bits back to 0.