What:
A cryptographic binary tree where every leaf node is the hash of a data block, and every parent node is the hash of its children concatenated.
Primary purpose:
Efficiently verifying integrity and validating differences between large datasets distributed across multiple servers.
Usually used for:
Cassandra anti-entropy replica sync, BitTorrent block verification, Git commit tracking, and Blockchain ledger structures.
How should I think about this inside system architectures?
👑 Root Hash Identity
The Root Hash represents the absolute cryptographic fingerprint of the entire dataset. If a single byte changes, the Root Hash changes completely.
🌳 Logarithmic Synchronization
Compare trees from top to bottom. Sibling branches that match are skipped instantly. Traverse only differing paths to pinpoint out-of-sync blocks.
🛡️ Lightweight Verification
Prove block membership (Merkle Proof) to a client by sending only $O(\log N)$ sibling hashes along the path to the root, saving bandwidth.