Core Concept

Replication, Failover, & Leader Election

Replication scales read capacity and protects against host crashes, but it introduces replication lag and makes failover coordination harder.


What:

Replication copies database state to replica servers. Failover promotes a replica when the primary fails, and the promotion is coordinated by leader election built on consensus protocols such as Raft or Paxos.

Primary purpose:

Ensuring high availability, surviving hardware crashes, and scaling read query throughput globally.

Usually used for:

Primary-replica database clusters, stateful messaging pools, and decentralized consensus networks.

How should I think about this inside system architectures?

🚦 Replication Latency Bounds

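Balance consistency against write latency: choose synchronous replication for financial data, where an acknowledged write must not be lost, and asynchronous replication for read-heavy workloads such as social profiles, where briefly stale reads are acceptable.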
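
The trade-off can be sketched with a small in-memory simulation. This is a minimal illustration, not a real database protocol: the `Primary` and `Replica` classes, the `synchronous` flag, and the `ship_backlog` method are hypothetical names invented for this example.

```python
from collections import deque


class Replica:
    """Hypothetical replica: holds a copy of the primary's data."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    """Hypothetical primary that replicates either before or after acknowledging."""

    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.backlog = deque()  # changes not yet shipped to replicas

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Acknowledge only after every replica has applied the change.
            for replica in self.replicas:
                replica.apply(key, value)
        else:
            # Acknowledge immediately; replicas catch up later.
            self.backlog.append((key, value))

    def ship_backlog(self):
        # Stand-in for the background replication stream.
        while self.backlog:
            key, value = self.backlog.popleft()
            for replica in self.replicas:
                replica.apply(key, value)


# Synchronous: the replica is current as soon as the write returns.
sync_replica = Replica()
Primary([sync_replica], synchronous=True).write("balance", 100)

# Asynchronous: a read from the replica can be stale until the backlog ships.
async_replica = Replica()
async_primary = Primary([async_replica], synchronous=False)
async_primary.write("bio", "hello")
stale_read = async_replica.data.get("bio")  # replica has not caught up yet
async_primary.ship_backlog()
fresh_read = async_replica.data.get("bio")
```

In the asynchronous case, a primary crash before `ship_backlog` runs would lose the acknowledged write, which is why the synchronous mode is the safer default for financial data.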

🗳️ Majority Quorum (⌊N/2⌋ + 1)

Leader election requires votes from a majority (more than 50%) of the cluster nodes, so the two sides of a network partition can never both elect a leader and accept conflicting updates.
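
The majority rule is simple arithmetic, shown here as a short sketch. The function names `quorum_size` and `wins_election` are hypothetical helpers for illustration and do not come from any particular Raft or Paxos library.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of votes that is a strict majority: floor(N/2) + 1."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def wins_election(votes_received: int, cluster_size: int) -> bool:
    """A candidate becomes leader only with a strict majority of all nodes."""
    return votes_received >= quorum_size(cluster_size)


# A 5-node cluster partitioned 3/2: only the larger side can elect a leader.
larger_side_wins = wins_election(3, 5)
smaller_side_wins = wins_election(2, 5)

# A 4-node cluster partitioned 2/2: neither side reaches the quorum of 3.
even_split_wins = wins_election(2, 4)
```

Because an even split leaves a cluster with no leader at all, clusters are commonly sized with an odd number of nodes: going from 3 to 4 nodes raises the quorum from 2 to 3 without tolerating any additional failures.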

🔄 Heartbeat Monitors

Send periodic heartbeat signals to monitor the primary's health. Trigger failover only after several consecutive heartbeats are missed, so that a brief network hiccup does not cause a needless promotion.
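
A heartbeat check can be sketched as follows. This is a minimal illustration under stated assumptions: the `HeartbeatMonitor` class, its `interval_s` and `missed_limit` parameters, and their default values are hypothetical, and timestamps are passed in explicitly so the example stays deterministic (a real monitor would likely read a monotonic clock).

```python
class HeartbeatMonitor:
    """Hypothetical monitor that flags a node after several missed heartbeats."""

    def __init__(self, interval_s: float = 1.0, missed_limit: int = 3):
        self.interval_s = interval_s      # expected time between heartbeats
        self.missed_limit = missed_limit  # missed beats tolerated before suspicion
        self.last_seen = {}               # node name -> time of last heartbeat

    def record(self, node: str, now: float) -> None:
        """Call whenever a heartbeat arrives from a node."""
        self.last_seen[node] = now

    def missed(self, node: str, now: float) -> int:
        """Whole heartbeat intervals elapsed since the node was last heard from.

        Raises KeyError if the node has never sent a heartbeat.
        """
        return int((now - self.last_seen[node]) // self.interval_s)

    def is_suspect(self, node: str, now: float) -> bool:
        """True once the node has missed at least missed_limit heartbeats."""
        return self.missed(node, now) >= self.missed_limit


monitor = HeartbeatMonitor(interval_s=1.0, missed_limit=3)
monitor.record("primary", now=0.0)

suspect_early = monitor.is_suspect("primary", now=2.5)  # 2 intervals missed
suspect_late = monitor.is_suspect("primary", now=3.0)   # 3 intervals missed
```

A suspect flag from a single monitor is only a trigger: the actual promotion should still go through a majority vote, so that a monitor cut off from a healthy primary cannot start a failover on its own.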