Replication, Failover, & Leader Election – System Design Core Concept

What:

Replication copies database state to one or more replica servers. Failover promotes a replica when the primary fails; a leader election protocol (such as Raft or Paxos) decides which replica takes over.

Primary purpose:

Ensuring high availability, surviving hardware crashes, and scaling read query throughput globally.

Usually used for:

Primary-replica database clusters, stateful messaging pools, and decentralized consensus networks.

How should I think about this inside system architectures?

🚦 Replication Latency Bounds

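Balance consistency against latency: pick synchronous replication for financial data, where losing an acknowledged write is unacceptable; choose asynchronous replication for read-heavy workloads such as social profiles, where brief staleness is tolerable.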

🗳️ Majority Quorum (⌊N/2⌋ + 1)

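Leader election requires agreement from a strict majority (more than 50%) of the cluster nodes, so the two sides of a network partition can never both elect a leader and accept conflicting updates (split-brain).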
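The majority rule above can be sketched as a few lines of arithmetic. This is a minimal illustration, assuming a fixed cluster size; the function names (quorum_size, tolerated_failures, can_elect) are hypothetical and not taken from any particular library.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest vote count that is a strict majority of the cluster."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    # Integer division gives floor(N/2); adding 1 makes it a strict majority.
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be unreachable while a majority still exists."""
    return cluster_size - quorum_size(cluster_size)


def can_elect(votes_received: int, cluster_size: int) -> bool:
    """A candidate wins only with votes from a strict majority."""
    return votes_received >= quorum_size(cluster_size)
```

Note that a 4-node cluster needs 3 votes and tolerates only 1 failure, the same as a 3-node cluster, which is one reason odd cluster sizes are usually preferred.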

🔄 Heartbeat Monitors

Send periodic heartbeat signals to monitor the primary's health. Trigger failover when a node misses heartbeats for longer than a configured timeout.
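A timeout-based heartbeat check might look like the sketch below. It is an illustration under stated assumptions, not a production failure detector: the class name HeartbeatMonitor, its methods, and the injectable clock parameter are all hypothetical, and a node that has never sent a heartbeat is simply not tracked.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat time per node (hypothetical helper)."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen: Dict[str, float] = {}

    def record_heartbeat(self, node_id: str) -> None:
        """Call whenever a heartbeat arrives from node_id."""
        self.last_seen[node_id] = self.clock()

    def suspected_nodes(self) -> List[str]:
        """Nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)

    def should_fail_over(self, primary_id: str) -> bool:
        """True when the tracked primary has exceeded the heartbeat timeout."""
        return primary_id in self.suspected_nodes()
```

The timeout is a trade-off: a short value detects crashes quickly but risks promoting a replica while the primary is merely slow, so a missed heartbeat is usually treated as a suspicion that still has to pass a quorum vote.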

Replication Mode Classifications: how client writes are synchronized to replicas:

Replication Mode	Behavioral Mechanic	Benefit	Cost
Synchronous	Primary withholds the write acknowledgment until all secondaries have written to disk.	No data-loss window when the primary crashes.	Write latency increases; if any secondary fails or stalls, all writes block.
Asynchronous	Primary acknowledges the write immediately and syncs secondaries in the background.	Very low write latency; writes stay available when secondaries are down.	Data-loss window if the primary crashes before secondaries catch up.
Semi-Synchronous	Primary blocks until at least one secondary acknowledges the write.	Protects against a single node failure without blocking on every secondary.	Moderate latency penalty.
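The three modes in the table differ only in how many secondary acknowledgments the primary waits for before answering the client. The sketch below captures that rule; the Mode enum and the function names are hypothetical, chosen for illustration rather than taken from any database's API.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    SYNC = "synchronous"
    SEMI_SYNC = "semi-synchronous"
    ASYNC = "asynchronous"


def required_acks(mode: Mode, replica_count: int) -> int:
    """Secondary acknowledgments needed before the client is answered."""
    if mode is Mode.SYNC:
        return replica_count          # every secondary must confirm
    if mode is Mode.SEMI_SYNC:
        return min(1, replica_count)  # at least one secondary, if any exist
    return 0                          # asynchronous: acknowledge immediately


def can_acknowledge(mode: Mode, replica_acks: List[bool]) -> bool:
    """replica_acks[i] is True if secondary i has confirmed the write."""
    return sum(replica_acks) >= required_acks(mode, len(replica_acks))
```

With one of two secondaries unresponsive, a synchronous write stays blocked while semi-synchronous and asynchronous writes proceed, which matches the cost column above.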

Leader Election Algorithms: coordinating primary promotions:

Election Algorithm	Liveness Mechanism	Safety Validation
Raft Consensus	The leader sends periodic heartbeats; a follower whose election timeout expires without receiving one starts an election.	Followers only vote for candidates whose logs are at least as up-to-date as their own.
Bully Algorithm	A node that detects an unresponsive leader messages all higher-ID nodes to find out which are still alive.	The live node with the highest configured numeric ID overrides the others and claims leadership.
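The two selection rules in the table can be sketched as small predicates. For Raft, "at least as up-to-date" compares the term of the last log entry first and the log length second. Only that log comparison is shown here; a real Raft vote also checks the candidate's term and whether the follower has already voted in that term. The function names are hypothetical.

```python
from typing import Iterable, Optional


def candidate_log_is_up_to_date(cand_last_term: int, cand_last_index: int,
                                voter_last_term: int, voter_last_index: int) -> bool:
    """Raft log comparison: a later last term wins; equal terms compare by index."""
    if cand_last_term != voter_last_term:
        return cand_last_term > voter_last_term
    return cand_last_index >= voter_last_index


def bully_winner(alive_node_ids: Iterable[int]) -> Optional[int]:
    """Bully rule: the highest ID among live nodes becomes leader (None if no node is alive)."""
    return max(alive_node_ids, default=None)
```

The contrast is worth noting: the Raft check ties eligibility to replicated data, so a node with a stale log cannot win a majority of votes, whereas the bully rule looks only at node IDs and says nothing about how current the winner's data is.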

Benefit	Cost
Synchronous Data Safety (secondaries store the write before the client is acknowledged, so no acknowledged update is lost)	Write Availability Loss (if the network to a secondary degrades or drops packets, the entire write path blocks)
Asynchronous Write Performance (acknowledges client writes immediately, keeping request latency low and writes available)	Replication Lag & Stale Reads (secondaries lag behind the primary, so reads from them can be temporarily stale, especially across regions)