Replication is the practice of storing multiple copies of data across different nodes in a distributed system to improve availability, durability, and performance. The choice of replication strategy fundamentally shapes system behavior and trade-offs.
Why Replicate?
Fault tolerance: If one node fails, replicas on other nodes keep the data accessible. With N=3 replication, a data item remains readable even if two of its three replica nodes fail.
Improved read performance: Reads can be served from any replica, distributing load and reducing latency by reading from geographically closer nodes.
Durability: Multiple copies protect against data loss from disk failures, corruption, or disasters.
Availability: Eventual consistency systems remain available for writes even when some replicas are unreachable.
N-Way Replication
The replication factor N defines how many copies of each data item exist in the system.
Common values:
- N=1: No replication (single point of failure)
- N=3: Standard production setting (balance of durability and cost)
- N=5+: High-value data requiring extreme durability
The choice of N involves trade-offs:
- Higher N → Better fault tolerance, more storage cost
- Lower N → Less storage cost, reduced fault tolerance
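As a rough illustration of why N matters (assuming independent replica failures, which rack- and datacenter-aware placement only approximates), the probability of losing every copy shrinks exponentially with N. A small sketch, with an illustrative 1% per-node failure probability:

```python
# Back-of-the-envelope durability estimate under an independence assumption;
# the 1% per-node failure probability is illustrative, not a measured figure.
def prob_all_replicas_lost(p_node_failure: float, n: int) -> float:
    """Probability that all N replicas fail at the same time."""
    return p_node_failure ** n

for n in (1, 3, 5):
    print(n, prob_all_replicas_lost(0.01, n))   # 0.01, 1e-06, 1e-10
```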
```mermaid
graph TB
    subgraph "N=3 Replication"
        K[Key 'user-123']
        K --> R1[Replica 1<br/>Node A]
        K --> R2[Replica 2<br/>Node B]
        K --> R3[Replica 3<br/>Node C]
    end
    F[Node B Fails] -.->|System still available| R1
    F -.-> R3
```
Preference Lists
In systems using consistent hashing, the preference list defines which nodes store replicas of a given key.
Construction algorithm:
- Hash the key to find its position on the ring
- Identify the coordinator node (first node clockwise from key position)
- Walk clockwise to find N-1 successor nodes
- Ensure preference list contains N distinct physical nodes (skip virtual nodes from the same physical server)
Example preference list for key K with N=3:
Key K → hash: 95°
Preference list: [Node B (coordinator), Node C, Node D]
All N nodes in the preference list are equally responsible for the key. Any can coordinate read/write operations.
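A minimal sketch of this construction (helper names are illustrative; the ring is modeled as a sorted list of (position, virtual node, physical node) tuples, with positions in degrees to match the example above):

```python
import bisect
import hashlib

def ring_position(key: str, ring_size: int = 360) -> int:
    """Map a key to a position on the ring (degrees, matching the example)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % ring_size

def preference_list(key: str, ring: list, n: int) -> list:
    """ring: sorted list of (position, virtual_node, physical_node) tuples.

    Walk clockwise from the key's position, collecting the first N distinct
    physical nodes and skipping further virtual nodes of servers already chosen.
    """
    positions = [pos for pos, _, _ in ring]
    start = bisect.bisect_left(positions, ring_position(key))
    chosen, seen = [], set()
    for i in range(len(ring)):
        _, vnode, physical = ring[(start + i) % len(ring)]
        if physical in seen:
            continue                      # same server, different virtual node
        seen.add(physical)
        chosen.append(physical)
        if len(chosen) == n:
            break
    return chosen                         # chosen[0] is the coordinator
```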
Cross-Datacenter Replication
For systems requiring resilience to datacenter failures, preference lists can span multiple geographic locations:
Strategy 1: Rack-aware placement
- Ensure replicas span multiple racks within a datacenter
- Protects against rack-level failures (power, network switch)
Strategy 2: Multi-datacenter replication
- Distribute N replicas across different datacenters
- Example: N=3 with replicas in us-east-1, us-west-2, eu-west-1
- Survives entire datacenter outages
Trade-off: Cross-datacenter replication increases write latency (network round-trips) but dramatically improves disaster recovery capabilities.
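The same ring walk can be made placement-aware. A sketch under the assumption that each ring entry also records its node's datacenter (an illustrative structure, not any specific system's API):

```python
import bisect

def dc_aware_preference_list(key_pos: int, ring: list, n: int) -> list:
    """ring: sorted list of (position, physical_node, datacenter) tuples.

    Clockwise walk that prefers one replica per datacenter; once every
    datacenter is represented it falls back to any remaining distinct node.
    """
    positions = [pos for pos, _, _ in ring]
    start = bisect.bisect_left(positions, key_pos)
    all_dcs = {dc for _, _, dc in ring}
    chosen, used_nodes, used_dcs = [], set(), set()
    for i in range(len(ring)):
        _, node, dc = ring[(start + i) % len(ring)]
        if node in used_nodes:
            continue
        if dc in used_dcs and used_dcs != all_dcs:
            continue                      # keep looking for an unused datacenter
        used_nodes.add(node)
        used_dcs.add(dc)
        chosen.append((node, dc))
        if len(chosen) == n:
            break
    return chosen
```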
```mermaid
graph LR
    subgraph "Datacenter 1"
        R1[Replica 1]
    end
    subgraph "Datacenter 2"
        R2[Replica 2]
    end
    subgraph "Datacenter 3"
        R3[Replica 3]
    end
    W[Write Operation] --> R1
    W --> R2
    W --> R3
    DC1F[DC 1 Fails] -.->|System survives| R2
    DC1F -.-> R3
```
Replication Models
Primary-Backup (Master-Slave)
One replica designated as primary receives all writes. Primary propagates changes to backups.
Advantages:
- Strong consistency (single source of truth)
- Simple conflict resolution (primary decides)
Disadvantages:
- Write availability limited by primary availability
- Primary is bottleneck for write throughput
- Failover complexity when primary fails
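A toy, in-process sketch of the primary-backup write path (no networking, failure detection, or failover; the class names are illustrative):

```python
import threading

class Backup:
    def __init__(self):
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value           # backups never accept client writes

class Primary:
    def __init__(self, backups, synchronous=True):
        self.store = {}
        self.backups = backups
        self.synchronous = synchronous

    def write(self, key, value):
        self.store[key] = value           # single source of truth
        if self.synchronous:
            self._replicate(key, value)   # block until every backup applies it
        else:
            threading.Thread(target=self._replicate, args=(key, value)).start()
        return "ack"

    def _replicate(self, key, value):
        for backup in self.backups:
            backup.apply(key, value)
```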
Eliminating Bottlenecks Creates Conflicts
Multi-master replication trades the primary’s bottleneck for conflict resolution complexity—you still pay, just in a different currency.
Multi-Master (Peer-to-Peer)
All replicas accept writes. Systems like Dynamo use this model.
Advantages:
- High write availability (any replica can accept writes)
- Geographic distribution (writes go to nearest replica)
- No single point of failure
Disadvantages:
- Conflict resolution strategies needed for concurrent updates
- More complex to reason about consistency
- Requires Vector Clocks or similar mechanisms to track causality
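A minimal sketch of the vector-clock bookkeeping this requires (clocks as dictionaries mapping node name to counter; the node names are illustrative):

```python
def vc_increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter bumped."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def vc_descends(a: dict, b: dict) -> bool:
    """True if clock `a` has seen everything `b` has (component-wise >=)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def vc_concurrent(a: dict, b: dict) -> bool:
    """Neither clock descends from the other: the writes are concurrent and
    must be reconciled by the application or a merge rule."""
    return not vc_descends(a, b) and not vc_descends(b, a)

# Two masters accept writes for the same key without seeing each other:
v1 = vc_increment({}, "master-1")   # {'master-1': 1}
v2 = vc_increment({}, "master-2")   # {'master-2': 1}
assert vc_concurrent(v1, v2)        # conflict: both versions must be kept
```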
```mermaid
graph TD
    subgraph "Primary-Backup"
        P[Primary] -->|Replicates| B1[Backup 1]
        P --> B2[Backup 2]
        W1[Write] --> P
        R1[Read] --> B1
    end
    subgraph "Multi-Master"
        M1[Master 1] <-->|Sync| M2[Master 2]
        M2 <-->|Sync| M3[Master 3]
        M3 <-->|Sync| M1
        W2[Write A] --> M1
        W3[Write B] --> M2
    end
```
Synchronous vs Asynchronous Replication
Synchronous replication: Write completes only after all N replicas acknowledge
Advantages: Strong consistency, guaranteed durability
Disadvantages: High latency, reduced availability (all replicas must be reachable)
Asynchronous replication: Write completes after subset of replicas acknowledge
Advantages: Low latency, high availability
Disadvantages: Potential data loss, eventual consistency
Hybrid approach (quorum systems): Write completes after W of the N replicas acknowledge; reads consult R replicas
- Balance between synchronous and asynchronous
- Tunable via the W and R parameters; choosing W + R > N ensures every read quorum overlaps every write quorum
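A worked illustration of the quorum arithmetic (the specific (N, W, R) values below are just example configurations):

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """W + R > N guarantees every read quorum intersects every write quorum,
    so a read contacts at least one replica holding the latest acknowledged write."""
    return w + r > n

print(quorums_overlap(3, 2, 2))   # True:  balanced reads and writes
print(quorums_overlap(3, 1, 1))   # False: fast, but a read may miss the latest write
print(quorums_overlap(3, 3, 1))   # True:  synchronous-style writes, cheap reads
```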
Coordinator Selection
In Dynamo, any node can coordinate requests. Two approaches:
Smarter Clients, Simpler Systems
Pushing coordination logic into clients eliminates an entire network hop—complexity migrates from infrastructure to application code.
Client-driven coordination: Client library routes requests directly to appropriate nodes
- Advantages: Eliminates extra hop, lower latency
- Disadvantages: Clients must maintain membership information
Server-driven coordination: Load balancer forwards requests to random node, which acts as coordinator
- Advantages: Simpler clients
- Disadvantages: Extra network hop, single point of failure in load balancer
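A sketch of the routing difference, assuming a client that caches membership (the Node class and the hash-based ownership rule are illustrative stand-ins for a real ring lookup):

```python
import hashlib

class Node:
    """Stand-in for a storage node reachable over the network."""
    def __init__(self, name):
        self.name, self.store = name, {}

    def handle_put(self, key, value):
        self.store[key] = value
        return "ack"

    def handle_get(self, key):
        return self.store.get(key)

class PartitionAwareClient:
    """Client-driven coordination: route each request straight to the owning
    node using cached membership, skipping the load-balancer hop."""
    def __init__(self, nodes):
        self.nodes = nodes                # refreshed periodically in a real system

    def _owner(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]   # simplistic ownership rule

    def put(self, key, value):
        return self._owner(key).handle_put(key, value)   # one hop, no proxy

    def get(self, key):
        return self._owner(key).handle_get(key)

client = PartitionAwareClient([Node("A"), Node("B"), Node("C")])
client.put("user-123", {"name": "Ada"})
print(client.get("user-123"))
```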
Dynamo’s evolution showed client-driven coordination reduced 99.9th percentile latencies by over 30ms.
Replica Synchronization
Replicas can diverge due to failures, network partitions, or concurrent writes. Multiple techniques ensure convergence:
Read repair: During reads, if replicas return different versions, coordinator updates stale replicas
- Passive synchronization through normal traffic
- Only repairs accessed data
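A sketch of read repair over in-memory replicas, with versions modeled as simple integers (a real system would compare vector clocks or timestamps):

```python
def read_with_repair(key, replicas, newer_than):
    """Query every replica, pick the freshest (version, value) pair, then write
    it back to any replica holding a stale or missing copy. `newer_than`
    orders versions; here versions are plain integers."""
    responses = {i: r.get(key) for i, r in enumerate(replicas)}
    freshest = None
    for v in responses.values():
        if v is not None and (freshest is None or newer_than(v, freshest)):
            freshest = v
    if freshest is not None:
        for i, v in responses.items():
            if v != freshest:
                replicas[i][key] = freshest   # repair the stale replica
    return freshest

# Replica 2 missed the latest write; a read of the key repairs it.
replicas = [{"k": (2, "new")}, {"k": (2, "new")}, {"k": (1, "old")}]
print(read_with_repair("k", replicas, lambda a, b: a[0] > b[0]))  # (2, 'new')
print(replicas[2])                                                # {'k': (2, 'new')}
```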
Anti-entropy protocols: Background processes actively compare and synchronize replicas
- Proactive, catches data that’s never read
- Uses Merkle trees for efficiency
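A flattened flavor of that idea: hash fixed key ranges ("buckets") on each replica and exchange only the hashes, then transfer data for ranges whose hashes differ. A real Merkle tree stacks further hash levels on top so agreement can be detected from a single root hash. A minimal sketch:

```python
import hashlib

def bucket_hashes(store: dict, num_buckets: int = 4) -> list:
    """Hash each key range ("bucket") of a replica's key-value store."""
    buckets = [[] for _ in range(num_buckets)]
    for key in sorted(store):
        idx = int(hashlib.md5(key.encode()).hexdigest(), 16) % num_buckets
        buckets[idx].append(f"{key}={store[key]}")
    return [hashlib.sha256("|".join(b).encode()).hexdigest() for b in buckets]

def divergent_buckets(store_a: dict, store_b: dict, num_buckets: int = 4) -> list:
    """Compare per-bucket hashes; only keys in differing ranges need syncing."""
    ha = bucket_hashes(store_a, num_buckets)
    hb = bucket_hashes(store_b, num_buckets)
    return [i for i, (x, y) in enumerate(zip(ha, hb)) if x != y]

a = {"user-1": "v2", "user-2": "v1", "user-3": "v1"}
b = {"user-1": "v1", "user-2": "v1", "user-3": "v1"}   # replica b missed one update
print(divergent_buckets(a, b))   # only the bucket containing user-1 differs
```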
Hinted handoff: Temporary replicas ensure writes succeed even when primary replicas are down
- Maintains write availability
- Ensures data eventually reaches correct nodes
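A toy sketch of the mechanism (illustrative Node objects; real systems persist hints durably and retry delivery in the background):

```python
class Node:
    def __init__(self, name, up=True):
        self.name, self.up = name, up
        self.store = {}
        self.hinted = []                  # (intended_owner, key, value) triples

def write_with_hinted_handoff(key, value, preference_list, fallback_nodes):
    """If a preference-list node is down, hand its copy (plus a hint naming the
    intended owner) to the next healthy fallback node, keeping N copies written."""
    fallbacks = (node for node in fallback_nodes if node.up)
    for intended in preference_list:
        if intended.up:
            intended.store[key] = value
        else:
            stand_in = next(fallbacks)    # raises if no healthy fallback remains
            stand_in.hinted.append((intended, key, value))

def deliver_hints(node):
    """Run periodically: forward hinted writes to owners that have recovered."""
    remaining = []
    for intended, key, value in node.hinted:
        if intended.up:
            intended.store[key] = value
        else:
            remaining.append((intended, key, value))
    node.hinted = remaining

# Node B is down during the write; Node D temporarily holds its copy.
a, b, c, d = Node("A"), Node("B", up=False), Node("C"), Node("D")
write_with_hinted_handoff("user-123", "v1", [a, b, c], [d])
b.up = True
deliver_hints(d)
print(b.store)   # {'user-123': 'v1'}
```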
Replication in Practice: Dynamo
Dynamo demonstrates sophisticated replication:
Configuration: Typically N=3, with replicas on the preference list
Model: Multi-master; any replica accepts writes
Synchronization: Asynchronous with quorum systems (W=2)
Cross-datacenter: Preference lists can span multiple availability zones
Reconciliation: Vector Clocks track versions; the application performs conflict resolution
Production experience:
- 99.94% of operations see a single version (conflicts are rare)
- System survived datacenter failures without data loss
- Tunable parameters enable application-specific trade-offs
Key Insight
There is no single “best” replication strategy—only trade-offs aligned with specific requirements:
- Need strong consistency? → Primary-backup with synchronous replication
- Need high availability? → Multi-master with asynchronous replication and eventual consistency
- Need tunable guarantees? → Quorum systems with configurable N/R/W
The art of system design lies in understanding these trade-offs and choosing the replication strategy that matches your availability, consistency, and performance requirements. Dynamo’s success demonstrates that carefully orchestrated eventual consistency can provide both high availability and acceptable consistency for many real-world applications.