Distributed systems are computing systems where components run on multiple networked machines, coordinating their actions by passing messages. Unlike centralized systems, they have no shared memory or global clock, creating fundamental challenges in coordination, consistency, and failure handling.
Core Challenges
No Global Clock: Without synchronized time, determining event ordering requires logical-clock mechanisms such as Lamport Clocks or Vector Clocks. You cannot reliably know whether event A happened before event B when the two events occur on different nodes.
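To make this concrete, here is a minimal vector-clock sketch. The node names and the sequence of events are illustrative; `happened_before` implements the standard element-wise comparison, under which two clocks that are incomparable in both directions denote concurrent events.

```python
# Minimal vector clock sketch; node names "A"/"B" are illustrative.
class VectorClock:
    def __init__(self, node):
        self.node = node
        self.clock = {}  # node name -> event counter

    def tick(self):
        # Increment this node's own counter on a local or send event.
        self.clock[self.node] = self.clock.get(self.node, 0) + 1
        return dict(self.clock)

    def merge(self, other):
        # On receive: take the element-wise max, then count the receive.
        for n, c in other.items():
            self.clock[n] = max(self.clock.get(n, 0), c)
        return self.tick()

def happened_before(a, b):
    # a -> b iff a <= b element-wise and a != b.
    return all(a.get(n, 0) <= b.get(n, 0) for n in a) and a != b

A, B = VectorClock("A"), VectorClock("B")
ta = A.tick()        # A's local event: {"A": 1}
tb = B.merge(ta)     # B receives A's message: {"A": 1, "B": 1}
print(happened_before(ta, tb))  # True: A's event precedes B's receive
print(happened_before(tb, ta))  # False
```

If neither `happened_before(x, y)` nor `happened_before(y, x)` holds, the events are concurrent, which is exactly the case a single global timestamp cannot express.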
Network Partitions: Network failures can split the system into isolated groups that cannot communicate. The CAP theorem shows that, while a partition lasts, a system must choose between consistency and availability.
Partial Failures: Some nodes may fail while others continue operating. Systems must detect failures and continue functioning, often using techniques like Gossip Protocols and Quorum Systems.
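A common building block for failure detection is a heartbeat table that peers exchange via gossip: each node tracks the freshest heartbeat timestamp it has seen for every peer, and suspects peers whose timestamps go stale. The sketch below illustrates that idea; the node names and the 5-second timeout are assumptions, not values from any particular system.

```python
# Heartbeat-table sketch of gossip-style failure detection.
# Peer names and the timeout value are illustrative assumptions.
import time

class FailureDetector:
    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.last_seen = {}  # peer -> last heartbeat timestamp

    def heartbeat(self, peer, now=None):
        # Record a heartbeat received directly from a peer.
        self.last_seen[peer] = time.time() if now is None else now

    def merge(self, gossiped):
        # Gossip step: adopt the freshest timestamp known for each peer.
        for peer, ts in gossiped.items():
            if ts > self.last_seen.get(peer, 0):
                self.last_seen[peer] = ts

    def suspected(self, now=None):
        # Peers silent for longer than the timeout are suspected dead.
        now = time.time() if now is None else now
        return [p for p, ts in self.last_seen.items()
                if now - ts > self.timeout]

fd = FailureDetector(timeout=5.0)
fd.heartbeat("node-a", now=100.0)
fd.heartbeat("node-b", now=103.0)
print(fd.suspected(now=107.0))  # ['node-a'] — silent for 7s > 5s
```

Because `merge` spreads timestamps transitively, every node eventually learns about every peer without any node having to contact all others directly.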
Coordination Costs: Achieving agreement (consensus) across nodes is expensive. Techniques like Consistent Hashing and Sloppy Quorums reduce coordination requirements.
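Consistent hashing reduces coordination by letting any node compute key placement locally, with no global directory to agree on. The following is a minimal sketch of a hash ring with virtual nodes; the node names, the virtual-node count, and the use of MD5 are illustrative choices.

```python
# Consistent-hash ring sketch; node names, vnode count, and the
# MD5 hash are illustrative assumptions.
import bisect
import hashlib

def point(key):
    # Map a string to a position on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=100):
        # Each physical node owns many virtual points for balance.
        self.ring = sorted((point(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def lookup(self, key):
        # Owner = first ring point clockwise from the key's hash.
        i = bisect.bisect(self.points, point(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.lookup("user:42")  # deterministic on every node
```

The payoff is that adding or removing a node remaps only roughly 1/N of the keys, instead of reshuffling everything as naive modulo hashing would.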
Common Patterns
Distributed systems employ various strategies to manage complexity:
- Replication: Store multiple copies for availability and durability
- Eventual Consistency: Allow temporary divergence, reconcile later
- Quorum Systems: Require majority agreement for operations
- Anti-Entropy Protocols: Background repair processes using Merkle Trees
- Decentralization: Eliminate single points of failure through symmetry
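The anti-entropy pattern above can be sketched with a small Merkle-tree comparison: two replicas hash their sorted key ranges into a tree, compare roots, and only when roots differ descend to find the divergent keys. The key names and values below are made up for illustration, and a real system would recurse level by level rather than compare all leaves at once.

```python
# Merkle-tree anti-entropy sketch; keys, values, and replica contents
# are illustrative assumptions.
import hashlib

def sha(s):
    return hashlib.sha256(s.encode()).hexdigest()

def build(items):
    # items: sorted (key, value) pairs. Returns tree levels,
    # levels[0] = leaf hashes, levels[-1] = [root hash].
    level = [sha(f"{k}={v}") for k, v in items]
    levels = [level]
    while len(level) > 1:
        # Pair up hashes; an odd node is promoted by hashing alone.
        level = [sha(level[i] + level[i + 1]) if i + 1 < len(level)
                 else sha(level[i])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

replica_a = sorted({"k1": "v1", "k2": "v2", "k3": "v3"}.items())
replica_b = sorted({"k1": "v1", "k2": "STALE", "k3": "v3"}.items())

ta, tb = build(replica_a), build(replica_b)
# Equal roots mean the replicas are in sync; here they differ, so we
# drill down to the mismatched leaves to find what needs repair.
diverged = [k for (k, _), ha, hb in
            zip(replica_a, ta[0], tb[0]) if ha != hb]
print(diverged)  # ['k2']
```

The design point is bandwidth: replicas exchange a handful of hashes instead of their full key ranges, and only the keys under mismatched subtrees are transferred.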
When to Use
Distributed systems add significant complexity and should be adopted when:
- Scale exceeds single-machine capacity
- Geographic distribution is required
- High availability is critical (tolerating machine failures)
- Horizontal scaling is more cost-effective than vertical scaling
Examples: Dynamo, Cassandra, distributed databases, cloud infrastructure.