Garbage collection (GC) automates memory management by reclaiming memory occupied by objects that are no longer in use. Different algorithms make different tradeoffs among throughput, pause times, memory overhead, and implementation complexity; these are fundamental tensions in system design.
Mark and Sweep
How It Works
- Mark Phase: Traverse all reachable objects from root references (stack, globals), marking them as alive
- Sweep Phase: Scan entire heap, reclaiming memory from unmarked (dead) objects
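The two phases can be sketched as a toy graph traversal. This is an illustrative simulation in Python, not a real allocator: `Obj`, `mark`, and `sweep` are hypothetical names, and the "heap" is just a list of nodes.

```python
# Minimal mark-sweep sketch (illustrative simulation, not a real allocator).

class Obj:
    def __init__(self):
        self.refs = []       # outgoing references to other objects
        self.marked = False

def mark(roots):
    """Mark phase: flag everything reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)

def sweep(heap):
    """Sweep phase: keep marked objects, drop the rest, reset marks."""
    live = [o for o in heap if o.marked]
    for o in live:
        o.marked = False
    return live

# Build a tiny heap: root -> a -> b, plus an unreachable cycle c <-> d.
root, a, b, c, d = (Obj() for _ in range(5))
root.refs = [a]; a.refs = [b]
c.refs = [d]; d.refs = [c]       # cycle, but unreachable from any root
heap = [root, a, b, c, d]

mark([root])
heap = sweep(heap)
print(len(heap))  # 3 -- the cycle is collected despite referencing itself
```

Note that the unreachable cycle is reclaimed: reachability from the roots, not reference counts, decides liveness.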
Characteristics
- Advantages:
  - Conceptually simple
  - Handles cyclic references
  - No object movement required
- Disadvantages:
  - Stop-the-world pauses during collection
  - Heap fragmentation over time
  - Sweep phase must scan the entire heap
  - Poor cache locality, since surviving objects remain scattered across the heap
Variations
- Tri-color marking: Incremental marking using white (unvisited), gray (visited but children not scanned), black (fully processed). See Ruby’s incremental GC implementation for detailed exploration.
- Lazy sweeping: Spread sweep work over time
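Tri-color marking can be sketched as a worklist that advances one object at a time, which is what makes it interleavable with the running program. This is a simplified sketch (write barriers, which keep the invariant sound while the mutator runs, are omitted); the names are hypothetical.

```python
# Tri-color marking sketch: white = unvisited, gray = visited but children
# not yet scanned, black = fully processed. Each step handles one object,
# so marking can be interleaved with the mutator (write barriers omitted).

WHITE, GRAY, BLACK = 0, 1, 2

class Obj:
    def __init__(self):
        self.refs = []
        self.color = WHITE

def mark_step(gray_set):
    """Process one gray object; returns False when marking is finished."""
    if not gray_set:
        return False
    obj = gray_set.pop()
    for child in obj.refs:
        if child.color == WHITE:
            child.color = GRAY
            gray_set.append(child)
    obj.color = BLACK        # all children are now at least gray
    return True

def incremental_mark(roots):
    gray = [r for r in roots]
    for r in gray:
        r.color = GRAY
    while mark_step(gray):   # a real GC would yield to the mutator here
        pass

root, a, b = Obj(), Obj(), Obj()
root.refs = [a, b]
incremental_mark([root])
print(root.color == BLACK and a.color == BLACK and b.color == BLACK)  # True
```

The invariant being preserved is that no black object ever points directly at a white one, which is why the work can stop and resume safely.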
Immix
How It Works
- Hybrid approach combining mark-region collection with opportunistic copying
- Divides heap into blocks (~32KB), further divided into lines (~128 bytes)
- Marks at line granularity during collection
- Recycles blocks with no live data; evacuates sparse blocks above threshold
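The block/line bookkeeping above can be sketched as follows. This is a toy model (4 lines per block instead of ~256, and objects simply record which lines they occupy); the function names and the 50% evacuation threshold are hypothetical choices, not Immix's actual tuning.

```python
# Immix-style line marking sketch (toy sizes: 4 lines per block).
# Marking happens at line granularity; blocks with no marked lines are
# reclaimed whole, and sparse blocks become evacuation candidates.

LINES_PER_BLOCK = 4

def mark_lines(live_objects):
    """Return {block_index: set of marked line indices} for live objects."""
    marked = {}
    for obj in live_objects:
        for block, line in obj["lines"]:
            marked.setdefault(block, set()).add(line)
    return marked

def classify_blocks(num_blocks, marked, evac_threshold=0.5):
    """Free blocks are recycled whole; blocks below the occupancy
    threshold are candidates for opportunistic evacuation."""
    free, recyclable, evacuate = [], [], []
    for b in range(num_blocks):
        lines = marked.get(b, set())
        if not lines:
            free.append(b)
        elif len(lines) / LINES_PER_BLOCK < evac_threshold:
            evacuate.append(b)
        else:
            recyclable.append(b)
    return free, recyclable, evacuate

# Block 0 is densely used, block 1 holds one small object, block 2 is empty.
live = [
    {"lines": [(0, 0), (0, 1), (0, 2)]},   # object spanning three lines
    {"lines": [(1, 3)]},                   # small object in block 1
]
free, recyclable, evacuate = classify_blocks(3, mark_lines(live))
print(free, recyclable, evacuate)  # [2] [0] [1]
```

The key point is visible in the output: only the sparse block is selected for copying, so defragmentation cost is paid where it buys the most.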
Characteristics
- Advantages:
  - Excellent defragmentation without full compaction cost
  - Good memory utilization
  - Fast allocation via bump-pointer in blocks
  - Reduces fragmentation while keeping some mark-sweep benefits
  - Lower pause times than full compaction
- Disadvantages:
  - More complex than pure mark-sweep
  - Some fragmentation remains (at line granularity)
  - Requires object size tracking
Key Innovation
Opportunistic evacuation: only moves objects when beneficial, avoiding unnecessary copying overhead
NoGC (Explicit Management)
How It Works
- Manual memory management: programmer explicitly allocates and deallocates
- Examples: `malloc`/`free` in C, RAII in C++, arena/region-based allocation
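Arena allocation is simple enough to sketch directly: bump-pointer allocation out of a single buffer, with one bulk free for the whole region. This is a minimal Python model for illustration (a hypothetical `Arena` class handing out offsets rather than real pointers).

```python
# Arena (region) allocation sketch: bump-pointer allocation from one
# buffer, with a single bulk free for the whole region -- no per-object
# bookkeeping and no GC metadata.

class Arena:
    def __init__(self, size):
        self.buf = bytearray(size)
        self.top = 0            # bump pointer

    def alloc(self, n):
        """Return an offset into the buffer, or raise if the arena is full."""
        if self.top + n > len(self.buf):
            raise MemoryError("arena exhausted")
        off = self.top
        self.top += n
        return off

    def reset(self):
        """Bulk-deallocate everything in the arena at once."""
        self.top = 0

arena = Arena(64)
a = arena.alloc(16)
b = arena.alloc(16)
print(a, b, arena.top)  # 0 16 32
arena.reset()           # frees both allocations in O(1)
print(arena.top)        # 0
```

Allocation is a pointer increment and deallocation is a single reset, which is why arenas suit phase-structured workloads (e.g. per-request or per-frame memory).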
Characteristics
- Advantages:
  - Deterministic performance (no GC pauses)
  - Lower memory overhead (no GC metadata)
  - Precise control over memory lifetime
  - Predictable for real-time systems
- Disadvantages:
  - Memory leaks if deallocation is forgotten
  - Use-after-free bugs
  - Double-free errors
  - Significant programmer burden
  - Fragmentation without compaction
Modern Approaches
- Ownership systems (Rust): Compile-time guarantees prevent leaks and use-after-free
- Arena allocation: Bulk deallocate groups of objects
- Reference counting: Automatic reclamation without a tracing collector, but plain counts cannot reclaim cycles
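The cycle weakness of plain reference counting can be demonstrated in a few lines. This is a toy model with hypothetical `retain`/`release` helpers, not any real runtime's API.

```python
# Reference-counting sketch showing the cycle problem: counts never reach
# zero for two objects that point at each other, even after the last
# external reference is dropped.

class RC:
    def __init__(self):
        self.count = 0
        self.ref = None

def retain(obj):
    obj.count += 1
    return obj

def release(obj):
    obj.count -= 1
    if obj.count == 0 and obj.ref is not None:
        release(obj.ref)   # freeing this object drops what it points to
        obj.ref = None

a, b = RC(), RC()
a.ref = retain(b)
b.ref = retain(a)        # cycle: a <-> b
root = retain(a)         # one external reference

release(root)            # drop the external reference
print(a.count, b.count)  # 1 1 -- the cycle keeps both "alive" forever
```

Compare with the mark-sweep sketch earlier, where an unreachable cycle is collected; this is why refcounted runtimes typically pair counts with a backup cycle detector.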
Comparison Summary
| Algorithm | Pause Time | Throughput | Fragmentation | Complexity |
|---|---|---|---|---|
| Mark-Sweep | High | Medium | High | Low |
| Immix | Medium | High | Low | Medium |
| NoGC | None | Highest | Varies | N/A (manual) |
Key Tradeoffs
- Throughput vs Latency: GC algorithms can optimize for total program speed (throughput) or minimize pause times (latency). This mirrors Amdahl’s Law—the serial portions (GC pauses) fundamentally limit parallel speedup.
- Space vs Time: Compacting/copying GCs use more memory but improve locality and allocation speed
- Simplicity vs Performance: More sophisticated algorithms (like Immix) achieve better performance but increase implementation complexity. Similar to choosing between simplicity and optimization in software design.