Garbage collection (GC) automates memory management by reclaiming memory occupied by objects that are no longer in use. Different algorithms make different tradeoffs among throughput, pause times, memory overhead, and implementation complexity; these are fundamental tensions in system design.
Mark and Sweep
How It Works
- Mark Phase: Traverse all reachable objects from root references (stack, globals), marking them as alive
- Sweep Phase: Scan entire heap, reclaiming memory from unmarked (dead) objects
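The two phases can be sketched as a toy graph traversal. This is an illustrative simulation in Python, not a real allocator: `Obj`, `mark`, and `sweep` are hypothetical names, and the "heap" is just a list of nodes.

```python
# Minimal mark-sweep sketch (illustrative simulation, not a real allocator).

class Obj:
    def __init__(self):
        self.refs = []       # outgoing references to other objects
        self.marked = False

def mark(roots):
    """Mark phase: flag everything reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)

def sweep(heap):
    """Sweep phase: keep marked objects, drop the rest, reset marks."""
    live = [o for o in heap if o.marked]
    for o in live:
        o.marked = False
    return live

# Build a tiny heap: root -> a -> b, plus an unreachable cycle c <-> d.
root, a, b, c, d = (Obj() for _ in range(5))
root.refs = [a]; a.refs = [b]
c.refs = [d]; d.refs = [c]       # cycle, but unreachable from any root
heap = [root, a, b, c, d]

mark([root])
heap = sweep(heap)
print(len(heap))  # 3 -- the cycle is collected despite referencing itself
```

Note that the unreachable cycle is reclaimed: reachability from the roots, not reference counts, decides liveness.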
Characteristics
- Advantages:
  - Conceptually simple
  - Handles cyclic references
  - No object movement required
- Disadvantages:
  - Stop-the-world pauses during collection
  - Heap fragmentation over time
  - Sweep phase must scan the entire heap
  - Poor cache locality, since surviving objects remain scattered across the heap
Variations
- Tri-color marking: Incremental marking using white (unvisited), gray (visited but children not scanned), black (fully processed). See Ruby’s incremental GC implementation for detailed exploration.
- Lazy sweeping: Spread sweep work over time
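Tri-color marking can be sketched as a worklist that advances one object at a time, which is what makes it interleavable with the running program. This is a simplified sketch (write barriers, which keep the invariant sound while the mutator runs, are omitted); the names are hypothetical.

```python
# Tri-color marking sketch: white = unvisited, gray = visited but children
# not yet scanned, black = fully processed. Each step handles one object,
# so marking can be interleaved with the mutator (write barriers omitted).

WHITE, GRAY, BLACK = 0, 1, 2

class Obj:
    def __init__(self):
        self.refs = []
        self.color = WHITE

def mark_step(gray_set):
    """Process one gray object; returns False when marking is finished."""
    if not gray_set:
        return False
    obj = gray_set.pop()
    for child in obj.refs:
        if child.color == WHITE:
            child.color = GRAY
            gray_set.append(child)
    obj.color = BLACK        # all children are now at least gray
    return True

def incremental_mark(roots):
    gray = [r for r in roots]
    for r in gray:
        r.color = GRAY
    while mark_step(gray):   # a real GC would yield to the mutator here
        pass

root, a, b = Obj(), Obj(), Obj()
root.refs = [a, b]
incremental_mark([root])
print(root.color == BLACK and a.color == BLACK and b.color == BLACK)  # True
```

The invariant being preserved is that no black object ever points directly at a white one, which is why the work can stop and resume safely.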
Immix
How It Works
- Hybrid approach combining mark-region collection with opportunistic copying
- Divides heap into blocks (~32KB), further divided into lines (~128 bytes)
- Marks at line granularity during collection
- Recycles blocks with no live data; evacuates sparse blocks above threshold
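The block/line bookkeeping above can be sketched as follows. This is a toy model (4 lines per block instead of ~256, and objects simply record which lines they occupy); the function names and the 50% evacuation threshold are hypothetical choices, not Immix's actual tuning.

```python
# Immix-style line marking sketch (toy sizes: 4 lines per block).
# Marking happens at line granularity; blocks with no marked lines are
# reclaimed whole, and sparse blocks become evacuation candidates.

LINES_PER_BLOCK = 4

def mark_lines(live_objects):
    """Return {block_index: set of marked line indices} for live objects."""
    marked = {}
    for obj in live_objects:
        for block, line in obj["lines"]:
            marked.setdefault(block, set()).add(line)
    return marked

def classify_blocks(num_blocks, marked, evac_threshold=0.5):
    """Free blocks are recycled whole; blocks below the occupancy
    threshold are candidates for opportunistic evacuation."""
    free, recyclable, evacuate = [], [], []
    for b in range(num_blocks):
        lines = marked.get(b, set())
        if not lines:
            free.append(b)
        elif len(lines) / LINES_PER_BLOCK < evac_threshold:
            evacuate.append(b)
        else:
            recyclable.append(b)
    return free, recyclable, evacuate

# Block 0 is densely used, block 1 holds one small object, block 2 is empty.
live = [
    {"lines": [(0, 0), (0, 1), (0, 2)]},   # object spanning three lines
    {"lines": [(1, 3)]},                   # small object in block 1
]
free, recyclable, evacuate = classify_blocks(3, mark_lines(live))
print(free, recyclable, evacuate)  # [2] [0] [1]
```

The key point is visible in the output: only the sparse block is selected for copying, so defragmentation cost is paid where it buys the most.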
Characteristics
- Advantages:
  - Excellent defragmentation without full compaction cost
  - Good memory utilization
  - Fast allocation via bump-pointer in blocks
  - Reduces fragmentation while keeping some mark-sweep benefits
  - Lower pause times than full compaction
- Disadvantages:
  - More complex than pure mark-sweep
  - Some fragmentation remains (at line granularity)
  - Requires object size tracking
Key Innovation
Opportunistic evacuation: only moves objects when beneficial, avoiding unnecessary copying overhead
NoGC (Explicit Management)
How It Works
- Manual memory management: programmer explicitly allocates and deallocates
- Examples: `malloc`/`free` in C, RAII in C++, arena/region-based allocation
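Arena allocation is simple enough to sketch directly: bump-pointer allocation out of a single buffer, with one bulk free for the whole region. This is a minimal Python model for illustration (a hypothetical `Arena` class handing out offsets rather than real pointers).

```python
# Arena (region) allocation sketch: bump-pointer allocation from one
# buffer, with a single bulk free for the whole region -- no per-object
# bookkeeping and no GC metadata.

class Arena:
    def __init__(self, size):
        self.buf = bytearray(size)
        self.top = 0            # bump pointer

    def alloc(self, n):
        """Return an offset into the buffer, or raise if the arena is full."""
        if self.top + n > len(self.buf):
            raise MemoryError("arena exhausted")
        off = self.top
        self.top += n
        return off

    def reset(self):
        """Bulk-deallocate everything in the arena at once."""
        self.top = 0

arena = Arena(64)
a = arena.alloc(16)
b = arena.alloc(16)
print(a, b, arena.top)  # 0 16 32
arena.reset()           # frees both allocations in O(1)
print(arena.top)        # 0
```

Allocation is a pointer increment and deallocation is a single reset, which is why arenas suit phase-structured workloads (e.g. per-request or per-frame memory).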
Characteristics
- Advantages:
  - Deterministic performance (no GC pauses)
  - Lower memory overhead (no GC metadata)
  - Precise control over memory lifetime
  - Predictable for real-time systems
- Disadvantages:
  - Memory leaks if deallocation is forgotten
  - Use-after-free bugs
  - Double-free errors
  - Significant programmer burden
  - Fragmentation without compaction
Modern Approaches
- Ownership systems (Rust): Compile-time guarantees prevent leaks and use-after-free
- Arena allocation: Bulk deallocate groups of objects
- Reference counting: Automatic reclamation without a tracing collector, but plain counts cannot reclaim cycles
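The cycle weakness of plain reference counting can be demonstrated in a few lines. This is a toy model with hypothetical `retain`/`release` helpers, not any real runtime's API.

```python
# Reference-counting sketch showing the cycle problem: counts never reach
# zero for two objects that point at each other, even after the last
# external reference is dropped.

class RC:
    def __init__(self):
        self.count = 0
        self.ref = None

def retain(obj):
    obj.count += 1
    return obj

def release(obj):
    obj.count -= 1
    if obj.count == 0 and obj.ref is not None:
        release(obj.ref)   # freeing this object drops what it points to
        obj.ref = None

a, b = RC(), RC()
a.ref = retain(b)
b.ref = retain(a)        # cycle: a <-> b
root = retain(a)         # one external reference

release(root)            # drop the external reference
print(a.count, b.count)  # 1 1 -- the cycle keeps both "alive" forever
```

Compare with the mark-sweep sketch earlier, where an unreachable cycle is collected; this is why refcounted runtimes typically pair counts with a backup cycle detector.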
Comparison Summary
| Algorithm | Pause Time | Throughput | Fragmentation | Complexity |
|---|---|---|---|---|
| Mark-Sweep | High | Medium | High | Low |
| Immix | Medium | High | Low | Medium |
| NoGC | None | Highest | Varies | N/A (manual) |
Key Tradeoffs
- Throughput vs Latency: GC algorithms can optimize for total program speed (throughput) or minimize pause times (latency). This mirrors Amdahl’s Law—the serial portions (GC pauses) fundamentally limit parallel speedup.
- Space vs Time: Compacting/copying GCs use more memory but improve locality and allocation speed
- Simplicity vs Performance: More sophisticated algorithms (like Immix) achieve better performance but increase implementation complexity. Similar to choosing between simplicity and optimization in software design.