Garbage collection (GC) automates memory management by reclaiming memory occupied by objects no longer in use. Different algorithms trade off between throughput, pause times, memory overhead, and complexity—fundamental tradeoffs in system design.

Mark and Sweep

How It Works

  1. Mark Phase: Traverse all reachable objects from root references (stack, globals), marking them as alive
  2. Sweep Phase: Scan entire heap, reclaiming memory from unmarked (dead) objects

Characteristics

  • Advantages:
    • Simple conceptually
    • Handles cyclic references
    • No object movement required
  • Disadvantages:
    • Stop-the-world pauses during collection
    • Heap fragmentation over time
    • Sweep phase must scan entire heap
    • Poor cache locality—systems benefit from understanding how hardware actually works

Variations

  • Tri-color marking: Incremental marking using white (unvisited), gray (visited but children not scanned), black (fully processed). See Ruby’s incremental GC implementation for detailed exploration.
  • Lazy sweeping: Spread sweep work over time

Immix

How It Works

  • Hybrid approach combining mark-region collection with opportunistic copying
  • Divides heap into blocks (~32KB), further divided into lines (~128 bytes)
  • Marks at line granularity during collection
  • Recycles blocks with no live data; evacuates sparse blocks above threshold

Characteristics

  • Advantages:
    • Excellent defragmentation without full compaction cost
    • Good memory utilization
    • Fast allocation via bump-pointer in blocks
    • Reduces fragmentation while keeping some mark-sweep benefits
    • Lower pause times than full compaction
  • Disadvantages:
    • More complex than pure mark-sweep
    • Some fragmentation remains (at line granularity)
    • Requires object size tracking

Key Innovation

Opportunistic evacuation: only moves objects when beneficial, avoiding unnecessary copying overhead

NoGC (Explicit Management)

How It Works

  • Manual memory management: programmer explicitly allocates and deallocates
  • Examples: malloc/free in C, RAII in C++, arena/region-based allocation

Characteristics

  • Advantages:
    • Deterministic performance (no GC pauses)
    • Lower memory overhead (no GC metadata)
    • Precise control over memory lifetime
    • Predictable for real-time systems
  • Disadvantages:
    • Memory leaks if deallocation forgotten
    • Use-after-free bugs
    • Double-free errors
    • Significant programmer burden
    • Fragmentation without compaction

Modern Approaches

  • Ownership systems (Rust): Compile-time guarantees prevent leaks and use-after-free
  • Arena allocation: Bulk deallocate groups of objects
  • Reference counting: Automatic but not true GC (struggles with cycles)

Comparison Summary

AlgorithmPause TimeThroughputFragmentationComplexity
Mark-SweepHighMediumHighLow
ImmixMediumHighLowMedium
NoGCNoneHighestVariesN/A (manual)

Key Tradeoffs

  • Throughput vs Latency: GC algorithms can optimize for total program speed (throughput) or minimize pause times (latency). This mirrors Amdahl’s Law—the serial portions (GC pauses) fundamentally limit parallel speedup.
  • Space vs Time: Compacting/copying GCs use more memory but improve locality and allocation speed
  • Simplicity vs Performance: More sophisticated algorithms (like Immix) achieve better performance but increase implementation complexity. Similar to choosing between simplicity and optimization in software design.