Parallelism Through Waiting

Thread-based concurrency paradoxically achieves high throughput despite the Global Interpreter Lock by exploiting the gap between computation and I/O: threads multiply performance not by running simultaneously, but by yielding during the wait.

Sidekiq’s thread-based concurrency works within Ruby’s GIL by targeting I/O-bound workloads. Only one thread executes Ruby code at a time, but many threads can wait on I/O operations such as HTTP requests or database queries concurrently. This architectural choice enables processing thousands of jobs per second in a single process where process-based alternatives would require gigabytes of memory.

How Threads Exploit the GIL

The GIL (known in CRuby as the GVL, the Global VM Lock) prevents true parallel execution of Ruby code, but it is released during blocking I/O. When a thread makes a database query or HTTP request, it gives up the GIL, allowing other threads to execute Ruby code while it waits. This creates “virtual parallelism” for I/O-bound workloads.

graph LR
    A[GVL] -->|held by| B[Thread 1: executing Ruby]
    A -.->|released by| C[Thread 2: in I/O]
    A -.->|released by| D[Thread 3: in I/O]
    C -->|I/O completes, reacquires| A
    D -->|I/O completes, reacquires| A

A thread executing a network request might spend 50ms waiting. During that wait, four other threads could each execute 12.5ms of Ruby code. The GIL serializes Ruby execution but doesn’t serialize waiting. This is why Sidekiq’s default concurrency of 10 threads can keep 10 I/O-bound jobs in flight at once: at any instant, up to 9 threads wait on I/O while 1 executes Ruby.
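
The effect is easy to demonstrate outside Sidekiq. In this minimal sketch, ten threads each block for 50ms (sleep releases the GVL just as blocking I/O does), so total wall time stays near 50ms rather than 500ms:

require 'benchmark'

elapsed = Benchmark.realtime do
  # Each thread releases the GVL while it blocks, so the waits overlap.
  threads = 10.times.map { Thread.new { sleep 0.05 } }
  threads.each(&:join)
end

puts format('10 x 50ms waits completed in %.0fms', elapsed * 1000)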

Concurrency Tuning

The default concurrency is 10 threads, tuned for typical web application workloads. Higher concurrency helps I/O-heavy jobs but can saturate the CPU. Sidekiq’s own benchmark shows 30 threads processing 23,500 jobs/second for pure Redis operations, but real-world applications rarely benefit from more than 10-15 threads because GIL contention grows with thread count.

The optimal thread count depends on job characteristics:

  • Pure I/O (HTTP API calls, S3 uploads): 20-30 threads
  • Mixed I/O and CPU (image processing with external storage): 10-15 threads
  • CPU-heavy (data transformation, JSON parsing): 5-10 threads
  • Pure CPU (in-memory calculations): Better served by multiple processes

Setting concurrency too high creates GVL queuing—threads spend time waiting to acquire the GVL rather than doing useful work.
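
A common way to apply one of these settings is sidekiq.yml or the -c flag; the value 20 below is illustrative, suited to a mostly-I/O workload:

# config/sidekiq.yml
:concurrency: 20

The same setting at launch:

bundle exec sidekiq -c 20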

Thread-Local State Management

Each thread maintains minimal state—just the current job context. Sidekiq uses thread-local storage (Thread.current[:sidekiq_capsule]) to route Redis connections to the correct pool without passing context objects through every method call.

# Thread-local routing avoids passing context everywhere
class Processor
  def work(job)
    # Implicit routing via Thread.current
    redis_pool = Thread.current[:sidekiq_capsule].redis_pool
    redis_pool.with { |conn| conn.del(job['jid']) }
  end
end

This pattern trades explicitness for convenience—the capsule context is available anywhere without parameter threading. However, it creates implicit coupling that can complicate testing and debugging.
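
The pattern itself is plain Ruby and can be sketched independently of Sidekiq. Each worker thread seeds its own thread-local once, and code deeper in the call stack reads it back without any parameter passing (Capsule here is an illustrative stand-in, not Sidekiq’s class):

class Capsule
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

def deep_in_the_call_stack
  # No arguments needed: the context rides along with the thread.
  "routed to #{Thread.current[:capsule].name}"
end

threads = %w[default critical].map do |name|
  Thread.new do
    Thread.current[:capsule] = Capsule.new(name)  # set once per thread
    puts deep_in_the_call_stack
  end
end
threads.each(&:join)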

Memory Efficiency

Thread-based concurrency provides dramatic memory savings compared to process-based alternatives. Each Ruby process requires 50-100MB of base memory plus loaded application code. With 10 threads, one Sidekiq process handles 10 jobs concurrently in ~125MB; process-based workers would need 10 processes, roughly 1GB, for the same throughput.

The memory advantage compounds with scale. Running 100 concurrent jobs requires:

  • Sidekiq: 10 processes × 125MB = 1.25GB
  • Resque: 100 processes × 75MB = 7.5GB

This 6x difference explains why Sidekiq dominates high-throughput scenarios where memory is constrained.

Future: Ractors and True Parallelism

Ruby 3.0 introduced Ractors, which enable true parallel execution by giving each Ractor its own lock. However, Ractors come with significant constraints: no shared mutable state, message passing only, and limited compatibility with gems that rely on C extensions.
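
As a brief sketch of the Ractor programming model (not anything Sidekiq ships), a Ractor receives only shareable objects and runs its Ruby code in parallel with the main Ractor:

# Requires Ruby 3.0+; prints an "experimental feature" warning.
r = Ractor.new do
  numbers = Ractor.receive     # message passing, no shared state
  numbers.sum                  # CPU-bound work runs in parallel
end

r.send((1..1_000_000).to_a.freeze)  # only shareable (e.g. frozen) objects
puts r.take                         # blocks until the Ractor returns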

Sidekiq could theoretically use Ractors to achieve true parallelism for CPU-bound jobs, but the ecosystem’s maturity and the added complexity aren’t justified when the thread model works so well for I/O-bound workloads. Process-based parallelism remains the simpler option for CPU-heavy work.

See Sidekiq Architecture for an overview of how this concurrency model fits into the larger system.