Parallelism Through Waiting

Thread-based concurrency paradoxically achieves high throughput despite the Global Interpreter Lock by exploiting the gap between computation and I/O: threads multiply performance not by running simultaneously, but by yielding during the wait.

Sidekiq’s thread-based concurrency works within Ruby’s GIL by targeting I/O-bound workloads. Only one thread executes Ruby code at a time, but many threads can wait on I/O operations such as HTTP requests or database queries concurrently. This architectural choice enables processing thousands of jobs per second in a single process where process-based alternatives would require gigabytes of memory.

How Threads Exploit the GIL

The GIL (known in CRuby as the GVL, the Global VM Lock) prevents true parallel execution of Ruby code, but it is released during blocking I/O. When a thread makes a database query or HTTP request, it gives up the GIL, allowing other threads to execute Ruby code while it waits. This creates “virtual parallelism” for I/O-bound workloads.

graph LR
    A[GVL] -->|held by| B[Thread 1: executing Ruby]
    A -.->|released by| C[Thread 2: in I/O]
    A -.->|released by| D[Thread 3: in I/O]
    C -->|I/O completes, reacquires| A
    D -->|I/O completes, reacquires| A

A thread executing a network request might spend 50ms waiting. During that wait, four other threads could each execute 12.5ms of Ruby code. The GIL serializes Ruby execution but doesn’t serialize waiting. This is why Sidekiq’s default concurrency of 10 threads can keep 10 I/O-bound jobs in flight at once: at any instant, up to 9 threads wait on I/O while 1 executes Ruby.
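
The effect is easy to demonstrate outside Sidekiq. In this minimal sketch, ten threads each block for 50ms (sleep releases the GVL just as blocking I/O does), so total wall time stays near 50ms rather than 500ms:

require 'benchmark'

elapsed = Benchmark.realtime do
  # Each thread releases the GVL while it blocks, so the waits overlap.
  threads = 10.times.map { Thread.new { sleep 0.05 } }
  threads.each(&:join)
end

puts format('10 x 50ms waits completed in %.0fms', elapsed * 1000)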

Concurrency Tuning

The default concurrency is 10 threads, tuned for typical web application workloads. Higher concurrency helps I/O-heavy jobs but can saturate the CPU. Sidekiq’s own benchmark shows 30 threads processing 23,500 jobs/second for pure Redis operations, but real-world applications rarely benefit from more than 10-15 threads because GIL contention grows with thread count.

The optimal thread count depends on job characteristics:

  • Pure I/O (HTTP API calls, S3 uploads): 20-30 threads
  • Mixed I/O and CPU (image processing with external storage): 10-15 threads
  • CPU-heavy (data transformation, JSON parsing): 5-10 threads
  • Pure CPU (in-memory calculations): Better served by multiple processes

Setting concurrency too high creates GVL queuing—threads spend time waiting to acquire the GVL rather than doing useful work.
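
A common way to apply one of these settings is sidekiq.yml or the -c flag; the value 20 below is illustrative, suited to a mostly-I/O workload:

# config/sidekiq.yml
:concurrency: 20

The same setting at launch:

bundle exec sidekiq -c 20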

Thread-Local State Management

Each thread maintains minimal state—just the current job context. Sidekiq uses thread-local storage (Thread.current[:sidekiq_capsule]) to route Redis connections to the correct pool without passing context objects through every method call.

# Thread-local routing avoids passing context everywhere
class Processor
  def work(job)
    # Implicit routing via Thread.current
    redis_pool = Thread.current[:sidekiq_capsule].redis_pool
    redis_pool.with { |conn| conn.del(job['jid']) }
  end
end

This pattern trades explicitness for convenience—the capsule context is available anywhere without parameter threading. However, it creates implicit coupling that can complicate testing and debugging.
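
The pattern itself is plain Ruby and can be sketched independently of Sidekiq. Each worker thread seeds its own thread-local once, and code deeper in the call stack reads it back without any parameter passing (Capsule here is an illustrative stand-in, not Sidekiq’s class):

class Capsule
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

def deep_in_the_call_stack
  # No arguments needed: the context rides along with the thread.
  "routed to #{Thread.current[:capsule].name}"
end

threads = %w[default critical].map do |name|
  Thread.new do
    Thread.current[:capsule] = Capsule.new(name)  # set once per thread
    puts deep_in_the_call_stack
  end
end
threads.each(&:join)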

Memory Efficiency

Thread-based concurrency provides dramatic memory savings compared to process-based alternatives. Each Ruby process requires 50-100MB of base memory plus loaded application code. With 10 threads, one Sidekiq process handles 10 jobs concurrently in ~125MB; process-based workers would need 10 processes, roughly 1GB, for the same throughput.

The memory advantage compounds with scale. Running 100 concurrent jobs requires:

  • Sidekiq: 10 processes × 125MB = 1.25GB
  • Resque: 100 processes × 75MB = 7.5GB

This 6x difference explains why Sidekiq dominates high-throughput scenarios where memory is constrained.

Future: Ractors and True Parallelism

Ruby 3.0 introduced Ractors, which enable true parallel execution by giving each Ractor its own lock. However, Ractors come with significant constraints: no shared mutable state, message passing only, and limited compatibility with gems that rely on C extensions.
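
As a brief sketch of the Ractor programming model (not anything Sidekiq ships), a Ractor receives only shareable objects and runs its Ruby code in parallel with the main Ractor:

# Requires Ruby 3.0+; prints an "experimental feature" warning.
r = Ractor.new do
  numbers = Ractor.receive     # message passing, no shared state
  numbers.sum                  # CPU-bound work runs in parallel
end

r.send((1..1_000_000).to_a.freeze)  # only shareable (e.g. frozen) objects
puts r.take                         # blocks until the Ractor returns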

Sidekiq could theoretically use Ractors to achieve true parallelism for CPU-bound jobs, but the ecosystem’s maturity and the added complexity aren’t justified when the thread model works so well for I/O-bound workloads. Process-based parallelism remains the simpler option for CPU-heavy work.

See Sidekiq Architecture for an overview of how this concurrency model fits into the larger system.