Parallelism Through Waiting
Thread-based concurrency paradoxically achieves massive throughput despite the Global Interpreter Lock by exploiting the gap between computation and I/O: threads multiply performance not by running simultaneously, but by yielding during the wait.
Sidekiq’s thread-based concurrency works within Ruby’s GIL by targeting I/O-bound workloads. While only one thread executes Ruby code at a time, threads can run concurrently during I/O operations like HTTP requests or database queries. This architectural choice enables processing thousands of jobs per second on a single process where process-based alternatives would require gigabytes of memory.
How Threads Exploit the GIL
The GIL (also called the GVL in CRuby) prevents true parallel execution of Ruby code, but it is released during I/O operations. When a thread makes a database query or HTTP request, it releases the GIL, allowing other threads to execute Ruby code. This creates “virtual parallelism” for I/O-bound workloads.
```mermaid
graph LR
    A[GVL] -->|acquired by| B[Thread 1: executing Ruby]
    A -.->|waiting| C[Thread 2: in I/O]
    A -.->|waiting| D[Thread 3: in I/O]
    C -->|I/O completes| A
    D -->|I/O completes| A
```
A thread executing a network request might spend 50ms waiting. During that wait, four other threads could each execute 12.5ms of Ruby code. The GIL serializes Ruby execution, but it doesn’t serialize waiting. This is why Sidekiq’s default concurrency of 10 threads can keep 10 I/O-bound jobs in flight at once: at any instant, 9 threads wait on I/O while 1 executes Ruby.
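The overlap is easy to observe directly. In this sketch, `sleep` stands in for any blocking I/O call (a socket read, a database query), since it releases the GVL the same way:

```ruby
require "benchmark"

# Five threads each "wait on I/O" for 0.1s. Because sleep releases the
# GVL, the waits overlap even though only one thread can execute Ruby
# code at any instant.
elapsed = Benchmark.realtime do
  threads = 5.times.map do
    Thread.new { sleep 0.1 } # GVL released while waiting
  end
  threads.each(&:join)
end

# Wall-clock time is close to 0.1s (the longest single wait),
# not 0.5s (the sum of all waits).
puts format("elapsed: %.2fs", elapsed)
```

If the threads did CPU-bound work instead of sleeping, the elapsed time would approach the sum, not the maximum, because the GVL serializes the computation.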
Concurrency Tuning
The default concurrency is 10 threads, tuned for typical web application workloads. Higher concurrency helps for I/O-heavy jobs but can cause CPU saturation. Sidekiq’s benchmark shows 30 threads processing 23,500 jobs/second for pure Redis operations, but real-world applications rarely benefit from more than 10-15 threads due to GIL contention.
The optimal thread count depends on job characteristics:
- Pure I/O (HTTP API calls, S3 uploads): 20-30 threads
- Mixed I/O and CPU (image processing with external storage): 10-15 threads
- CPU-heavy (data transformation, JSON parsing): 5-10 threads
- Pure CPU (in-memory calculations): Better served by multiple processes
Setting concurrency too high creates GVL queuing—threads spend time waiting to acquire the GVL rather than doing useful work.
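Concurrency is typically set per process in `config/sidekiq.yml` (it can also be passed on the command line with `-c`). A sketch for an I/O-heavy deployment, with the value chosen from the guidelines above:

```yaml
# config/sidekiq.yml
# Threads per Sidekiq process. Default is 10; raised here because the
# workload is dominated by HTTP calls and S3 uploads (pure I/O).
concurrency: 20
```

Lowering the value back toward 5 would suit a CPU-heavy queue, where extra threads only add GVL queuing.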
Thread-Local State Management
Each thread maintains minimal state: just the current job context. Sidekiq uses thread-local storage (`Thread.current[:sidekiq_capsule]`) to route Redis connections to the correct pool without passing context objects through every method call.
```ruby
# Thread-local routing avoids passing context everywhere
class Processor
  def work(job)
    # Implicit routing via Thread.current
    redis_pool = Thread.current[:sidekiq_capsule].redis_pool
    redis_pool.with { |conn| conn.del(job['jid']) }
  end
end
```
This pattern trades explicitness for convenience—the capsule context is available anywhere without parameter threading. However, it creates implicit coupling that can complicate testing and debugging.
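The property that makes this pattern safe is that `Thread.current` gives every thread its own slot for the same key. An illustrative sketch (not Sidekiq's actual internals) showing that concurrent threads never see each other's capsule:

```ruby
# Each thread stores a different value under the same thread-local key,
# then reads it back after the threads have had a chance to interleave.
results = {}
mutex = Mutex.new

threads = %w[capsule_a capsule_b].map do |name|
  Thread.new do
    Thread.current[:sidekiq_capsule] = name
    sleep 0.01 # let both threads run before reading back
    mutex.synchronize { results[name] = Thread.current[:sidekiq_capsule] }
  end
end
threads.each(&:join)

# Every thread read back exactly what it wrote: no cross-thread leakage.
results.each { |set, got| raise "leaked state" unless set == got }
```

Note that `Thread.current[...]` is actually fiber-local in Ruby; code that spawns fibers inside a job would need `Thread#thread_variable_get`/`set` for truly thread-scoped state.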
Memory Efficiency
Thread-based concurrency provides order-of-magnitude memory savings compared to process-based alternatives. Each Ruby process requires 50-100MB base memory plus loaded code. With 10 threads, one Sidekiq process handles 10 jobs concurrently in ~125MB. Process-based workers would need 10 processes, roughly 750MB-1GB, for the same throughput.
The memory advantage compounds with scale. Running 100 concurrent jobs requires:
- Sidekiq: 10 processes × 125MB = 1.25GB
- Resque: 100 processes × 75MB = 7.5GB
This 6x difference explains why Sidekiq dominates high-throughput scenarios where memory is constrained.
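The arithmetic behind those totals, using the per-process figures above (125MB for a threaded Sidekiq process, 75MB for a Resque worker process):

```ruby
# Back-of-envelope memory math for 100 concurrent jobs.
concurrent_jobs     = 100
threads_per_process = 10  # Sidekiq default concurrency

sidekiq_mb = (concurrent_jobs / threads_per_process) * 125  # 10 processes
resque_mb  = concurrent_jobs * 75                           # 100 processes

puts "Sidekiq: #{sidekiq_mb}MB, Resque: #{resque_mb}MB, " \
     "ratio: #{resque_mb / sidekiq_mb}x"
# Sidekiq: 1250MB, Resque: 7500MB, ratio: 6x
```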
Future: Ractors and True Parallelism
Ruby 3.0+ introduces Ractors, which enable true parallel execution by giving each Ractor its own GIL. However, Ractors have significant constraints: no shared mutable state, message passing only, and limited compatibility with gems that use C extensions.
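A minimal Ractor sketch (Ruby 3.0+; the API is still marked experimental and details may shift between Ruby versions). Each Ractor receives its input as a message and returns its result the same way, since no mutable state can be shared:

```ruby
# Four Ractors doing pure-CPU sums. Unlike threads, these can run on
# separate cores in parallel, because each Ractor has its own lock.
ractors = 4.times.map do |i|
  Ractor.new(i) do |n|
    (1..100_000).sum + n # CPU-bound work, no GVL contention
  end
end

# Collect each Ractor's result via message passing.
results = ractors.map(&:take)
```

Compare with the thread model: threads would serialize this computation behind the GVL, while Ractors genuinely parallelize it, at the cost of the sharing restrictions described above.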
Sidekiq could theoretically use Ractors to achieve true parallelism for CPU-bound jobs, but given the ecosystem’s immaturity and the added complexity, the trade-off isn’t justified while the thread model works so effectively for I/O-bound workloads. Process-based parallelism remains simpler for CPU-heavy work.
See Sidekiq Architecture for an overview of how this concurrency model fits into the larger system.