Abstraction Tax at Scale
ActiveJob’s framework convenience costs roughly half your throughput (10,700 vs. 21,300 jobs/sec), a concrete reminder that architectural convenience and raw performance often pull in opposite directions.
Sidekiq’s architecture delivers impressive throughput through careful optimization of the hot path: the code executed for every single job. Understanding these performance characteristics helps you make informed decisions about workload design and infrastructure sizing.
Throughput Benchmarks
Sidekiq’s `bin/sidekiqload` benchmark creates 500,000 no-op jobs and processes them as fast as possible, assuming 1ms Redis network latency. This is an I/O-bound benchmark that measures coordination overhead rather than job execution time.
| Configuration | Throughput | Notes |
|---|---|---|
| Sidekiq 7.0 + YJIT + 30 threads | 23,500 jobs/sec | Maximum observed throughput |
| Sidekiq 7.0 + Ruby 3.2 + 30 threads | 21,300 jobs/sec | Without YJIT optimization |
| ActiveJob 7.0 + YJIT + 30 threads | 14,700 jobs/sec | ~37% slower due to overhead |
| ActiveJob 7.0 + Ruby 3.2 + 30 threads | 10,700 jobs/sec | ~50% slower than native API |
Key insight: most of Sidekiq’s overhead is Redis network I/O, not Ruby execution. The 30-thread concurrency was determined experimentally to keep one CPU core busy without oversubscribing it. Real-world applications rarely benefit from more than 10-15 threads due to GIL contention.
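For context, a stripped-down version of this kind of load test is easy to reproduce. The sketch below is not `bin/sidekiqload` itself, just an illustrative harness under our own assumptions (the `NoopJob` class, the `default` queue, and the 30-second sample window): it bulk-enqueues no-op jobs and estimates drain rate by polling queue size.

```ruby
# Illustrative throughput harness; not Sidekiq's actual bin/sidekiqload.
require "sidekiq"
require "sidekiq/api"

class NoopJob
  include Sidekiq::Job
  # Empty body: measures enqueue/fetch coordination, not job work.
  def perform(i); end
end

# Bulk-enqueue 500,000 jobs; push_bulk amortizes Redis round trips.
(1..500_000).each_slice(1_000) do |batch|
  Sidekiq::Client.push_bulk("class" => NoopJob, "args" => batch.map { |i| [i] })
end

# Start workers separately (e.g. `bundle exec sidekiq -c 30 -q default`),
# then sample the queue to estimate jobs/sec.
queue = Sidekiq::Queue.new("default")
before = queue.size
sleep 30
puts "~#{((before - queue.size) / 30.0).round} jobs/sec"
```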
ActiveJob Overhead
ActiveJob adds roughly 50% overhead through argument deserialization and callback execution. For every job, ActiveJob:
- Deserializes arguments from GlobalID format
- Instantiates the job object
- Runs `before_perform` and `after_perform` callbacks
- Wraps execution in exception handlers
- Serializes arguments (for example, ActiveRecord objects into GlobalID references) when the job is enqueued or retried
```ruby
# Native Sidekiq: ~21,300 jobs/sec
class FastJob
  include Sidekiq::Job

  def perform(user_id)
    # Direct execution: arguments arrive as plain JSON types
  end
end

# ActiveJob: ~10,700 jobs/sec
class SlowJob < ApplicationJob
  def perform(user)
    # Extra GlobalID deserialization and callbacks run before this
  end
end
```
For high-throughput systems (>10,000 jobs/sec), Sidekiq’s native API delivers roughly twice the throughput. For typical workloads (<1,000 jobs/sec), ActiveJob’s convenience outweighs its overhead.
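The trade-off is visible at the call site as well. A minimal sketch, assuming a `user` record is in scope: the native API takes plain JSON-serializable arguments, while ActiveJob accepts the record itself and resolves it through GlobalID, which is exactly the convenience being paid for.

```ruby
# Native API: pass primitives; the job looks the record up itself.
FastJob.perform_async(user.id)

# ActiveJob: pass the record; GlobalID serializes a reference and
# re-fetches the object before perform runs.
SlowJob.perform_later(user)
```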
Memory Footprint
Sidekiq’s memory usage follows a predictable pattern:
- Base process: 50-80MB (Ruby interpreter + Sidekiq code)
- Per job class: 1-2MB (loaded code + dependencies)
- Per thread: <1MB (thread stack + local variables)
- Redis connections: Negligible (connection metadata only)
A typical process with 10 threads and 50 job classes runs at roughly 125MB RSS. Compare that to process-based workers:
- Resque: 75MB × 10 processes = 750MB for same concurrency
- Delayed::Job: 100MB × 10 processes = 1GB for same concurrency
The 6-8x memory advantage compounds at scale. Running 100 concurrent jobs:
- Sidekiq: 10 processes × 125MB = 1.25GB
- Resque: 100 processes × 75MB = 7.5GB
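As a back-of-envelope check, the per-component figures above compose into the ~125MB estimate. The constants in this sketch are simply midpoints of the ranges quoted earlier, not measured values:

```ruby
# Illustrative arithmetic only; constants are midpoints of the ranges above.
BASE_MB       = 65.0 # Ruby interpreter + Sidekiq code (50-80MB)
PER_CLASS_MB  = 1.0  # loaded code + dependencies (1-2MB)
PER_THREAD_MB = 0.5  # thread stack + locals (<1MB)

def estimated_rss_mb(job_classes:, threads:)
  BASE_MB + job_classes * PER_CLASS_MB + threads * PER_THREAD_MB
end

estimated_rss_mb(job_classes: 50, threads: 10) # => 120.0, near the ~125MB figure
```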
Latency Characteristics
Immediate jobs (enqueued to default queue):
- Enqueue latency: <1ms (single Redis LPUSH)
- Fetch latency: <2ms (BRPOP blocks for 2sec max)
- Total time-to-execution: <10ms in healthy systems
Scheduled jobs (enqueued for future execution):
- Enqueue latency: <1ms (single Redis ZADD to sorted set)
- Polling check: Every 5-15 seconds (scaled by cluster size)
- Time precision: ±5-15 seconds depending on cluster
The Poller’s interval self-adjusts based on cluster size. With 30 Sidekiq processes, each process polls every 450 seconds on average (30 × 15 sec), so the cluster as a whole still checks the schedule roughly every 15 seconds. This prevents a thundering herd against Redis while keeping execution timely.
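Scheduling itself is cheap: `perform_in` and `perform_at` are the standard Sidekiq calls, each a single ZADD into the schedule sorted set. Given the polling behavior above, expect actual start times to drift by up to one poll interval (`ReportJob` and its argument are illustrative):

```ruby
# Each call is one ZADD, scored by the job's run-at timestamp.
ReportJob.perform_in(5 * 60, report_id)          # run ~5 minutes from now
ReportJob.perform_at(Time.now + 3600, report_id) # run ~1 hour from now

# Precision is bounded by the poller: a job due at 12:00:00 starts
# whenever some process's next poll picks it up, seconds later.
```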
Network Latency Impact
Redis network latency is the primary bottleneck. Sidekiq logs warnings if round-trip time exceeds 50ms:
```
WARN: Your Redis network RTT is 127ms. Move your Redis closer!
```
Each job requires a minimum of two Redis operations:
- BRPOP: fetch the job from the queue (~1 RTT)
- Job execution: may require additional Redis operations
- Cleanup: heartbeat updates and metrics (amortized across jobs)
At 1ms RTT, a single thread can process ~500 jobs/sec (2ms per job). At 50ms RTT, throughput drops to ~10 jobs/sec. Co-locating Sidekiq and Redis in the same datacenter/availability zone is critical.
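You can spot-check RTT from the host your workers run on. A minimal sketch using Sidekiq 7’s connection pool and a PING round trip (the sample count is an arbitrary choice):

```ruby
require "sidekiq"

# Median of several PINGs approximates per-operation network cost.
samples = Array.new(5) do
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Sidekiq.redis { |conn| conn.call("PING") } # redis-client API in Sidekiq 7
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
end
puts "Redis RTT ~#{samples.sort[2].round(2)}ms"
```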
YJIT Performance Gains
Ruby 3.1+ with YJIT enabled provides 10-20% throughput improvement:
```
RUBY_YJIT_ENABLE=1 bundle exec sidekiq
```
YJIT (Yet Another Just-In-Time compiler) optimizes hot code paths through type specialization. For Sidekiq’s job processing loop, this translates to fewer CPU cycles per job.
However, YJIT increases memory usage by 15-30MB per process. For memory-constrained environments or low-throughput systems, the trade-off may not be worth it.
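It is worth confirming YJIT is actually active in the worker process rather than trusting the environment variable. A small boot-time check (placed, say, in a hypothetical `config/initializers/yjit_check.rb`):

```ruby
# RubyVM::YJIT is only defined on Ruby builds that include YJIT.
if defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
  Sidekiq.logger.info "YJIT enabled"
else
  Sidekiq.logger.warn "YJIT not enabled; start with RUBY_YJIT_ENABLE=1"
end
```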
Concurrency Tuning
Optimal thread count depends on workload characteristics:
| Workload Type | Recommended Threads | Reasoning |
|---|---|---|
| Pure I/O (HTTP APIs) | 20-30 | GIL released during I/O |
| Mixed I/O and CPU | 10-15 | Balance GIL contention |
| CPU-bound | 5-10 | GIL contention dominates |
| Database-heavy | 10-15 | Limited by DB connection pool |
Setting concurrency too high creates GIL queuing: threads spend their time waiting to acquire the lock rather than doing work. Monitor CPU usage; sustained 100% CPU with high wait times indicates GIL saturation.
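Concurrency is set per process, via the `-c` flag or the `concurrency:` key in `config/sidekiq.yml`. For a mixed I/O and CPU workload from the table above, a starting point might be:

```
bundle exec sidekiq -c 12 -q default
```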
Monitoring Critical Metrics
Queue latency: Time between enqueue and execution start
```ruby
# Check via Web UI or API
queue = Sidekiq::Queue.new("default")
queue.latency # seconds the oldest job has been waiting
```
High latency indicates insufficient processing capacity. Scale horizontally (more processes) or optimize job execution time.
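A periodic sweep across all queues is a common monitoring pattern. A minimal sketch, where the 30-second threshold and the alerting call are placeholders to replace with your own SLA and integration:

```ruby
require "sidekiq/api"

LATENCY_THRESHOLD_SEC = 30 # placeholder; tune to each queue's SLA

Sidekiq::Queue.all.each do |queue|
  latency = queue.latency
  next if latency <= LATENCY_THRESHOLD_SEC

  # Replace with your alerting integration (PagerDuty, Slack, etc.).
  warn "queue=#{queue.name} latency=#{latency.round(1)}s size=#{queue.size}"
end
```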
Process busy threads: Percentage of threads actively executing jobs
```ruby
# Healthy: 60-80% busy during peak
# Low (<30%): work starvation; add queues/jobs
# High (>90%): potential GIL saturation
```
Redis round-trip time: Network latency between Sidekiq and Redis
```ruby
# Logged in heartbeat, accessible via Web UI
# Target: <5ms same AZ, <50ms different AZ
# Warning: >50ms significantly impacts throughput
```
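Both of these metrics can be read programmatically from the process heartbeat. A sketch using `Sidekiq::ProcessSet`, whose per-process data includes `busy` and `concurrency` fields:

```ruby
require "sidekiq/api"

Sidekiq::ProcessSet.new.each do |process|
  busy_pct = 100.0 * process["busy"] / process["concurrency"]
  puts "#{process['hostname']} pid=#{process['pid']} busy=#{busy_pct.round}%"
end
```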
Optimization Strategies
- Batch operations: Use `push_bulk` to enqueue many jobs in one Redis round trip (see the sketch after this list)
- Pipeline Redis commands: Group multiple commands to save network round trips
- Minimize middleware: Each middleware adds function-call overhead to every job
- Use native API for hot paths: Skip ActiveJob overhead for high-throughput jobs
- Keep jobs small: Break large jobs into smaller chunks for better distribution
- Optimize serialization: Avoid large argument payloads; pass references (e.g., record IDs) when possible
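The `push_bulk` sketch referenced above: one client call per batch replaces one round trip per job (the `User` query is illustrative):

```ruby
user_ids = User.where(active: true).pluck(:id) # illustrative source of IDs

# One Redis round trip per 1,000 jobs instead of one per job.
user_ids.each_slice(1_000) do |batch|
  Sidekiq::Client.push_bulk(
    "class" => FastJob,
    "args"  => batch.map { |id| [id] } # each element is one job's argument array
  )
end
```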
For systems pushing Sidekiq’s limits (>10,000 jobs/sec), profile the hot path and optimize accordingly. For typical systems, focus on horizontal scaling—add more processes rather than over-optimizing.
See Sidekiq Architecture for how these performance characteristics emerge from architectural decisions.