An instruction sequence (ISEQ) is the core data structure in YARV that represents compiled Ruby code. Every method, block, and class body is compiled into its own ISEQ, which serves as the blueprint for execution. The ISEQ is what the VM actually executes, not the Ruby source code.

What is an Instruction Sequence?

An instruction sequence is the compiled form of Ruby code:

# Ruby source:
def add(a, b)
  a + b
end
 
# Compiled to instruction sequence:
# == disasm: #<ISeq:add>
# 0000 getlocal a              ( 2)[Li]
# 0002 getlocal b
# 0004 opt_plus <calldata>
# 0006 leave

The bytecode compilation process transforms Ruby’s abstract syntax tree into this sequential instruction format that the YARV VM can execute efficiently.
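
To see this for yourself, you can disassemble a live method. A minimal check using the real RubyVM::InstructionSequence.of API, assuming the add method above is defined in the current process (exact opcode names vary slightly between Ruby versions):

# Disassemble an already-defined method
iseq = RubyVM::InstructionSequence.of(method(:add))
puts iseq.disasm   # prints a listing similar to the one above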

Structure Components

An ISEQ contains everything the VM needs to execute a piece of Ruby code:

ISEQ Components:
┌──────────────────────────────────────┐
│ Bytecode Instructions                │
│ ┌──────────────────────────────────┐ │
│ │ [putself, opt_send_without_block,│ │
│ │  leave]                          │ │
│ └──────────────────────────────────┘ │
├──────────────────────────────────────┤
│ Call Counter (for JIT)               │
│ ┌──────────────────────────────────┐ │
│ │ calls: 25                        │ │
│ └──────────────────────────────────┘ │
├──────────────────────────────────────┤
│ JIT Entry Point                      │
│ ┌──────────────────────────────────┐ │
│ │ NULL or 0x7f8e2c004000           │ │
│ └──────────────────────────────────┘ │
├──────────────────────────────────────┤
│ Metadata                             │
│ - Local variable table               │
│ - Argument information               │
│ - Source location info               │
│ - Catch table (exception handling)   │
│ - Constant pool                      │
└──────────────────────────────────────┘

Bytecode Instructions

The heart of an ISEQ is its array of YARV stack instructions. These instructions represent the compiled Ruby code with operands embedded inline:

# Ruby:
x = 42
 
# Instruction sequence:
# 0000 putobject 42        ← '42' is an operand
# 0002 setlocal x

The operand 42 is known at compile-time and built into the ISEQ. At runtime, the VM reads it from the ISEQ. This differs from stack values, which are computed at runtime.
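
You can see the operand baked into the instruction stream by dumping the compiled form. A small sketch using the real RubyVM::InstructionSequence#to_a API (its exact layout is version-dependent):

iseq = RubyVM::InstructionSequence.compile("x = 42")
pp iseq.to_a.last
# The final element is the instruction list; it contains an entry like
# [:putobject, 42] - the literal 42 sits right next to its opcode.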

Local Variable Table

The ISEQ maintains a local variable table mapping variable names to stack positions:

def calculate(x, y)
  sum = x + y
  product = x * y
  sum + product
end
 
# Local table:
# 0: x        (parameter)
# 1: y        (parameter)
# 2: sum      (local variable)
# 3: product  (local variable)

This table enables the VM to:

  • Access variables by offset (fast)
  • Allocate the right amount of stack space
  • Provide debugging information
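
The local table is also visible in the disassembly header. A quick way to check, using RubyVM::InstructionSequence.of and assuming calculate is defined as above (the exact header format varies by Ruby version):

iseq = RubyVM::InstructionSequence.of(method(:calculate))
puts iseq.disasm
# The header contains a line along the lines of:
#   local table (size: 4, argc: 2 ...)
# listing x, y, sum, and product together with their slot indices.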

Argument Information

The ISEQ stores detailed argument metadata for method dispatch and validation:

def demo(req, opt = 1, *rest, key:, key_opt: 2, **kwrest, &block)
  # ...
end
 
# Argument info:
# - Required positional: 1 (req)
# - Optional positional: 1 (opt)
# - Rest parameter: yes (*rest)
# - Required keyword: 1 (key:)
# - Optional keyword: 1 (key_opt:)
# - Keyword rest: yes (**kwrest)
# - Block parameter: yes (&block)

This information guides argument validation at call sites, stack frame setup, and method dispatch optimization.
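
The same metadata is surfaced at the Ruby level through Method#parameters, a real reflection API. A quick cross-check against the demo method above:

method(:demo).parameters
# => [[:req, :req], [:opt, :opt], [:rest, :rest],
#     [:keyreq, :key], [:key, :key_opt], [:keyrest, :kwrest], [:block, :block]]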

Constant Pool

The constant pool stores compile-time known values:

def example
  puts "Hello", 42, :symbol
end
 
# Constant pool:
# - "Hello" (string literal)
# - 42 (integer literal)
# - :symbol (symbol literal)
# - :puts (method name)
#
# Instructions reference pool by index:
# putstring @0   ← "Hello"
# putobject @1   ← 42
# putobject @2   ← :symbol

This makes bytecode compact - values are stored once and referenced by index. Nested ISEQs are also stored in the parent’s constant pool.

Source Location Information

For debugging and error reporting, ISEQs maintain source location data:

# Associates each instruction with source location
{
  instruction_index: 0,
  source_file: "example.rb",
  line_number: 10,
  column: 5
}

This enables meaningful stack traces, debugger breakpoints, coverage analysis, and TracePoint events.
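
Parts of this location data are exposed directly on the ISEQ object. A small check using real RubyVM::InstructionSequence accessors:

iseq = RubyVM::InstructionSequence.compile("x = 1\ny = 2", "example.rb")
iseq.path          # => "example.rb"
iseq.first_lineno  # => 1
iseq.label         # => "<compiled>"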

Catch Table (Exception Handling)

The catch table maps instruction ranges to exception handlers:

def risky
  dangerous_operation
rescue StandardError => e
  handle_error(e)
ensure
  cleanup
end
 
# Catch table:
# [
#   { type: :rescue, range: 0..5, target: 6 },
#   { type: :ensure, range: 0..8, target: 9 }
# ]

When an exception occurs, the VM checks the current instruction index against the catch table, finds the matching handler, and jumps to the handler’s target instruction.
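
The catch table also shows up in the disassembly output. A minimal way to look at it (the "== catch table" section is real, though its exact format varies by Ruby version):

code = <<~RUBY
  begin
    dangerous_operation
  rescue StandardError => e
    handle_error(e)
  end
RUBY
puts RubyVM::InstructionSequence.compile(code).disasm
# The output includes a "== catch table" section listing rescue entries
# with their start/end instruction offsets and continuation targets.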

JIT Integration

The ISEQ serves as the coordination point between interpretation and compilation.

Call Counter: The JIT Trigger

The call counter tracks how many times this ISEQ has been executed. This simple integer drives the JIT compilation decision:

struct rb_iseq_constant_body {
    // ... other fields
    unsigned int call_counter;
    // ...
};

The YJIT execution mechanics use this counter to implement the two-phase compilation strategy:

  • 25 calls: Begin profiling
  • 30 calls: Compile to native code

Each time the ISEQ executes, the VM increments this counter and checks if compilation should trigger.
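
The compile threshold is configurable when YJIT is enabled. A small sketch using the real --yjit-call-threshold flag and RubyVM::YJIT.enabled?; the default threshold values depend on the Ruby version:

# Start Ruby with YJIT and an explicit compile threshold:
#   ruby --yjit --yjit-call-threshold=30 app.rb

# From inside the process, check whether YJIT is active:
RubyVM::YJIT.enabled?   # => true when started with --yjit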

JIT Entry Point: The Execution Switch

The jit_entry field is a function pointer that’s either NULL (not compiled) or points to native machine code:

struct rb_iseq_constant_body {
    // ... other fields
    void *jit_entry;  // NULL or native code address
    // ...
};

This single field transforms how Ruby executes code:

Execution Decision:
┌─────────────────┐
│ iseq->jit_entry │
└────────┬────────┘
         │
    ┌────┴─────┐
    │          │
  NULL?    Address?
    │          │
    ▼          ▼
┌──────────┐  ┌────────────────┐
│Interpret │  │ Jump to native │
│bytecode  │  │ machine code   │
└──────────┘  └────────────────┘

The YJIT execution mechanics leverage this pointer to seamlessly switch between interpreted and compiled execution.

ISEQ as JIT Bridge

Putting the pieces together, the ISEQ is the hand-off point between the interpreter and YJIT:

ISEQ as JIT Bridge:
┌─────────────────────┐
│ Ruby Source Code    │
└──────────┬──────────┘
           ▼
    ┌──────────────┐
    │   Parse      │
    └──────┬───────┘
           ▼
    ┌──────────────┐
    │   ISEQ       │ ◄─── Central structure
    └──────┬───────┘
           │
     ┌─────┴──────┐
     ▼            ▼
┌─────────────┐  ┌─────────────┐
│ Interpreter │  │ YJIT        │
│ reads       │  │ generates   │
│ bytecode    │  │ native code │
└─────────────┘  └──────┬──────┘
                        │
                        ▼
                 ┌───────────────┐
                 │ jit_entry ptr │
                 │ updated       │
                 └───────────────┘

The ISEQ provides:

  • Bytecode for interpretation
  • Profiling data for optimization decisions
  • Storage for the jit_entry pointer
  • Metadata for code generation

ISEQ Lifecycle

An ISEQ progresses through several states:

  1. Creation: Compiled from Ruby source during parse/load
  2. Interpretation: Executed by VM, call counter increments
  3. Profiling: YJIT observes types and patterns (25+ calls)
  4. Compilation: Native code generated, jit_entry populated (30+ calls)
  5. Execution: Direct jump to native code
  6. De-optimization: Native code invalidated, jit_entry cleared
  7. Garbage Collection: ISEQ freed when no longer referenced

The ISEQ persists across this lifecycle, acting as the stable reference point as execution strategies change.

Compilation Process

Bytecode compilation transforms Ruby AST to instruction sequences:

graph LR
    A[Ruby Source] --> B[Parser]
    B --> C[AST]
    C --> D[Compiler]
    D --> E[Instruction Sequence]
    E --> F[YARV VM]

    style A fill:#e1f5ff
    style C fill:#fff4e1
    style E fill:#e8f5e9
    style F fill:#fce4ec

Certain nodes in the AST generate their own ISEQ:

  • Method definitions → method iseq
  • Block expressions → block iseq
  • Class/module bodies → class iseq

YARV compiles each piece of code once and keeps the resulting ISEQ in memory for reuse:

# Definition time: parse → compile → store the ISEQ
def add(a, b)
  a + b
end
 
# Every subsequent call reuses the stored ISEQ:
add(1, 2)  # Execute the existing bytecode
add(3, 4)  # Execute the existing bytecode

This reuse is also why require returns quickly for already-loaded files: the file is recorded in $LOADED_FEATURES, so it is neither re-parsed nor re-compiled.
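
Compiled ISEQs can also be serialized and loaded back, which is how tools such as Bootsnap avoid re-compiling unchanged files across processes. A minimal sketch using the real to_binary / load_from_binary API (the file paths are hypothetical, and the binary format is specific to the Ruby version that produced it):

iseq = RubyVM::InstructionSequence.compile(File.read("lib/foo.rb"), "lib/foo.rb")
File.binwrite("foo.iseq.bin", iseq.to_binary)   # cache the compiled form

# Later, or in another process: skip parsing and compiling entirely
cached = RubyVM::InstructionSequence.load_from_binary(File.binread("foo.iseq.bin"))
cached.eval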

Hierarchy and Nesting

Ruby programs form a tree of instruction sequences mirroring the code structure:

class MyClass                    # Root ISEQ (class body)
  def method1                    # Child ISEQ (method)
    [1, 2].map do |x|            # Grandchild ISEQ (block)
      x * 2
    end
  end
 
  def method2                    # Child ISEQ (method)
    items.each do |item|         # Grandchild ISEQ (block)
      puts item
    end
  end
end

Each ISEQ:

  • Has a parent ISEQ reference
  • Contains child ISEQs for nested code (stored in constant pool)
  • Maintains its own execution context

This hierarchy enables lexical scope resolution, closure variable capture, and proper frame management.

The tree structure looks like:

<main>
  └── <class:MyClass>
      ├── method1
      │   └── block in method1
      └── method2
          └── block in method2
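
You can walk this tree programmatically with RubyVM::InstructionSequence#each_child, a real API that yields each direct child ISEQ. A small sketch, where the recursive print_tree helper and the source file path are illustrative:

def print_tree(iseq, depth = 0)
  puts "#{'  ' * depth}#{iseq.label}"
  iseq.each_child { |child| print_tree(child, depth + 1) }
end

root = RubyVM::InstructionSequence.compile(File.read("my_class.rb"), "my_class.rb")
print_tree(root)
# <compiled>
#   <class:MyClass>
#     method1
#       block in method1
#     method2
#       block in method2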

The Program Counter and ISEQs

A frame executes one instruction sequence at a time, with the program counter pointing to the current position:

def example
  a = 1      # ← PC at instruction 0
  b = 2      # ← PC at instruction 2
  a + b      # ← PC at instruction 4
end
 
# Frame for 'example':
#   iseq: <ISeq:example>
#   PC: 4  (currently executing the addition 'a + b')

When a call site invokes a method, a new frame is created with a different ISEQ and its own PC.
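
To make the PC's role concrete, here is a Ruby-flavored sketch of the fetch/advance loop. CRuby's real loop is generated C code (vm_exec_core); the frame, instructions, execute, and operand_count names below are illustrative only:

def run(frame)
  stream = frame.iseq.instructions          # flat array: opcode, operands, opcode, ...
  while frame.pc < stream.length
    insn = stream[frame.pc]
    operands = stream[frame.pc + 1, operand_count(insn)]
    execute(insn, operands, frame)          # may push to / pop from the value stack
    frame.pc += 1 + operand_count(insn)     # advance past the opcode and its operands
                                            # (branch instructions set the PC directly)
  end
end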

Performance Characteristics

Understanding ISEQ structure clarifies performance characteristics:

Compact Representation

ISEQs are designed for efficient memory usage:

Memory Layout:
┌──────────────────────────┐
│ ISEQ Header (~200 bytes) │ ← Frequently accessed
├──────────────────────────┤
│ Instruction Array        │ ← Hot path during execution
├──────────────────────────┤
│ Operand Pool             │ ← Referenced by instructions
├──────────────────────────┤
│ Metadata (cold data)     │ ← Rarely accessed during execution
└──────────────────────────┘

This layout reflects mechanical sympathy:

  • Hot data (instructions, operands) packed together for cache locality
  • Cold data (debug info, source locations) separated
  • Minimal indirection for common operations

Compactness comes from variable-width instruction encoding, shared constant pools, and operands embedded directly in the instruction stream.

Execution Efficiency

Sequential access: The program counter increments predictably, making execution cache-friendly.

Instruction count matters: Each instruction has overhead, even if small. Fewer instructions = faster execution.

JIT threshold is per-ISEQ: Each method/block has its own counter, so small frequently-called methods JIT compile quickly.

Metadata is cheap: Source location and debug info don’t impact hot path performance - they’re only accessed on errors/debugging.

ISEQ size affects memory: Large methods create large ISEQs. Consider breaking up massive methods.

Viewing and Inspecting ISEQs

Ruby provides tools to inspect ISEQs:

# Using RubyVM::InstructionSequence
code = "1 + 2"
iseq = RubyVM::InstructionSequence.compile(code)
puts iseq.disasm
 
# Output:
# == disasm: #<ISeq:<compiled>>
# 0000 putobject 1
# 0002 putobject 2
# 0004 opt_plus <calldata>
# 0006 leave
 
# Using the --dump=insns flag from the command line:
#   ruby --dump=insns -e '1 + 2'

This is essential for understanding how YARV executes your code.

Key Insights

  1. Compiled Form: ISEQs are Ruby’s compiled bytecode representation
  2. Hierarchical: Nested scopes create nested ISEQs stored in constant pools
  3. Self-Contained: Each ISEQ includes instructions, constants, metadata, and JIT integration
  4. Execution Unit: Frames execute one ISEQ at a time via the program counter
  5. JIT Coordination: The ISEQ bridges interpretation and compilation via call counter and jit_entry pointer
  6. Optimization Target: JIT and other optimizations work on ISEQs
  7. Inspectable: Ruby provides tools to examine ISEQs
  8. Cached: ISEQs are compiled once and reused
  9. Performance-Oriented: Memory layout optimized for hot/cold data separation

Understanding instruction sequences is key to grasping how YARV translates Ruby source code into executable bytecode, manages program execution, and coordinates the transition from interpretation to JIT-compiled code.