YARV (Yet Another Ruby VM)


YARV (Yet Another Ruby VM) is Ruby’s stack-based virtual machine. It was introduced in Ruby 1.9 (2007) to replace the AST-walking interpreter used by earlier versions of MRI (Matz’s Ruby Interpreter). YARV compiles Ruby source code into bytecode instructions, which the VM then executes.

Architecture Overview

YARV is a stack-based virtual machine: instructions take their inputs from a stack and leave their results on it, rather than reading and writing named registers. This design is generally simpler to implement than a register-based VM and can be simpler to JIT compile.

graph TD
    A[Ruby Source Code] --> B[Parser]
    B --> C[Abstract Syntax Tree]
    C --> D[Instruction Sequence Compiler]
    D --> E[YARV Bytecode]
    E --> F[YARV VM Executor]
    F --> G[Stack Operations]
    G --> H[Result]
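
The pipeline above can be tried from Ruby itself. The following is a minimal sketch, assuming CRuby, where the `RubyVM::InstructionSequence` class is available (other Ruby implementations may not provide it). The snippet `1 + 2` is only an illustration.

```ruby
# CRuby-specific: compile a source string into a YARV instruction sequence.
source = "1 + 2"
iseq = RubyVM::InstructionSequence.compile(source)

# Human-readable listing of the compiled instructions.
disassembly = iseq.disasm

# Execute the compiled instruction sequence on the VM.
result = iseq.eval

puts disassembly
puts result
```

The listing printed by `disasm` varies between Ruby versions, so treat it as something to read rather than something to depend on.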

Core Concepts

The Stack

The YARV stack is a last-in, first-out (LIFO) data structure that persists throughout the program’s lifecycle. Operations in YARV work by:

Pushing values onto the stack
Popping values from the stack
Manipulating values at the top of the stack

Stack visualization:
┌─────────────┐
│   (empty)   │  ← Stack top
├─────────────┤
│   (empty)   │
├─────────────┤
│   (empty)   │
└─────────────┘

After putnil:
┌─────────────┐
│     nil     │  ← Stack top
├─────────────┤
│   (empty)   │
├─────────────┤
│   (empty)   │
└─────────────┘
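
As a rough mental model only, the value stack behaves like a Ruby Array used with `push` and `pop`. This toy sketch does not reflect how YARV actually lays out its stack in memory. The comments name the instruction each step loosely corresponds to.

```ruby
# Toy model: a Ruby Array standing in for the VM's value stack.
stack = []

stack.push(nil)     # roughly what putnil does
stack.push(42)      # roughly what putobject 42 does

top = stack.last    # inspect the value at the top of the stack
popped = stack.pop  # remove the top value, as the pop instruction does

puts "top was #{top.inspect}, stack is now #{stack.inspect}"
```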

Stack Contract

Methods and blocks in YARV have a contract: they must leave the stack in its original state (relative to their execution context). This means:

A method can push values during execution
But must pop all temporary values before returning
Only the return value should remain on the stack
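
The contract can be illustrated with a toy interpreter. This is a sketch under stated assumptions: `run_toy_frame` and its instruction names (`:putobject`, `:pop`, `:add`) are hypothetical and chosen for illustration; they are not YARV's real encoding or dispatch loop.

```ruby
# Hypothetical toy interpreter: runs instructions against a shared stack and
# checks that exactly one value (the return value) remains above the
# depth the stack had when the frame started.
def run_toy_frame(stack, instructions)
  base = stack.size
  instructions.each do |name, operand|
    case name
    when :putobject then stack.push(operand)
    when :pop       then stack.pop
    when :add
      right = stack.pop
      left = stack.pop
      stack.push(left + right)
    else
      raise ArgumentError, "unknown toy instruction: #{name}"
    end
  end
  unless stack.size == base + 1
    raise "stack contract violated: expected depth #{base + 1}, got #{stack.size}"
  end
  stack.pop # hand the return value back to the caller
end

stack = [:caller_value]
result = run_toy_frame(stack, [[:putobject, 1], [:putobject, 2], [:add]])
puts result
puts stack.inspect
```

After the call, the caller's portion of the stack is unchanged, which is the property the contract is meant to guarantee.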

Basic Instructions

YARV provides several categories of stack manipulation instructions. The most fundamental are the “push” instructions that place values onto the stack.

Push Instructions

Instruction	Purpose	Example
putnil	Pushes nil onto stack	`nil`
putobject	Pushes compile-time objects	`true`, `42`, `:symbol`
putstring	Pushes unfrozen strings	`"hello"`
duparray	Duplicates and pushes arrays	`[1, 2, 3]`
duphash	Duplicates and pushes hashes	`{a: 1}`

See YARV stack instructions for detailed exploration of each instruction type.
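
To see which instruction a literal from the table compiles to, one option is to disassemble each literal on CRuby. This is a minimal sketch: the exact instruction names printed depend on the Ruby version (string literals in particular), and `fresh_array` is a hypothetical helper added to show the effect of duplication.

```ruby
# Compile each literal from the table and keep its disassembly listing.
literals = ["nil", "42", '"hello"', "[1, 2, 3]", "{a: 1}"]

listings = literals.to_h do |source|
  [source, RubyVM::InstructionSequence.compile(source).disasm]
end

listings.each do |source, listing|
  puts "#{source}:"
  puts listing
end

# Because the array literal is duplicated on every evaluation,
# each call returns a distinct object.
def fresh_array
  [1, 2, 3]
end

same_object = fresh_array.equal?(fresh_array)
puts "same object each time? #{same_object}"
```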

Viewing YARV Bytecode

You can inspect the YARV instructions generated from Ruby code using the --dump=insns flag:

ruby --dump=insns -e 'nil'

This outputs the instruction sequence that YARV will execute, allowing you to understand how Ruby code translates to VM operations.
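
The same kind of listing is available for code that is already loaded. As a sketch, assuming CRuby, `RubyVM::InstructionSequence.of` returns the instruction sequence behind a method or proc. The `add` method here is only an example.

```ruby
# An example method whose compiled instructions we want to inspect.
def add(a, b)
  a + b
end

# CRuby-specific: look up the instruction sequence behind the method object.
iseq = RubyVM::InstructionSequence.of(method(:add))
listing = iseq.disasm

puts listing
```

This is convenient in a console session, where rerunning the interpreter with a command-line flag is less practical.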

Key Characteristics

Advantages of Stack-Based Design:

Simpler implementation compared to register-based VMs
Easier to JIT compile - see YJIT execution mechanics
Compact bytecode representation
Natural fit for expression evaluation

Trade-offs:

More instructions needed for complex operations
Stack manipulation overhead
Less direct mapping to hardware registers

The stack-based architecture makes YARV particularly amenable to speculative optimization - the JIT compiler can observe stack operations during profiling and generate optimized native code based on type assumptions.

Memory Management

YARV’s instruction sequences integrate with Ruby’s garbage collection system. Compile-time known objects referenced by instructions like putobject are managed by the garbage collector to ensure they persist as long as needed.

Further Exploration

YARV contains many more features beyond basic stack operations:

Instruction Types

YARV stack manipulation instructions - Reordering, duplicating, and removing values
Arithmetic and logical operations
Method calls and returns
Control flow (jumps, branches)
Variable access (local, instance, global)
Object creation and manipulation

Execution Context

YARV frame types - Different frame types for methods, blocks, classes, and exceptions
frame parent-child relationships - How frames share or isolate variable access
frame - Detailed frame structure and lifecycle

Observability

YARV events - Event system for debugging, profiling, and code analysis
TracePoint API - Ruby’s interface to YARV events
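
As a small illustration of the event system, the sketch below subscribes to `:call` and `:return` events with `TracePoint` and records them while a method runs. The `double_it` method and the `events` array are hypothetical names used only for this example.

```ruby
events = []

# Record the event type and method name for Ruby-level calls and returns.
trace = TracePoint.new(:call, :return) do |tp|
  events << [tp.event, tp.method_id]
end

def double_it(n)
  n * 2
end

# Tracing is active only for the duration of the block.
value = trace.enable { double_it(21) }

puts value
puts events.inspect
```

Only methods defined in Ruby produce `:call` and `:return` events; methods implemented in C, such as `Integer#*`, are reported under `:c_call` and `:c_return` instead.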

Understanding YARV’s instruction set, frame system, and event model provides deep insight into Ruby’s execution model and performance characteristics.

Glossary

Call data: Information about a specific call site in Ruby that is retained by instructions that perform method calls. This metadata includes the method name, argument count, and flags that guide method dispatch.

Call site: A location in the source code where a method is called. Each call site has associated call data used for optimizations such as inline caching. See call site for details on how YARV optimizes these locations.

Callee: The method that is being called by another method (the caller). The callee executes in a new frame and receives the receiver as self. See callee.

Caller: The method that is calling another method (the callee). The caller’s frame is saved during the call and restored when the callee returns. See caller.

Compile-time: The time when the Ruby program is being compiled into bytecode from source. Operands and instruction sequences are determined at compile-time.

CRuby: The main Ruby implementation, hosted at ruby/ruby and written in C. CRuby has used YARV as its virtual machine since Ruby 1.9.

Environment pointer: A pointer held by a frame that points to the bottom of the stack region used by that frame. The EP enables the VM to access local variables by offset and maintain stack discipline.

Frame: A data structure that holds the execution state of the virtual machine at a given point in time. Each frame contains the program counter, stack pointer, environment pointer, and local variables. See frame.

Instruction: A single operation that the virtual machine can perform. Often abbreviated as insn. Instructions may have operands that specify what they operate on. See YARV stack instructions.

Instruction sequence: A list of instructions that the virtual machine executes. Often abbreviated as iseq. Each method, block, and class body compiles to its own instruction sequence. Each instruction sequence also holds a call counter and a JIT entry point. See instruction sequence.

Instruction set: The complete set of instructions that the virtual machine can perform. YARV’s instruction set includes stack operations, arithmetic, control flow, and method calls.

Operand: A value that is used by an instruction. Operands are known at compile-time and are built into the instruction sequences, unlike stack values which are computed at runtime. See operand.

Program counter: A pointer to the current instruction in an instruction sequence. Also called an instruction pointer. Each frame has its own program counter that advances as instructions execute. See program counter.

Receiver: The object that a method is being called on. The receiver becomes self inside the method and is stored in the callee’s frame. The receiver’s class determines method dispatch. See receiver.

Stack: A stack data structure that holds values being used by the virtual machine. Also called the value stack or operand stack. YARV is a stack-based virtual machine where operations push and pop values from this stack.

Stack pointer: A pointer held by a frame that points to the next available slot in the stack. Often abbreviated as sp. The stack pointer moves up on push operations and down on pop operations, maintaining stack discipline.

Tracepoint: A publication/subscription system for virtual machine events. Tracepoints use the program counter to trigger callbacks at specific execution points, enabling debuggers and profilers.

Virtual machine: A piece of software that emulates a computer, executing bytecode instead of machine code. See virtual machine architecture.

YARV: The virtual machine used by CRuby. Stands for “Yet Another Ruby Virtual Machine”. YARV is a stack-based virtual machine that executes instruction sequences compiled from Ruby source.
