YARV (Yet Another Ruby VM) is Ruby’s stack-based virtual machine, introduced in Ruby 1.9 (2007) to replace the AST-walking interpreter used by earlier versions of MRI (Matz’s Ruby Interpreter). YARV compiles Ruby source code into bytecode instructions, which the VM then executes.
Architecture Overview
YARV is a stack-based virtual machine: instructions take their inputs from a stack and leave their results on it, rather than reading and writing named registers. Stack-based VMs are generally considered simpler to implement than register-based ones, because instructions do not need to encode where their operands live.
graph TD
  A[Ruby Source Code] --> B[Parser]
  B --> C[Abstract Syntax Tree]
  C --> D[Instruction Sequence Compiler]
  D --> E[YARV Bytecode]
  E --> F[YARV VM Executor]
  F --> G[Stack Operations]
  G --> H[Result]
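The pipeline above can be driven from Ruby itself. This is a minimal sketch that assumes it runs on CRuby, where the `RubyVM::InstructionSequence` class is available; the variable names are illustrative only.

```ruby
# Compile a source string to a YARV instruction sequence, then run it.
# Assumes CRuby: RubyVM::InstructionSequence is implementation-specific.
source = "1 + 2"
iseq = RubyVM::InstructionSequence.compile(source)

puts iseq.disasm              # human-readable listing of the bytecode
pipeline_result = iseq.eval   # execute the bytecode on the VM
puts pipeline_result
```

The exact listing printed by `disasm` varies between Ruby versions, so it is best treated as a debugging aid rather than a stable format.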
Core Concepts
The Stack
YARV’s stack is a last-in, first-out (LIFO) structure that persists throughout the program’s lifecycle. Instructions operate on it by:
- Pushing values onto the stack
- Popping values from the stack
- Manipulating values at the top of the stack
Stack visualization:
┌─────────────┐
│ (empty)     │ ← Stack top
├─────────────┤
│ (empty)     │
├─────────────┤
│ (empty)     │
└─────────────┘
After putnil:
┌─────────────┐
│ nil         │ ← Stack top
├─────────────┤
│ (empty)     │
├─────────────┤
│ (empty)     │
└─────────────┘
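The push and pop behaviour shown in the diagrams can be modelled with a plain Ruby Array. This is a toy model for intuition only; the real value stack is a C-level structure inside CRuby, and the variable names here are hypothetical.

```ruby
# Toy model only: a Ruby Array standing in for YARV's value stack.
stack = []

stack.push(nil)      # the effect of putnil: push nil
stack.push(42)       # the effect of putobject 42: push a known object
top = stack.last     # look at the top of the stack without removing it
popped = stack.pop   # remove and return the top value (last in, first out)

p top      # 42
p popped   # 42
p stack    # [nil]
```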
Stack Contract
Methods and blocks in YARV follow a stack contract: relative to the stack depth at which they started executing, they must leave behind exactly one value, the return value. This means:
- A method can push temporary values during execution
- It must pop all of those temporaries before returning
- Only the return value may remain on the stack
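The contract can be illustrated with a small interpreter that checks the stack depth when a body returns. This is a hypothetical sketch: `call_toy` and its error message are invented for illustration, the instruction names merely mirror YARV's, and CRuby does not implement the check this way.

```ruby
# Hypothetical mini-interpreter used only to illustrate the stack contract.
def call_toy(insns, stack)
  base = stack.size                      # depth when the body starts
  insns.each do |name, operand|
    case name
    when :putobject
      stack.push(operand)
    when :opt_plus
      right = stack.pop
      left = stack.pop
      stack.push(left + right)
    when :pop
      stack.pop
    when :leave
      value = stack.pop
      # All temporaries must be gone by the time the body returns.
      raise "stack contract violated" unless stack.size == base
      stack.push(value)                  # only the return value remains
      return value
    else
      raise ArgumentError, "unknown instruction: #{name}"
    end
  end
  raise ArgumentError, "instruction list ended without :leave"
end

vm_stack = [:caller_value]
well_behaved = [[:putobject, 1], [:putobject, 2], [:opt_plus], [:leave]]
toy_result = call_toy(well_behaved, vm_stack)

p toy_result   # 3
p vm_stack     # [:caller_value, 3]
```

A body such as `[[:putobject, 1], [:putobject, 2], [:leave]]` leaves a stray temporary behind, so this sketch raises instead of returning.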
Basic Instructions
YARV provides several categories of stack manipulation instructions. The most fundamental are the “push” instructions that place values onto the stack.
Push Instructions
| Instruction | Purpose | Example |
|---|---|---|
| putnil | Pushes nil onto the stack | nil |
| putobject | Pushes compile-time objects | true, 42, :symbol |
| putstring | Pushes unfrozen strings | "hello" |
| duparray | Duplicates and pushes arrays | [1, 2, 3] |
| duphash | Duplicates and pushes hashes | {a: 1} |
See YARV stack instructions for detailed exploration of each instruction type.
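To see which instruction a given literal compiles to, each example from the table can be compiled and disassembled. This sketch assumes CRuby 2.6 or later; instruction names are version-dependent, and string literals in particular may appear under a different instruction name on newer releases.

```ruby
# Disassemble each literal from the table and print the resulting listing.
# Assumes CRuby; RubyVM::InstructionSequence is implementation-specific.
literals = ["nil", "42", "\"hello\"", "[1, 2, 3]", "{a: 1}"]

listings = literals.to_h do |source|
  [source, RubyVM::InstructionSequence.compile(source).disasm]
end

listings.each do |source, listing|
  puts "--- #{source}"
  puts listing
end
```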
Viewing YARV Bytecode
You can inspect the YARV instructions generated from Ruby code using the --dump=insns flag:
ruby --dump=insns -e 'nil'
This outputs the instruction sequence that YARV will execute, allowing you to understand how Ruby code translates to VM operations.
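The same kind of listing can be obtained from inside a running program for a method that is already defined. A minimal sketch, assuming CRuby; the `add` method is a made-up example.

```ruby
# A made-up method whose bytecode we want to inspect.
def add(a, b)
  a + b
end

# RubyVM::InstructionSequence.of returns the iseq behind a Ruby-defined
# method or proc (it returns nil for methods implemented in C).
method_iseq = RubyVM::InstructionSequence.of(method(:add))
puts method_iseq.disasm
```

Because each method has its own instruction sequence, this prints only the body of `add`, not the surrounding script.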
Key Characteristics
Advantages of Stack-Based Design:
- Simpler implementation compared to register-based VMs
- Easier to JIT compile - see YJIT execution mechanics
- Compact bytecode representation
- Natural fit for expression evaluation
Trade-offs:
- More instructions needed for complex operations
- Stack manipulation overhead
- Less direct mapping to hardware registers
The stack-based architecture makes YARV particularly amenable to speculative optimization - the JIT compiler can observe stack operations during profiling and generate optimized native code based on type assumptions.
Memory Management
YARV’s instruction sequences integrate with Ruby’s garbage collection system. Compile-time known objects referenced by instructions like putobject are managed by the garbage collector to ensure they persist as long as needed.
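One observable consequence is that an object embedded in an instruction sequence outlives a garbage collection run, while instructions such as duparray hand out fresh copies of a stored template. The sketch below assumes CRuby and assumes that a range with literal endpoints is embedded at compile time; the method names are hypothetical.

```ruby
def literal_range
  (1..3)      # literal endpoints: assumed to be embedded as one object
end

def literal_array
  [1, 2, 3]   # duparray: each call returns a copy of a stored template
end

range_id = literal_range.object_id
GC.start      # the range stays alive: the method's iseq still references it
same_range = literal_range.object_id == range_id

first = literal_array
second = literal_array
first << 4    # mutating one copy does not affect the stored template

p same_range              # true when the range is embedded at compile time
p second                  # [1, 2, 3]
p first.equal?(second)    # false
```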
Further Exploration
YARV contains many more features beyond basic stack operations:
Instruction Types
- YARV stack manipulation instructions - Reordering, duplicating, and removing values
- Arithmetic and logical operations
- Method calls and returns
- Control flow (jumps, branches)
- Variable access (local, instance, global)
- Object creation and manipulation
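Several of these categories show up together in even a small program. The following sketch, assuming CRuby, disassembles a snippet that uses a local variable, a comparison, a branch, and a method call; exact instruction names may differ between versions.

```ruby
# A small program that exercises variable access, control flow, and a call.
source = <<~RUBY
  x = 1
  if x > 0
    puts x
  end
RUBY

category_listing = RubyVM::InstructionSequence.compile(source).disasm
puts category_listing
```

In the output, look for instructions beginning with `setlocal` and `getlocal` (variable access), a `branch...` instruction (control flow), and a send-style instruction for `puts` (method call).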
Execution Context
- YARV frame types - Different frame types for methods, blocks, classes, and exceptions
- frame parent-child relationships - How frames share or isolate variable access
- frame - Detailed frame structure and lifecycle
Observability
- YARV events - Event system for debugging, profiling, and code analysis
- TracePoint API - Ruby’s interface to YARV events
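As a brief illustration of the TracePoint API, the sketch below records the call and return events for one method. The `greet` method and the `events` array are made up for the example.

```ruby
def greet
  :hello
end

events = []
trace = TracePoint.new(:call, :return) do |tp|
  events << [tp.event, tp.method_id]   # record the event kind and method name
end

trace.enable { greet }   # tracing is active only while the block runs

p events   # [[:call, :greet], [:return, :greet]]
```

The `:call` and `:return` events fire only for methods written in Ruby; methods implemented in C report `:c_call` and `:c_return` instead.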
Understanding YARV’s instruction set, frame system, and event model provides deep insight into Ruby’s execution model and performance characteristics.
Glossary
Call data: Information about a specific call site in Ruby that is retained by instructions that perform method calls. This metadata includes the method name, argument count, and flags that guide method dispatch.
Call site: A location in the source code where a method is called. Each call site has associated call data, which is used for optimizations such as inline caching. See call site for details on how YARV optimizes these locations.
Callee: The method that is being called by another method (the caller). The callee executes in a new frame and receives the receiver as self. See callee.
Caller: The method that is calling another method (the callee). The caller’s frame is saved during the call and restored when the callee returns. See caller.
Compile-time: The time when the Ruby program is being compiled into bytecode from source. Operands and instruction sequences are determined at compile-time.
CRuby: The main Ruby implementation, written in C and hosted at ruby/ruby. CRuby has used YARV as its virtual machine since Ruby 1.9.
Environment pointer: A pointer held by a frame that points to the bottom of the stack region used by that frame. The EP enables the VM to access local variables by offset and maintain stack discipline.
Frame: A data structure that holds the execution state of the virtual machine at a given point in time. Each frame contains the program counter, stack pointer, environment pointer, and local variables. See frame.
Instruction: A single operation that the virtual machine can perform. Often abbreviated as insn. Instructions may have operands that specify what they operate on. See YARV stack instructions.
Instruction sequence: A list of instructions that the virtual machine executes. Often abbreviated as iseq. Each method, block, and class body compiles to its own instruction sequence. An instruction sequence also contains a call counter and a JIT entry point. See instruction sequence.
Instruction set: The complete set of instructions that the virtual machine can perform. YARV’s instruction set includes stack operations, arithmetic, control flow, and method calls.
Operand: A value that is used by an instruction. Operands are known at compile-time and are built into the instruction sequences, unlike stack values which are computed at runtime. See operand.
Program counter: A pointer to the current instruction in an instruction sequence. Also called an instruction pointer. Each frame has its own program counter that advances as instructions execute. See program counter.
Receiver: The object that a method is being called on. The receiver becomes self inside the method and is stored in the callee’s frame. The receiver’s class determines method dispatch. See receiver.
Stack: A stack data structure that holds values being used by the virtual machine. Also called the value stack or operand stack. YARV is a stack-based virtual machine where operations push and pop values from this stack.
Stack pointer: A pointer held by a frame that points to the next available slot in the stack. Often abbreviated as sp. The stack pointer moves up on push operations and down on pop operations, maintaining stack discipline.
Tracepoint: A publication/subscription system for virtual machine events. Tracepoints use the program counter to trigger callbacks at specific execution points, enabling debuggers and profilers.
Virtual machine: A piece of software that emulates a computer, executing bytecode instead of machine code. See virtual machine architecture.
YARV: The virtual machine used by CRuby. Stands for “Yet Another Ruby Virtual Machine”. YARV is a stack-based virtual machine that executes instruction sequences compiled from Ruby source.
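Several glossary terms (receiver, operand, call data) can be seen together in the disassembly of a single method call. A minimal sketch, assuming CRuby; the layout of the call data in the listing varies between versions.

```ruby
# Disassemble one call site: `puts 42` with an implicit receiver (self).
call_listing = RubyVM::InstructionSequence.compile("puts 42").disasm
puts call_listing
```

In the listing, `putself` pushes the receiver, `putobject 42` carries the operand 42, and the send-style instruction prints its call data, including the method name (`mid:puts`) and the argument count (`argc:1`).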