Stack-Based Virtual Machine

A stack-based virtual machine is a type of virtual machine architecture that uses a stack data structure as its primary mechanism for storing and manipulating values during program execution. This contrasts with register-based VMs that use named registers (like physical CPUs).

Core Concept

In a stack-based VM, all operations work with values on a stack:

Operands are pushed onto the stack
Operations pop operands from the stack
Results are pushed back onto the stack

graph LR
    A[Push 5] --> B[Push 3]
    B --> C[Add]
    C --> D[Result: 8]

    style A fill:#e1f5ff
    style B fill:#e1f5ff
    style C fill:#fff4e1
    style D fill:#e8f5e9

Example: Adding Two Numbers

Stack-based approach:

Instructions:    Stack state:
-------------    ------------
push 5          [5]
push 3          [5, 3]
add             [8]        # pop 5 and 3, push result

Register-based approach:

Instructions:
-------------
load r1, 5      # r1 = 5
load r2, 3      # r2 = 3
add r3, r1, r2  # r3 = r1 + r2

Visual Representation

Stack operations for: (2 + 3) * 4

Step 1: push 2
┌─────┐
│  2  │ ← top
└─────┘

Step 2: push 3
┌─────┐
│  3  │ ← top
├─────┤
│  2  │
└─────┘

Step 3: add (pops 3 and 2, pushes 5)
┌─────┐
│  5  │ ← top
└─────┘

Step 4: push 4
┌─────┐
│  4  │ ← top
├─────┤
│  5  │
└─────┘

Step 5: multiply (pops 4 and 5, pushes 20)
┌─────┐
│ 20  │ ← top (final result)
└─────┘

Advantages

1. Simplicity

Fewer concepts to implement (just a stack)
No register allocation needed
Straightforward instruction encoding

2. Compact Bytecode

Instructions don’t need to specify register names
add instead of add r3, r1, r2
Smaller bytecode files

3. Easy to Generate

Compilers can generate code without register allocation
Direct translation from abstract syntax tree expressions
Natural fit for recursive descent compilation

4. JIT-Friendly

Just-in-time compilation can optimize stack operations
Easy to convert to register-based code during JIT
Simple intermediate representation

5. Platform Independence

No dependency on physical register count
Same bytecode works on different architectures
True virtualization

Disadvantages

1. More Instructions

Separate push/pop for each value
Register-based VMs can operate on values in place
Higher instruction count for equivalent operations

2. Stack Manipulation Overhead

Constant pushing and popping
Memory traffic for temporary values
Cache pressure from stack accesses

3. Less Direct Hardware Mapping

Physical CPUs use registers
Translation layer needed for execution
Potentially slower without JIT

4. Limited Optimization

Hard to optimize stack accesses
Register-based code can reuse values in registers
Value lifetime tracking is implicit

Examples in Practice

Stack-based VMs:

YARV (Ruby)
Java Virtual Machine (JVM)
Python’s CPython VM
WebAssembly (WASM)
PostScript

Register-based VMs:

Lua VM
Dalvik (Android’s original VM)
.NET CLR

Implementation Example

A simple stack-based VM in pseudocode:

class StackVM
  def initialize
    @stack = []
    @program = []
    @pc = 0  # Program counter
  end
 
  def push(value)
    @stack.push(value)
  end
 
  def pop
    @stack.pop
  end
 
  def add
    b = pop
    a = pop
    push(a + b)
  end
 
  def multiply
    b = pop
    a = pop
    push(a * b)
  end
 
  def run(program)
    @program = program
    @pc = 0
 
    while @pc < @program.length
      instruction = @program[@pc]
      execute(instruction)
      @pc += 1
    end
 
    @stack.last  # Return top of stack
  end
 
  def execute(instruction)
    case instruction.type
    when :push
      push(instruction.value)
    when :add
      add
    when :multiply
      multiply
    end
  end
end
 
# Calculate (2 + 3) * 4
vm = StackVM.new
result = vm.run([
  {type: :push, value: 2},
  {type: :push, value: 3},
  {type: :add},
  {type: :push, value: 4},
  {type: :multiply}
])
# => 20

The Stack Contract

Stack-based VMs typically enforce a stack discipline:

Functions must clean up their stack usage
Stack height should be predictable
Unbalanced stack operations cause errors

Example of stack contract:

def add_and_square(a, b)
  # Stack on entry: [a, b]
 
  add           # Stack: [a+b]
  dup           # Stack: [a+b, a+b]
  multiply      # Stack: [(a+b)²]
 
  # Stack on exit: [(a+b)²]
  # Net effect: consumed 2 values, produced 1
end

See YARV for how Ruby’s VM implements this contract.

Stack vs Expression Evaluation

Stack-based VMs naturally mirror how mathematical expressions are evaluated:

Infix notation: 2 + 3 * 4 Postfix notation (RPN): 2 3 4 * + Stack operations: Same as postfix!

push 2
push 3
push 4
multiply  # 3 * 4 = 12
add       # 2 + 12 = 14

This connection to reverse Polish notation (RPN) explains why stack-based VMs are intuitive for expression evaluation.

Memory Hierarchy

Stack-based VMs still use other memory structures:

┌──────────────────────┐
│    Stack (values)    │ ← Primary operand storage
├──────────────────────┤
│   Call Stack         │ ← Function calls, return addresses
├──────────────────────┤
│   Heap               │ ← Object allocation
├──────────────────────┤
│   Constant Pool      │ ← Compile-time constants
└──────────────────────┘

The “stack” in “stack-based VM” refers specifically to the operand stack, not the entire memory model.

Modern Hybrid Approaches

Many modern VMs combine approaches:

JVM: Stack-based bytecode, register-based JIT
YARV: Stack-based with JIT optimization
WebAssembly: Stack-based with register hints

This gives the benefits of simple bytecode with the performance of register-based execution.

Gradual Notes

Recent Writing

Revisited

Space is Not Barrenness

Study the Canon

Recent Notes

Caching Context

Reentrant Code

Zeigarnik Effect

Migrating from Resque to Sidekiq