Interpreters explained

Running code line by line


When you run Python code or execute JavaScript in your browser, there's no separate compilation step. You just write code and run it. But how does the computer execute your instructions if there's no machine code? The answer lies in interpreters - programs that read and execute other programs.

Interpreters represent a fundamentally different philosophy from compilers. Instead of transforming your entire program into machine code ahead of time, an interpreter reads your code and performs actions immediately. This trade-off - giving up some runtime speed for immediate execution - has profound implications for how languages are designed and used.

What is an interpreter?

An [[interpreter]] is like a translator who reads a book aloud, translating each sentence as they go. Rather than translating the entire book first (compilation), they translate and speak simultaneously (interpretation). The computer running the interpreter is doing two things at once: figuring out what your code means, and executing it.

This immediate execution is why interpreted languages feel interactive. You can type a line of Python into a REPL (Read-Eval-Print Loop) and see the result instantly. There's no compile step, no build process, no waiting. The interpreter reads your input, evaluates it, prints the result, and loops back for more.
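The loop itself is tiny. Below is a stripped-down read-eval loop that leans on Python's own `eval` for the "evaluate" step - an illustrative sketch (a real REPL also handles statements, multi-line input, and errors), with the lines passed in as a list so the loop is easy to test:

```python
# A minimal read-eval-print loop using Python's own eval (illustrative sketch)
def repl(lines):
    """Evaluate expressions one at a time in a shared environment."""
    env = {}
    results = []
    for line in lines:           # Read
        value = eval(line, env)  # Eval: parse and execute immediately
        env["_"] = value         # remember the last result, as many REPLs do
        results.append(value)    # Print (collected here), then Loop
    return results

print(repl(["2 + 3", "_ * 10"]))  # → [5, 50]
```

Each line is parsed and executed the moment it arrives - there is no separate build artifact anywhere in this loop.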

The simplest interpreters work by walking the AST directly. They visit each node of the tree and perform the corresponding action:

def evaluate(node, environment):
    """Evaluate an AST node in the given environment (variable bindings)."""
    
    if isinstance(node, NumberLiteral):
        return node.value
        
    elif isinstance(node, StringLiteral):
        return node.value
        
    elif isinstance(node, Identifier):
        # Look up variable in environment
        if node.name not in environment:
            raise NameError(f"Undefined variable: {node.name}")
        return environment[node.name]
        
    elif isinstance(node, BinaryExpression):
        left = evaluate(node.left, environment)
        right = evaluate(node.right, environment)
        
        if node.operator == '+':
            return left + right
        elif node.operator == '-':
            return left - right
        elif node.operator == '*':
            return left * right
        elif node.operator == '/':
            if right == 0:
                raise ZeroDivisionError("Division by zero")
            return left / right
            
    elif isinstance(node, Assignment):
        value = evaluate(node.value, environment)
        environment[node.name] = value
        return value
        
    elif isinstance(node, FunctionCall):
        func = evaluate(node.function, environment)
        args = [evaluate(arg, environment) for arg in node.arguments]
        return func(*args)

    else:
        raise TypeError(f"Unknown node type: {type(node).__name__}")

# Usage:
# For "2 + 3 * 4":
result = evaluate(ast, {})  # Returns 14

This approach is called a tree-walking interpreter. It's simple to implement but relatively slow because it repeatedly traverses tree structures and makes many function calls for even simple operations. Every time you add two numbers, the interpreter must check what type of node it's looking at, dispatch to the right case, recursively evaluate children, and so on.
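To see a tree-walker actually run, here is a self-contained reduction of the sketch above. The node class names follow the snippet; the minimal dataclass definitions are an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass
class NumberLiteral:
    value: float

@dataclass
class Identifier:
    name: str

@dataclass
class BinaryExpression:
    operator: str
    left: object
    right: object

def evaluate(node, environment):
    """Tree-walking evaluator: dispatch on node type, recurse on children."""
    if isinstance(node, NumberLiteral):
        return node.value
    if isinstance(node, Identifier):
        return environment[node.name]
    if isinstance(node, BinaryExpression):
        left = evaluate(node.left, environment)
        right = evaluate(node.right, environment)
        return {"+": left + right, "-": left - right, "*": left * right}[node.operator]
    raise TypeError(f"Unknown node type: {type(node).__name__}")

# AST for "2 + 3 * 4" - precedence lives in the tree shape, not in the evaluator
ast = BinaryExpression("+", NumberLiteral(2),
                       BinaryExpression("*", NumberLiteral(3), NumberLiteral(4)))
print(evaluate(ast, {}))  # → 14
```

Note that the evaluator never thinks about precedence: the parser already encoded `3 * 4` as the deeper subtree, so recursion evaluates it first.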

Bytecode: a faster approach

Most production interpreters don't walk the AST directly. Instead, they compile source code to [[bytecode]] - a compact, efficient representation designed for fast interpretation. Bytecode is lower-level than source code but higher-level than machine code. Think of it as machine code for an imaginary computer - one that's specifically designed to be easy to simulate in software.

The key insight is that we can pay the cost of parsing and analysis once, then execute the resulting bytecode many times. For a loop that runs a million iterations, this is a huge win - we parse it once but execute the bytecode a million times.
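Python exposes this split directly: the built-in `compile` parses once and returns a reusable code object, and every subsequent `eval` of that object runs the cached bytecode with no reparsing. A small demo:

```python
# Parse and compile once...
code = compile("x * 2 + 10", "<demo>", "eval")

# ...then execute the cached bytecode many times - no parsing on any iteration
results = [eval(code, {"x": n}) for n in range(5)]
print(results)  # → [10, 12, 14, 16, 18]
```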

# Python source code:
def calculate(x):
    y = x * 2
    z = y + 10
    return z

# You can see the bytecode Python generates:
import dis
dis.dis(calculate)

# Output (Python 3.11):
#  2           0 RESUME                   0
#
#  3           2 LOAD_FAST                0 (x)
#              4 LOAD_CONST               1 (2)
#              6 BINARY_OP                5 (*)
#             10 STORE_FAST               1 (y)
#
#  4          12 LOAD_FAST                1 (y)
#             14 LOAD_CONST               2 (10)
#             16 BINARY_OP                0 (+)
#             20 STORE_FAST               2 (z)
#
#  5          22 LOAD_FAST                2 (z)
#             24 RETURN_VALUE

Stack-Based Interpreter

# Python source:
x = 5
y = 3
z = x + y

# Bytecode instructions:
0  LOAD_CONST    5
1  STORE_NAME    x
2  LOAD_CONST    3
3  STORE_NAME    y
4  LOAD_NAME     x
5  LOAD_NAME     y
6  BINARY_ADD
7  STORE_NAME    z
8  LOAD_NAME     z
9  RETURN_VALUE

A [[virtual machine]] executes this bytecode. The VM is a program that simulates a simple computer architecture. Most bytecode VMs are [[stack machine]]s - they use a stack for all operations. This design is elegant because it eliminates the need to specify operand locations; everything operates on the top of the stack.

Instruction       Stack                 Locals
─────────────────────────────────────────────────────
LOAD_FAST x       [5]                   {x: 5}
LOAD_CONST 2      [5, 2]                {x: 5}
BINARY_OP *       [10]                  {x: 5}
STORE_FAST y      []                    {x: 5, y: 10}
LOAD_FAST y       [10]                  {x: 5, y: 10}
LOAD_CONST 10     [10, 10]              {x: 5, y: 10}
BINARY_OP +       [20]                  {x: 5, y: 10}
STORE_FAST z      []                    {x: 5, y: 10, z: 20}
LOAD_FAST z       [20]                  {x: 5, y: 10, z: 20}
RETURN_VALUE      → returns 20
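The trace above can be reproduced with a few lines of Python. This toy VM uses invented (opcode, argument) pairs rather than real CPython bytecode, but the mechanics - an evaluation stack plus a locals array - are the same:

```python
# A tiny stack-machine VM (illustrative opcodes, not CPython's)
def run(program, constants, locals_):
    """Execute a list of (opcode, arg) pairs on an evaluation stack."""
    stack = []
    for op, arg in program:
        if op == "LOAD_CONST":
            stack.append(constants[arg])
        elif op == "LOAD_FAST":
            stack.append(locals_[arg])
        elif op == "STORE_FAST":
            locals_[arg] = stack.pop()
        elif op == "BINARY_OP":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b if arg == "*" else a + b)
        elif op == "RETURN_VALUE":
            return stack.pop()

# Bytecode for: y = x * 2; z = y + 10; return z   (x in local slot 0)
program = [
    ("LOAD_FAST", 0), ("LOAD_CONST", 0), ("BINARY_OP", "*"), ("STORE_FAST", 1),
    ("LOAD_FAST", 1), ("LOAD_CONST", 1), ("BINARY_OP", "+"), ("STORE_FAST", 2),
    ("LOAD_FAST", 2), ("RETURN_VALUE", None),
]
print(run(program, constants=[2, 10], locals_=[5, None, None]))  # → 20
```

Notice that no instruction names its operands' locations: `BINARY_OP` always takes the top two stack values, which is exactly what makes stack bytecode so compact.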

Why bytecode is faster

Bytecode interpretation beats tree-walking for several reasons:

Factor             Tree-Walking                            Bytecode
─────────────────────────────────────────────────────────────────────────────
Memory layout      Scattered tree nodes with pointers      Contiguous byte array
Dispatch cost      Type checks, virtual calls, recursion   Simple switch or computed goto
Overhead per op    High (many function calls)              Low (tight inner loop)
Cache efficiency   Poor (pointer chasing)                  Excellent (sequential access)
Code density       AST nodes are large                     Bytecode is compact

The bytecode interpreter's main loop is remarkably simple:

// Simplified bytecode interpreter loop
Value interpret(uint8_t* bytecode, Value* constants, Value* locals) {
    Value stack[256];
    Value* sp = stack;  // Stack pointer
    uint8_t* ip = bytecode;  // Instruction pointer
    
    while (true) {
        uint8_t instruction = *ip++;
        
        switch (instruction) {
            case OP_LOAD_CONST: {
                uint8_t index = *ip++;
                *sp++ = constants[index];
                break;
            }
            case OP_LOAD_FAST: {
                uint8_t index = *ip++;
                *sp++ = locals[index];
                break;
            }
            case OP_STORE_FAST: {
                uint8_t index = *ip++;
                locals[index] = *--sp;
                break;
            }
            case OP_BINARY_ADD: {
                Value b = *--sp;
                Value a = *--sp;
                *sp++ = value_add(a, b);  // Type checking inside
                break;
            }
            case OP_RETURN: {
                return *--sp;
            }
            // ... more cases
        }
    }
}

This loop is cache-friendly because bytecode is stored contiguously and accessed sequentially. The switch dispatch compiles to a jump table - a single indirect jump rather than a chain of comparisons. Modern CPUs can predict these jumps reasonably well after seeing a few iterations.

Just-In-Time (JIT) compilation

Even bytecode interpretation has overhead. Every instruction requires a switch dispatch, and the VM can't use CPU registers as effectively as native code. Enter [[JIT]] compilation - the technique that made JavaScript fast enough to build entire applications in the browser.

JIT compilers monitor running code and identify [[hot path]]s - code that executes frequently. When code gets hot enough, the JIT compiles it to native machine code. The next time that code runs, it executes at full speed, with no interpretation overhead.

function sumArray(arr) {
    let total = 0;
    for (let i = 0; i < arr.length; i++) {
        total += arr[i];  // This line executes millions of times
    }
    return total;
}

// First few calls: interpreted (slow, but fast to start)
sumArray([1, 2, 3]);  

// V8 notices this function is "hot" (called often)
// Ignition (baseline interpreter) collects type feedback
// TurboFan (optimizing JIT) compiles to machine code

// Later calls: native speed
sumArray(millionElementArray);  // Runs compiled machine code

// Potential issue: if types change, code must be deoptimized
sumArray(['a', 'b', 'c']);  // Strings! Compiled code assumed numbers
// V8 "deoptimizes" - falls back to interpreter, recompiles later
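Detecting a hot path is, at its core, just counting. A toy Python profiler in that spirit - the decorator, threshold, and function names here are invented for illustration; real JITs increment counters inside the interpreter loop itself:

```python
# Toy hot-path detection: count calls, flag functions past a threshold (illustrative)
import functools

HOT_THRESHOLD = 1000
call_counts = {}

def profiled(fn):
    """Wrap fn so every call bumps a per-function counter."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] = call_counts.get(fn.__name__, 0) + 1
        return fn(*args, **kwargs)
    return wrapper

@profiled
def square(x):
    return x * x

for i in range(1500):
    square(i)

hot = [name for name, n in call_counts.items() if n >= HOT_THRESHOLD]
print(hot)  # → ['square']
```

A real engine would hand the functions in `hot` to its optimizing compiler; everything below the threshold keeps running in the cheap tier.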

Inline caching: speeding up dynamic dispatch

In dynamic languages like JavaScript or Python, you often don't know the type of a variable until runtime. When you write `obj.foo()`, the runtime must look up `foo` in `obj`'s prototype chain - potentially checking multiple objects. This lookup is expensive.

[[inline cache]]s solve this by remembering what happened last time. The first time you call `obj.foo()`, the runtime does the full lookup. But it also patches the call site to remember: 'Last time, `obj` had shape X, and `foo` was at offset Y.' Next time, if `obj` still has shape X, skip the lookup and go directly to offset Y.

// Conceptually, how inline caching works:

// First call to point.getX():
// 1. Look up 'getX' in point's prototype chain (expensive)
// 2. Find it's at offset 0 of Point.prototype
// 3. Cache: "Point shape → offset 0"
// 4. Call the function

// Second call to point.getX():
// 1. Check: is this a Point? Yes!
// 2. Use cached offset 0, skip lookup
// 3. Call the function (much faster)

// This is why consistent object shapes matter:
function Point(x, y) {
    this.x = x;  // Shape: {x}
    this.y = y;  // Shape: {x, y}
}

// All Points have the same shape - inline caches hit
let p1 = new Point(1, 2);
let p2 = new Point(3, 4);

// This breaks the pattern - different shape!
let p3 = new Point(5, 6);
p3.z = 7;  // Shape: {x, y, z} - cache miss

Polymorphic inline caches (PICs) handle multiple types at one call site by caching several type → offset mappings. But if too many types appear (megamorphic call sites), the cache gives up and falls back to slow lookups. This is why consistent types improve JavaScript performance.
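The mechanism can be mimicked in Python with a monomorphic cache keyed on the object's type - a conceptual toy, not how V8 actually represents shapes or patches call sites:

```python
class InlineCache:
    """Toy monomorphic inline cache for one attribute at one 'call site'."""
    def __init__(self, attr):
        self.attr = attr
        self.cached_type = None   # the "shape" seen on the previous lookup
        self.hits = 0
        self.misses = 0

    def get(self, obj):
        if type(obj) is self.cached_type:   # fast path: a single type check
            self.hits += 1
        else:                               # slow path: full lookup, then cache
            self.misses += 1
            self.cached_type = type(obj)
        return getattr(obj, self.attr)

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

cache = InlineCache("x")
for p in [Point(1, 2), Point(3, 4), Point(5, 6)]:
    cache.get(p)
print(cache.hits, cache.misses)  # → 2 1
```

Feed this cache a stream of mixed types and the miss counter climbs instead - the toy-model version of a call site going polymorphic.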

Deoptimization: when assumptions fail

JIT compilers make assumptions to generate fast code. 'This variable is always a number.' 'This function never throws.' 'This array has no holes.' These assumptions enable aggressive optimizations. But what happens when they're wrong?

[[deoptimization]] is the escape hatch. When an assumption fails, the VM must throw away the optimized machine code and fall back to the interpreter. The current execution state is translated from machine registers back to the interpreter's format, and execution continues at reduced speed.

function processItem(item) {
    return item.value * 2;  // JIT assumes: item.value is always a number
}

// Warm up - all numbers
for (let i = 0; i < 10000; i++) {
    processItem({ value: i });
}
// Now processItem is JIT-compiled with "value is number" assumption

// Trigger deoptimization
processItem({ value: "oops" });  // String! Assumption violated
// VM says: "Whoa, that's not a number!"
// 1. Stop executing machine code
// 2. Reconstruct interpreter state
// 3. Continue in interpreter
// 4. Recompile later with more general code (slower)

How V8 executes JavaScript

Let's trace how V8 (Chrome's engine) handles JavaScript from start to finish:

Source Code ("function add(a, b) { return a + b; }")
    ↓ Parser
AST (Abstract Syntax Tree)
    ↓ Ignition (Bytecode Compiler)
Bytecode
    ↓ Ignition (Interpreter)
Execution (collects type feedback)
    │
    ├─── Function is "cold" (rarely called)
    │    └── Keep interpreting
    │
    └─── Function is "hot" (called often)
         ↓ Sparkplug (Baseline Compiler)
         Unoptimized Machine Code (fast startup)
         ↓ (if still hot)
         ↓ TurboFan (Optimizing Compiler)
         Optimized Machine Code
              │
              ├── Types stable → Fast execution
              │
              └── Types change → Deoptimize
                   └── Back to Ignition, try again

Each tier is a trade-off. Ignition compiles quickly but runs slowly. TurboFan compiles slowly but runs fast. By profiling execution, V8 knows where to invest compilation effort. Hot loops get optimized; cold startup code stays in bytecode.

Python's approach: CPython

Python takes a simpler approach than JavaScript. CPython (the standard Python interpreter) compiles to bytecode but doesn't JIT compile to machine code. This is a deliberate trade-off:

**Extreme dynamic typing**: Python's flexibility makes optimization hard. A variable can hold any type and change types at any time. Methods can be added or removed from classes at runtime. Even basic operations like `a + b` might call custom `__add__` methods.

**Global Interpreter Lock (GIL)**: CPython's GIL prevents true parallel execution of Python bytecode. This simplifies memory management but limits multi-core performance, reducing the payoff from aggressive optimization.

**Simplicity as a feature**: CPython is intentionally readable and hackable. The implementation serves as documentation. Complex JIT machinery would obscure this simplicity.

import dis
import sys

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# See the bytecode
dis.dis(fibonacci)

# Check bytecode size
code = fibonacci.__code__
print(f"Bytecode size: {len(code.co_code)} bytes")
print(f"Constants: {code.co_consts}")
print(f"Variable names: {code.co_varnames}")

For Python code that needs speed, the solution is typically calling into C extensions (NumPy, pandas, TensorFlow) or using alternative implementations like PyPy (which has a sophisticated JIT). Python 3.11 added a 'specializing adaptive interpreter' that's a step toward JIT, and Python 3.12 continued these optimizations.

The speed hierarchy

Different execution strategies offer different performance characteristics:

Strategy          Relative Speed    Startup Time     Memory Use   Examples
──────────────────────────────────────────────────────────────────────────────
Native (AOT)      100% (baseline)   Instant          Low          C, Rust, Go
JIT (optimized)   70-100%           Slow (warm-up)   High         Java HotSpot, V8 TurboFan
JIT (baseline)    30-50%            Medium           Medium       V8 Sparkplug, JVM C1
Bytecode VM       5-20%             Fast             Low          CPython, Ruby MRI
Tree-walking      1-5%              Instant          Medium       Early PHP, simple shells

These numbers are rough approximations. Modern JIT compilers can sometimes beat ahead-of-time compiled code because they can optimize for the specific data and usage patterns observed at runtime. A JIT can specialize a function for the types it actually receives, while an AOT compiler must handle all possible types.

Why use interpreters at all?

Given the performance gap, why not compile everything ahead of time? Interpreters offer real advantages:

**Faster development cycle**: No compile step means instant feedback. Change code, run immediately. For prototyping and scripting, this is invaluable.

**Platform independence**: Bytecode runs anywhere the VM runs. Java's 'write once, run anywhere' promise, despite its complexities, is enabled by bytecode interpretation.

**Dynamic features**: eval(), runtime code generation, monkey patching, and reflection are natural in interpreted languages. Compilers struggle with code that might not exist at compile time.

**Simpler deployment**: No need to manage compiled binaries for each platform, link libraries, or deal with ABI compatibility. Ship source or bytecode; the VM handles the rest.

**Runtime safety**: The VM can catch errors like buffer overflows, null pointer dereferences, and array bounds violations that would crash or exploit compiled code.

**Interactive development**: REPLs, hot reloading, and live coding are natural for interpreted languages. You can modify a running program without restarting it.
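Several of these dynamic features fit in a few lines of Python - a small illustrative demo of `eval`, `exec`, and monkey patching (the names `double` and `Greeter` are invented for the example):

```python
# Code that does not exist until runtime
source = "def double(n):\n    return n * 2\n"
namespace = {}
exec(source, namespace)           # compile and run brand-new code on the fly
print(namespace["double"](21))    # → 42

# Evaluate an expression supplied as a string
print(eval("1 + 2 * 3"))          # → 7

# Monkey patching: modify a class after it's defined
class Greeter:
    def hello(self):
        return "hi"

Greeter.hello = lambda self: "hello, patched"   # swapped in at runtime
print(Greeter().hello())          # → hello, patched
```

An ahead-of-time compiler has no good answer for `source` above: the function it defines simply does not exist at compile time.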

"Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."

- Greenspun's Tenth Rule
How Things Work - A Visual Guide to Technology