Building an interpreter: Conditional execution
Until now, all statements and expressions we’ve added to our bytecode virtual machine have been compiled into unconditional instructions, executed one after the other. This has prevented us from controlling the flow of our program. In this post, we will introduce control flow statements, if-else, logical and/or, while and for statements, that will enable conditional execution of instructions.
These statements are relatively easy to implement; they require just the following new opcodes:
A conditional forward jump -
JUMP_IF_FALSE
An unconditional forward jump -
JUMP
An unconditional backward jump -
LOOP
These opcodes take an operand, the jump target, which is an offset from the jump instruction to where execution should continue. At the heart of each of the control flow statements is a combination of these jump instructions to control the point at which our bytecode interpreter picks up executing instructions. We'll start with the if-else statement.
if-else statements
If
statements conditionally execute different branches of code based on the value of expressions within them. Our interpreter will achieve this using the JUMP_IF_FALSE
and JUMP
instructions. Let’s see how the if
statement is compiled.
As usual, we define the if_statement
method below that will handle compiling the if
statement.
pub fn if_statement(&mut self, i: &Interpreter){
let loop_start = self.chunk.as_ref().unwrap().code.len();
self.consume(LeftParen, "Expect '(' after 'if'.");
self.expression(i);
self.consume(RightParen, "Expect '(' after 'if'.");
let then_jump = self.emit_jump(OpJumpIfFalse);
self.statement(i);
let jump = self.emit_jump(OpJump);
self.patch_jump(then_jump);
if self.match_token(vec![TokenType::Else]){
self.statement(i);
}
self.patch_jump(jump);
}
The if statement is of the form if expression then block else block.
A compiled if statement is of the general form shown in figure 1.0.
The process for compiling the if statement is as follows.
First, compile the
if
statement's condition expression.Emit a
JUMP_IF_FALSE
instruction with a placeholder as its jump target. This instruction will skip theif
block if the condition from above is false. We have to use a placeholder as the target because we do not yet know where the code should jump to.Compile the instructions within the
if
’s then block.Emit a
JUMP
instruction with a placeholder for it target. If an else block exists, this instruction will skip over it. A placeholder is used for the jump target.Compile statements in the
else
block if any.Patch the jump instructions. We use the
patch_jump
method to replace the placeholders with the actual jump locations. Patching in this case means computing the required offset from the jump instruction to where the interpreter should continue execution and then updating the jump operands. Patching happens at different points within the code - once we know where a jump must end up we, patch that jump statement. The implementation of thepatch_jump
method is shown below.fn patch_jump(&mut self, offset: usize){ let jump = self.chunk.as_ref().unwrap().code.len() - offset - 2; if jump as i16 > i16::MAX { panic!("Too much code to jump over!") } self.chunk.as_mut().unwrap().code[offset] = ((jump >> 8) & 0xff) as u8; self.chunk.as_mut().unwrap().code[offset + 1] = (jump & 0xff) as u8; }
The jump target is two bytes wide, large enough for Sox, so what the
patch_jump
method shows here is how we go about extracting the first and second bytes of our operand using bit manipulation in Rust.
To finish up, we add the JUMP
and JUMP_IF_FALSE
statements to our eval loop, as shown below, and implement helpers to disassemble them.
OpCode::OpJumpIfFalse => {
let offset = read_short!(self);
let val = peek_stack!(self);
if !val.try_into_rust_bool(i) {
self.ip += offset as usize;
}
},
OpCode::OpJump => {
let offset = read_short!(self);
self.ip += offset as usize;
},
Logical Operators
Logical operators like and
and or
use a technique called "short-circuiting" to optimize execution. This means they can conditionally skip parts of the expression, directly affecting the program's flow. This is also implemented using JUMPS.
and Operator
Sox and
expressions are of the form left expression and right expression.
If the left side of this expression is false, the value of the entire expression is false, so the right expression is not evaluated. We use a JUMP_IF_FALSE
instruction to skip the right expression and go directly to the next instruction after the and
expression. This is shown in figure 2.0.
To compile an and
statement, we add a parse rule for the and
expressions to our parser table. The and
operator is an infix operator so we define a parse rule with an infix and_
method for our expression parser table. This compiles the and
expression and is shown below.
pub fn and_(&mut self, i: &Interpreter) {
let end_jump = self.emit_jump(OpJumpIfFalse);
self.emit_byte((OpPop, None));
self.parse_with_precedence(Precedence::And, i).expect("TODO: panic message");
self.patch_jump(end_jump);
}
Or Operator
If the left expression of an or
expression is true, the entire expression is true, and the right expression is skipped. Conversely, if the left expresson is false, the right expression must be evaluated to determine the value of the whole expression. So when the left expression is false, we jump to instructions for the right expression, taking care to pop the value of the left expression that is still on the stack first. Figure 3.0 illustrates the jump flow for an or
expression.
The or_ method for compiling the or statement is implemented in the same fashion as the and_
method and is shown below.
pub fn or_(&mut self, i: &Interpreter) {
let else_jump = self.emit_jump(OpJumpIfFalse);
let end_jump = self.emit_jump(OpJump);
self.patch_jump(else_jump);
self.emit_byte((OpPop, None));
self.parse_with_precedence(Precedence::Or, i).expect("TODO: panic message");
self.patch_jump(end_jump);
}
In both logical operators, there is no need to make any change to the eval loop as we already have implemenations for our jump instructions.
while statement
The while_statement
method below compiles the while
statement. This method goes into the statement
method like handlers for other statements.
pub fn while_statement(&mut self, i: &Interpreter){
let loop_start = self.chunk.as_ref().unwrap().code.len();
self.consume(LeftParen, "Expect '(' after 'while'.");
self.expression(i);
self.consume(RightParen, "Expect ')' after 'while'.");
let exit_jump = self.emit_jump(OpJumpIfFalse);
self.emit_byte((OpPop, None));
self.statement(i);
self.emit_loop(loop_start);
self.patch_jump(exit_jump);
}
Compiling a while
statement combines the JUMP_IF_FALSE
and LOOP
instructions. The loop instruction is analogous to the JUMP
instruction but takes a negative value for the jump target, meaning that a loop instruction takes the instruction pointer back to execute code that may have been previously executed. Figure 4.0 is a graphical illustration of compiled instructions for the while statement.
Before compiling the while condition expression that determines if the body of the while loop should execute, we mark the instruction pointer as the loop start; we will loop back to this point to continue execution when applicable.
Next, compile the while expression condition, which will leave a value on the stack after execution.
Next, we emit a
JUMP_IF_FALSE
statement. This is the exit jump which should jump out of the while statement loop when the condition expression isFalse
.We emit a
POP
instruction that should clear the condition expression from the stack when executed.We then go ahead and compile the body of the while expression.
Next we emit a
LOOP
statement with the jump target set to the value from #1.Finally, we patch the
JUMP_IF_FALSE
statement since we now know where execution should continue.
The next step is to add the loop to our eval loop below and implement helpers for disassembling this instruction.
OpCode::OpLoop => {
let offset = read_short!(self);
self.ip -= offset as usize;
}
for Statement
The for statement is the trickiest control flow statement due to the number of components. We must support the initializer
, condition
, and increment
clauses for the for
statement. As usual, we define a for_statement
method and add it to the statement
method. The for_statement
method is shown below.
pub fn for_statement(&mut self, i: &Interpreter){
self.begin_scope(i);
self.consume(LeftParen, "Expect '(' after 'for'").expect("");
if self.match_token(vec![TokenType::Semi]){
} else if self.match_token(vec![TokenType::Let]){
self.let_declaration(i);
} else{
self.expression_statement(i);
}
self.consume(TokenType::Semi, "Expect ';'").expect("");
let mut loop_start = self.chunk.iter().len();
let exit_jump = None;
if (!self.match_token(vec![TokenType::Semi])){
self.expression(i);
self.consume(TokenType::Semi, "Expect ';' after variable declaration.").expect("");
let exit_jump = Some(self.emit_jump(OpJumpIfFalse));
self.emit_byte((OpPop, None));
}
if (!self.match_token(vec![TokenType::RightParen])){
let body_jump = self.emit_jump(OpJump);
let increment_start = self.chunk.iter().len();
self.expression(i);
self.emit_byte((OpPop, None));
self.consume(TokenType::RightParen, "Expect ')' after for clauses").expect("");
self.emit_loop(loop_start);
loop_start = increment_start;
self.patch_jump(body_jump);
}
self.consume(TokenType::Semi, "Expect ';'").expect("");
self.consume(RightParen, "Expect ')' after for clauses.").expect("");
self.statement(i);
self.emit_loop(loop_start);
if exit_jump.is_some() {
self.patch_jump(exit_jump.unwrap());
self.emit_byte((OpPop, None));
}
self.end_scope(i)
}
A for loop can declare a variable that should be scoped to the body of the loop, so the implementation of
for_method
goes in abegin_scope
...end_scope
.For the initializer clause, we can use a variable declaration, an expression or nothing, so we search accordingly, taking care to avoid leaving any value on our stack.
We mark the instruction pointer at this point as the
loop_start
. When the for statement loops back, it should loop back to here and continue execution.The condition clause, where present, determines if the
for
loop should continue executing. We search for a condition clause expression and, where present, compile it with anexpression
call that leaves the value on the stack after execution.We then emit an
OP_JUMP_IF_FALSE
— if the value of the condition expression is false, then this is our exit jump from the loop.This is followed by a
POP
instruction to clear the stack.Next, we have to check for our increment clause. We must perform an intricate set of jumps to compile the increment statement within the for loop. This is because even though the increment precedes the body of the
for
statement, the increment instruction is executed after thefor
loop's body is executed. To compile this, we emit aJUMP
instruction as the first instruction; this will cause our interpreter to jump to the body of the for loop and execute. Next, we emit instructions for the increment statement. After the increment code section executes, we must execute the conditional clause of the for loop. Hence, we emit aLOOP
expression that jumps to just before the conditional expression. Finally, we update theloop_start
to point to the start of the increment code section so that theLOOP
after thefor
loop’s body takes execution to the start of the increment code section. Figure 5.0 captures theLOOP
s andJUMP
s used in thefor
statement.The interpreter executes a set of instructions till it get to the increment instruction. It then jump over the increment instructions to the loop's body and executes the body. At the end of the body, it jumps to the start of the increment instruction, which, after executing, jumps to the condition, and the whole cycle continues until we can exit the loop.
With jumps implemented, we now have almost all we need to introduce functions and classes, the most important Sox abstractions. In the next set of posts we will begin to look at these. As usual, all code is in github.