Building an interpreter: The print statement and variable declarations.
A Sox program is a collection of declarations and a declaration is either a class declaration, function declaration, variable declaration or a statement. This post will look at how to compile a subset of declarations and statements. At the end of this post, our compiler will support variable declarations, expression statements (already covered), blocks and print statements. We adopt a pattern where each type is mapped to a function that handles compiling that type. For instance, a declaration is either a variable declaration or a statement so it maps to a method that checks for a variable declaration or statement and calls the appropriate method for the type discovered.
Compiling the print statement
A print
statement is one of the statement types. We began with statements in our previous post when we introduced expression statements. Now, we expand on that concept to support the print statement.
We make the following compiler changes to support the print
statement in our compiler.
A Sox program is a collection of declarations, so we introduce a loop in the compile method to handle these declarations. Each declaration is parsed using the
declaration
method.pub fn compile( &mut self, source: &'static str, chunk: Chunk, i: &Interpreter, ) -> Result<Chunk, ()> { let lexer = Lexer::new(source); self.chunk = Some(chunk); self.tokens = Some(lexer.peekable()); self.advance(); while !self.match_token(vec![TokenType::EOF]) && self.current.is_some() { self.declaration(i); } Ok(self.chunk.take().unwrap()) }
The
declaration
method follows the Sox declaration definition with oneif
arm for each supported declaration type as below.pub fn declaration(&mut self, i: &Interpreter) { if self.match_token(vec![TokenType::Let]) { self.let_declaration(i); } else { self.statement(i); } }
The
statement
method follows the same pattern with oneif
arm for each statement type, one of which is theprint
statement.pub fn statement(&mut self, i: &Interpreter) { if let Some(token) = self.current.as_ref() { if self.match_token(vec![TokenType::Print]) { self.print_statement(i); } else if self.match_token(vec![TokenType::LeftBrace]) { self.begin_scope(i); self.block(i); self.end_scope(i); } else { self.expression(i); } } }
Finally, we define the
print_statement
method, which compiles aprint
statement. First, it calls theexpression
method to compile theexpression
to which theprint
statement will apply, and then it emits theOpPrint
opcode.pub fn print_statement(&mut self, i: &Interpreter) { self.expression(i); self.consume(Semi, "Expect ';' after value."); self.emit_byte((OpCode::OpPrint, None)); }
We add the
OpPrint
opcode to the execution loop as below. This pops the value at the top of the stack and prints its value.OpCode::OpPrint => { let val = pop_stack!(self); let repr_str = val.repr(i); println!("{}", repr_str.unwrap().as_str()); },
Finally, we have to define helper methods for disassembling the print opcode.
The print
statement is now ready for use, so we can move on to compiling variables.
Variable Declarations
Sox variables are either global or local variables. At this point, local variables refer to variables in blocks since we do not yet have functions and classes. We begine with compiling global variables.
Compiling global variables
Our simple implementation of global variables uses an hashmap interpreter attribute to store our global variables. We will modify our compiler to support generating instructions for the following functionality for global variables -
Declaring a new variable with
let
.Accessing new variables using an identifier.
Storing new values in existing variables using assignment expression.
Declaring a new variable
A variable declaration is a type of declaration, so all we need to do here is implement the let_declaration
method from the declaration method above, which handles compiling declarations. This method is the heart of the global variable compilation.
pub fn let_declaration(&mut self, i: &Interpreter) {
let global = self.parse_variable(i, "Expect variable name.");
if self.match_token(vec![TokenType::Equal]) {
self.expression(i);
} else {
self.emit_byte((OpNone, None))
}
self.consume(TokenType::Semi, "Expect ';' after variable declaration.").expect("");
self.define_variable(global.unwrap(), i);
}
The let_declaration
does the following.
Parses the variable identifier and inserts it into the bytecode chunk's constant array via the call to the
parse_variable
method below.pub fn parse_variable(&mut self, i: &Interpreter, message: &str) -> Option<usize> { self.consume(TokenType::Identifier, message).expect(""); let global = self.identifier_constant(self.previous.as_ref().unwrap().lexeme.to_string(), i); return Some(global); }
Compiles the expression a variable refers to if one exists - this expression will come after the
=
sign. Compiling the expression is handled with the call toself.expression
. When there is no expression, then the compiler outputsOpNone
opcode. This supports both ways of declaring a variable - either aslet x;
, wherex
is a variable that has been declared but not initialised or aslet x = 1;
where the variable is declared and initialised with an expression.Outputs bytecode instructions that insert the global variable into our global hashmap keyed by the variable identifier via the call to
define_variable
method below.pub fn define_variable(&mut self, global: usize, i: &Interpreter) { self.emit_byte((OpCode::OpDefineGlobal, Some(global as u8))); }
We then add the OpDefineGlobal
to our execution loop, as shown below. This reads the variable name, a constant, and inserts whatever value is at the top of the stack into the global attribute with the variable name as key. Recall that whatever value is at the top of the stack is the result of the expression that the variable should be set to.
OpCode::OpDefineGlobal => {
let v = read_constant!(self);
let name = v.payload::<SoxString>().unwrap();
let nv = name.value.to_string();
self.globals.insert(nv, peek_stack!(self));
},
Reading a variable
We access the variable via the variable name, an expression, so we must hook up identifier tokens to the expression parser table in order to be able to parse it. We update the entry for the TokenType::Identifier
to use the variable
method below as a prefix rule.
pub fn variable(&mut self, i: &Interpreter) { self.named_variable(self.previous.as_ref().unwrap().lexeme.to_string(), i);
}
The named_variable
compiler method retrieves the index of the variable name in the constants vector and simply emits the opcode for reading the variable, as shown below.
pub fn named_variable(&mut self, name: String, i: &Interpreter) {
arg = Some(self.identifier_constant(name, i) as u8);
self.emit_byte((OpGetGlobal, arg))
}
We then add an entry, shown below, for the OpGetGlobal
opcode to the evaluation loop. At runtime, this instruction obtains the constant value that represents the variable name from the chunk’s constants vector. It uses the instruction operand as the index into the constants vector to fetch the variable name and then looks up the value in the global hash map, panicking if the key does not exist. If the key exists, we push the value onto the stack.
OpCode::OpGetGlobal => {
let v = read_constant!(self);
let name = v.payload::<SoxString>().unwrap();
if let Some(val) = self.globals.get(&name.value) {
push_stack!(self, val.clone());
} else{
panic!("Global variable {} not found", name.value);
}
},
Finally, we implement helpers to disassemble the OpGetGlobal
opcode, and we are done with reading global variables.
Storing a new value in a variable
Variables are updated using set expressions. A set expression is a variable identifier followed by the "=
" sign and an expression. To handle variable assignments right now, all we have to do is check for a =
sign after a variable identifier; if present, we compile as a setter; otherwise, we just compile as a getter.
To compile this, we update the named_variable
compiler method, as shown below, introducing an if
arm that checks for a =
sign and handles it accordingly, compiling the expression and emitting instructions for setting the global variable.
pub fn named_variable(&mut self, name: String, i: &Interpreter) {
if self.match_token(vec![TokenType::Equal]) && self.can_assign {
self.expression(i);
self.emit_byte((set_op, arg))
} else {
self.emit_byte((get_op, arg))
}
}
We have introduced the can_assign
attribute to the compiler to handle precedence when executing assignments. This attribute is set in the parse_precedence
method to the value precedence <= Precedence::Assignment
. The Precedence::Assignment
precedence here restricts compiling an assignment to only parsing expression statements.
pub fn parse_with_precedence(
&mut self,
precedence: Precedence,
i: &Interpreter,
) -> Result<(), ()> {
self.advance();
let previous_token_type = self.previous.as_ref().map(|token| token.token_type);
if let Some(previous_token_type) = previous_token_type {
let prefix_rule = self.get_rule(previous_token_type).prefix_fn;
if let Some(prefix_rule_fn) = prefix_rule {
self.can_assign = precedence <= Precedence::Assignment;
prefix_rule_fn(self, i);
} else {
return Err(());
}
} else {
return Err(());
}
while self.current.is_some()
&& precedence
<= self
.get_rule(self.current.as_ref().unwrap().token_type)
.precedence
{
self.advance();
let infix_rule = self
.get_rule(self.previous.as_ref().unwrap().token_type)
.infix_fn;
if let Some(infix_rule_fn) = infix_rule {
infix_rule_fn(self, i)
}
}
if self.can_assign && self.match_token(vec![TokenType::Equal]) {
panic!("Invalid assignment target.");
}
Ok(())
}
We then add the OpSetGlobal
opcode to our evaluation loop as below.
OpCode::OpSetGlobal => {
let v = read_constant!(self);
let name = v.payload::<SoxString>().unwrap();
if let Some(val) = self.globals.get_mut(&name.value) {
*val = peek_stack!(self);
} else {
panic!("Attempted to set undefined global variable {}", name.value);
}
},
Like the OpGetGlobal
opcode from the previous section, this retrieves the variable name, a constant, and then attempts to set the entry with that name in the global hashmap to the value on the top of the stack. It panics if no global variable with that name exists.
Compiling local variables.
Local variables are variables defined in blocks; recall a block is just a sequence of statements delimited by braces. Our compiler does not yet support blocks, so first, we need to add support for blocks before we look at local variables.
To compile blocks, we modify our compiler to include a field, scope_depth
, for keeping track of scopes that blocks create and implement the begin_scope, block and end_scope methods that are defined in the self.match_token(vec![TokenType::LeftBrace])
arm of the statement method. This arm maps to block definitions.
Before entering a block, we call begin_scope
, which increments the scope_depth
attribute; at the block's exit, we call the end_scope
, which decrements the scope_depth
value. The block
method shown below reads statements till it hits a closing brace and then returns.
pub fn block(&mut self, i: &Interpreter) {
while !self.check(TokenType::RightBrace) && !self.check(TokenType::EOF) {
self.declaration(i);
}
self.consume(TokenType::RightBrace, "Expect '}' after block.");
}
Representing local variables
Our tree walker interpreter implicitly used a chain of Environments to store local variables. However, we adopt a different approach and store local variables directly on our execution stack. This works because statements other than variable declarations don’t leave any value on the stack after execution so any value at the end of a statement any value remaining on the stack can be safely assumed to be a local variable. Consequently, we can store our variables on the stack without interfering with the proper execution of our bytecode. To support this in our compiler, we modify it to introduce the locals attribute, which stores information about locals encountered during compilation. Additionally, we define the local structure that encapsulates data about a local variable. Every time we encounter a local variable, we append a new local structure to our locals vector. At any given time, the values on the stack mirror those in the locals array. Therefore, resolving a variable involves finding the index of the local variable in the locals vector and then using that index to query the execution stack.
We will use the existing infrastructure for global variables to implement local variables with the following modifications.
We modify our compiler to hold a reference to a vector of Locals which represent local variables on the stack.
#[derive(Clone, Debug)] pub struct Local { name: String, depth: Option<usize>, } pub struct Compiler { chunk: Option<Chunk>, previous: Option<Token>, current: Option<Token>, tokens: Option<Peekable<Lexer>>, can_assign: bool, locals: Vec<Local>, scope_depth: usize, }
The
parse_variable
method parses and returns an index into the constants array of the chunk in the case of a global variable. For local variables, this is no longer required, when in a local scope, i.e.scope_depth > 0
, we declare this variable using thedeclare_method
and return aNone
value as below.pub fn parse_variable(&mut self, i: &Interpreter, message: &str) -> Option<usize> { self.consume(TokenType::Identifier, message).expect(""); if self.scope_depth > 0 { self.declare_variable(i); return None; } let global = self.identifier_constant(self.previous.as_ref().unwrap().lexeme.to_string(), i); return Some(global); }
The
declare_variable
method checks that no other local variable with the same name exists in the current scope and then adds the variable as a local by calling the add_local method. Notice how thescope_depth
on the local is set toNone
. This value is a proxy for a variable declared but not yet defined. This distinction helps us prevent an edge case where we try to assign a variable to itself within a scope.pub fn declare_variable(&mut self, i: &Interpreter) { if self.scope_depth == 0 { return; } let name = self.previous.as_ref().unwrap().lexeme.to_string(); for local in self.locals.iter().rev() { if local.depth.is_some() && (local.depth.unwrap() < self.scope_depth) { break; } if local.name == name { panic!("Variable with this name already declared in this scope."); } } self.add_local(name) }
The
add_local
method shown below creates a new Local structure and adds to the locals vector. That is it for declaring a local variable.pub fn add_local(&mut self, name: String) { let local = Local { name, depth: None }; self.locals.push(local); }
When we exit a scope, we have to discard all locals in that scope, so we modify
end_scope
so that when it is called, it walks backwards through thelocals
vector and discards all local variable in that scope.pub fn end_scope(&mut self, interpreter: &Interpreter) { self.scope_depth -= 1; while self.locals.len() > 0 && self.locals.last().unwrap().depth.is_some() && self.locals.last().unwrap().depth.unwrap() > self.scope_depth { self.emit_byte((OpCode::OpPop, None)); } }
The
named_variable
method that is called when we reference a global variable is modified so that it attempts to resolve a variable locally using a newly definedresolve_local
method before proceeding to a global resolution. This is the method that actually emits opcodes for getting or setting local variables.pub fn named_variable(&mut self, name: String, i: &Interpreter) { let get_op; let set_op; let mut arg = self.resolve_local(name.clone(), i); if let Some(arg) = arg { get_op = OpGetLocal; set_op = OpSetLocal; } else { arg = Some(self.identifier_constant(name, i) as u8); get_op = OpGetGlobal; set_op = OpSetGlobal; } if self.match_token(vec![TokenType::Equal]) && self.can_assign { self.expression(i); self.emit_byte((set_op, arg)) } else { self.emit_byte((get_op, arg)) } }
resolve_local
is a simple method that walks backwards through our locals vector, searching for a given local variable. If it finds the local entry, it returns its index, which will be used to index the execution stack for the variable; otherwise, it returnsNone
.pub fn resolve_local(&mut self, name: String, i: &Interpreter) -> Option<u8> { for (i, local) in self.locals.iter().enumerate().rev() { if local.name == name { if local.depth.is_none() { panic!("Can't read local variable in its own initializer."); } return Some(i as u8); } } None }
Finally, we tweak
define_variable
so that when executing this module for a local variable, we invokemark_initialized
to set the scope depth of local, a proxy for a fully initialized variable. Now, our local variable is fully initialized and ready for use.
Next, we add the OpGetLocal
and OpSetLocal
opcodes to our execution loop, as shown below. At runtime the operands to these opcodes give the slox index of the referenced local variable and we can either set the local at the index to the value at the top of the stack if we are writing to the local or push the value to the top of the stack if we are reading.
OpCode::OpSetLocal => {
let slot = read_instr!(self);
let slot = slot as u8;
self.stack[slot as usize] = peek_stack!(self);
},
OpCode::OpGetLocal => {
let slot = read_instr!(self);
let slot = slot as u8;
push_stack!(self, self.stack[slot as usize].clone());
}
With that done, our compiler and bytecode interpreter now support print statements and variables so we are able to execute code like below.
let v = "hello world";
print v;
Code for all these is available on the Sox github repo bytecode vm.