A Register-Machine Simulator
In order to gain a good understanding of the design of register machines, we must test the machines we design to see if they perform as expected. One way to test a design is to hand-simulate the operation of the controller, as in Exercise 5-5. But this is extremely tedious for all but the simplest machines. In this section we construct a simulator for machines described in the register-machine language. The simulator is a Scheme program with four interface procedures. The first uses a description of a register machine to construct a model of the machine (a data structure whose parts correspond to the parts of the machine to be simulated), and the other three allow us to simulate the machine by manipulating the model:
constructs and returns a model of the machine with the given registers, operations, and controller.
stores a value in a simulated register in the given machine.
returns the contents of a simulated register in the given machine.
simulates the execution of the given machine, starting from the beginning of the controller sequence and stopping when it reaches the end of the sequence.
As an example of how these procedures are used, we can define gcd-machine to be a model of the GCD machine of section 5.1.1 as follows:
The first argument to make-machine is a list of register names. The next argument is a table (a list of two-element lists) that pairs each operation name with a Scheme procedure that implements the operation (that is, produces the same output value given the same input values). The last argument specifies the controller as a list of labels and machine instructions, as in section 5.1.
To compute GCDs with this machine, we set the input registers, start the machine, and examine the result when the simulation terminates:
This computation will run much more slowly than a gcd procedure written in Scheme, because we will simulate low-level machine instructions, such as assign, by much more complex operations.
Exercise 5.7: Use the simulator to test the machines you designed in Exercise 5-4.
5.2.1 The Machine Model
The machine model generated by make-machine is represented as a procedure with local state using the message-passing techniques developed in Chapter 3. To build this model, make-machine begins by calling the procedure make-new-machine to construct the parts of the machine model that are common to all register machines. This basic machine model constructed by make-new-machine is essentially a container for some registers and a stack, together with an execution mechanism that processes the controller instructions one by one.
Make-machine then extends this basic model (by sending it messages) to include the registers, operations, and controller of the particular machine being defined. First it allocates a register in the new machine for each of the supplied register names and installs the designated operations in the machine. Then it uses an assembler (described below in section 5.2.2) to transform the controller list into instructions for the new machine and installs these as the machine's instruction sequence. Make-machine returns as its value the modified machine model.
Registers
We will represent a register as a procedure with local state, as in Chapter 3. The procedure make-register creates a register that holds a value that can be accessed or changed:
The following procedures are used to access registers:
The stack
We can also represent a stack as a procedure with local state. The procedure make-stack creates a stack whose local state consists of a list of the items on the stack. A stack accepts requests to push an item onto the stack, to pop the top item off the stack and return it, and to initialize the stack to empty.
The following procedures are used to access stacks:
The basic machine
The make-new-machine procedure, shown in Figure 5-13, constructs an object whose local state consists of a stack, an initially empty instruction sequence, a list of operations that initially contains an operation to initialize the stack, and a register table that initially contains two registers, named flag and pc (for ``program counter''). The internal procedure allocate-register adds new entries to the register table, and the internal procedure lookup-register looks up registers in the table.
The flag register is used to control branching in the simulated machine. Test instructions set the contents of flag to the result of the test (true or false). Branch instructions decide whether or not to branch by examining the contents of flag.
The pc register determines the sequencing of instructions as the machine runs. This sequencing is implemented by the internal procedure execute. In the simulation model, each machine instruction is a data structure that includes a procedure of no arguments, called the instruction execution procedure , such that calling this procedure simulates executing the instruction. As the simulation runs, pc points to the place in the instruction sequence beginning with the next instruction to be executed. Execute gets that instruction, executes it by calling the instruction execution procedure, and repeats this cycle until there are no more instructions to execute (i.e., until pc points to the end of the instruction sequence).
Figure 5.13: The make-new-machine procedure, which implements the basic machine model. @smalllisp (define (make-new-machine) (let ((pc (make-register 'pc)) (flag (make-register 'flag)) (stack (make-stack)) (the-instruction-sequence '())) (let ((the-ops (list (list 'initialize-stack (lambda () (stack 'initialize))))) (register-table (list (list 'pc pc) (list 'flag flag)))) (define (allocate-register name) (if (assoc name register-table) (error "Multiply defined register: " name) (set! register-table (cons (list name (make-register name)) register-table))) 'register-allocated) (define (lookup-register name) (let ((val (assoc name register-table))) (if val (cadr val) (error "Unknown register:" name)))) (define (execute) (let ((insts (get-contents pc))) (if (null? insts) 'done (begin ((instruction-execution-proc (car insts))) (execute))))) (define (dispatch message) (cond ((eq? message 'start) (set-contents! pc the-instruction-sequence) (execute)) ((eq? message 'install-instruction-sequence) (lambda (seq) (set! the-instruction-sequence seq))) ((eq? message 'allocate-register) allocate-register) ((eq? message 'get-register) lookup-register) ((eq? message 'install-operations) (lambda (ops) (set! the-ops (append the-ops ops)))) ((eq? message 'stack) stack) ((eq? message 'operations) the-ops) (else (error "Unknown request -- MACHINE" message)))) dispatch))) @end smalllisp
As part of its operation, each instruction execution procedure modifies pc to indicate the next instruction to be executed. Branch and goto instructions change pc to point to the new destination. All other instructions simply advance pc, making it point to the next instruction in the sequence. Observe that each call to execute calls execute again, but this does not produce an infinite loop because running the instruction execution procedure changes the contents of pc.
Make-new-machine returns a dispatch procedure that implements message-passing access to the internal state. Notice that starting the machine is accomplished by setting pc to the beginning of the instruction sequence and calling execute.
For convenience, we provide an alternate procedural interface to a machine's start operation, as well as procedures to set and examine register contents, as specified at the beginning of section 5.2:
These procedures (and many procedures in sections 5-2-2 and 5-2-3) use the following to look up the register with a given name in a given machine:
5.2.2 The Assembler
The assembler transforms the sequence of controller expressions for a machine into a corresponding list of machine instructions, each with its execution procedure. Overall, the assembler is much like the evaluators we studied in Chapter 4---there is an input language (in this case, the register-machine language) and we must perform an appropriate action for each type of expression in the language.
The technique of producing an execution procedure for each instruction is just what we used in section 4.1.7 to speed up the evaluator by separating analysis from runtime execution. As we saw in Chapter 4, much useful analysis of Scheme expressions could be performed without knowing the actual values of variables. Here, analogously, much useful analysis of register-machine-language expressions can be performed without knowing the actual contents of machine registers. For example, we can replace references to registers by pointers to the register objects, and we can replace references to labels by pointers to the place in the instruction sequence that the label designates.
Before it can generate the instruction execution procedures, the assembler must know what all the labels refer to, so it begins by scanning the controller text to separate the labels from the instructions. As it scans the text, it constructs both a list of instructions and a table that associates each label with a pointer into that list. Then the assembler augments the instruction list by inserting the execution procedure for each instruction.
The assemble procedure is the main entry to the assembler. It takes the controller text and the machine model as arguments and returns the instruction sequence to be stored in the model. Assemble calls extract-labels to build the initial instruction list and label table from the supplied controller text. The second argument to extract-labels is a procedure to be called to process these results: This procedure uses update-insts! to generate the instruction execution procedures and insert them into the instruction list, and returns the modified list.
Extract-labels takes as arguments a list text (the sequence of controller instruction expressions) and a receive procedure. Receive will be called with two values: (1) a list insts of instruction data structures, each containing an instruction from text; and (2) a table called labels, which associates each label from text with the position in the list insts that the label designates.
Extract-labels works by sequentially scanning the elements of the text and accumulating the insts and the labels. If an element is a symbol (and thus a label) an appropriate entry is added to the labels table. Otherwise the element is accumulated onto the insts list.@footnote{Using the receive procedure here is a way to get extract-labels to effectively return two values---labels and insts---without explicitly making a compound data structure to hold them. An alternative implementation, which returns an explicit pair of values, is
which would be called by assemble as follows:
You can consider our use of receive as demonstrating an elegant way to return multiple values, or simply an excuse to show off a programming trick. An argument like receive that is the next procedure to be invoked is called a ``continuation.'' Recall that we also used continuations to implement the backtracking control structure in the amb evaluator in section 4.3.3.}
Update-insts! modifies the instruction list, which initially contains only the text of the instructions, to include the corresponding execution procedures:
The machine instruction data structure simply pairs the instruction text with the corresponding execution procedure. The execution procedure is not yet available when extract-labels constructs the instruction, and is inserted later by update-insts!.
The instruction text is not used by our simulator, but it is handy to keep around for debugging (see Exercise 5-16).
Elements of the label table are pairs:
Entries will be looked up in the table with
Exercise 5.8: The following register-machine code is ambiguous, because the label here is defined more than once:
With the simulator as written, what will the contents of register a be when control reaches there? Modify the extract-labels procedure so that the assembler will signal an error if the same label name is used to indicate two different locations.
5.2.3 Generating Execution Procedures for Instructions
The assembler calls make-execution-procedure to generate the execution procedure for an instruction. Like the analyze procedure in the evaluator of section 4.1.7, this dispatches on the type of instruction to generate the appropriate execution procedure.
For each type of instruction in the register-machine language, there is a generator that builds an appropriate execution procedure. The details of these procedures determine both the syntax and meaning of the individual instructions in the register-machine language. We use data abstraction to isolate the detailed syntax of register-machine expressions from the general execution mechanism, as we did for evaluators in section 4.1.2, by using syntax procedures to extract and classify the parts of an instruction.
Assign instructions
The make-assign procedure handles assign instructions:
Make-assign extracts the target register name (the second element of the instruction) and the value expression (the rest of the list that forms the instruction) from the assign instruction using the selectors
The register name is looked up with get-register to produce the target register object. The value expression is passed to make-operation-exp if the value is the result of an operation, and to make-primitive-exp otherwise. These procedures (shown below) parse the value expression and produce an execution procedure for the value. This is a procedure of no arguments, called value-proc, which will be evaluated during the simulation to produce the actual value to be assigned to the register. Notice that the work of looking up the register name and parsing the value expression is performed just once, at assembly time, not every time the instruction is simulated. This saving of work is the reason we use execution procedures, and corresponds directly to the saving in work we obtained by separating program analysis from execution in the evaluator of section 4.1.7.
The result returned by make-assign is the execution procedure for the assign instruction. When this procedure is called (by the machine model's execute procedure), it sets the contents of the target register to the result obtained by executing value-proc. Then it advances the pc to the next instruction by running the procedure
Advance-pc is the normal termination for all instructions except branch and goto.
Test, branch, and goto instructions
Make-test handles test instructions in a similar way. It extracts the expression that specifies the condition to be tested and generates an execution procedure for it. At simulation time, the procedure for the condition is called, the result is assigned to the flag register, and the pc is advanced:
The execution procedure for a branch instruction checks the contents of the flag register and either sets the contents of the pc to the branch destination (if the branch is taken) or else just advances the pc (if the branch is not taken). Notice that the indicated destination in a branch instruction must be a label, and the make-branch procedure enforces this. Notice also that the label is looked up at assembly time, not each time the branch instruction is simulated.
A goto instruction is similar to a branch, except that the destination may be specified either as a label or as a register, and there is no condition to check---the pc is always set to the new destination.
Other instructions
The stack instructions save and restore simply use the stack with the designated register and advance the pc:
The final instruction type, handled by make-perform, generates an execution procedure for the action to be performed. At simulation time, the action procedure is executed and the pc advanced.
Execution procedures for subexpressions
The value of a reg, label, or const expression may be needed for assignment to a register (make-assign) or for input to an operation (make-operation-exp, below). The following procedure generates execution procedures to produce values for these expressions during the simulation:
The syntax of reg, label, and const expressions is determined by
Assign, perform, and test instructions may include the application of a machine operation (specified by an op expression) to some operands (specified by reg and const expressions). The following procedure produces an execution procedure for an ``operation expression''---a list containing the operation and operand expressions from the instruction:
The syntax of operation expressions is determined by
Observe that the treatment of operation expressions is very much like the treatment of procedure applications by the analyze-application procedure in the evaluator of section 4.1.7 in that we generate an execution procedure for each operand. At simulation time, we call the operand procedures and apply the Scheme procedure that simulates the operation to the resulting values. The simulation procedure is found by looking up the operation name in the operation table for the machine:
Exercise 5.9: The treatment of machine operations above permits them to operate on labels as well as on constants and the contents of registers. Modify the expression-processing procedures to enforce the condition that operations can be used only with registers and constants.
Exercise 5.10: Design a new syntax for register-machine instructions and modify the simulator to use your new syntax. Can you implement your new syntax without changing any part of the simulator except the syntax procedures in this section?
Exercise 5.11: When we introduced save and restore in section 5.1.4, we didn't specify what would happen if you tried to restore a register that was not the last one saved, as in the sequence
There are several reasonable possibilities for the meaning of restore:
- (restore y) puts into y the last value saved on the stack, regardless of what register that value came from. This is the way our simulator behaves. Show how to take advantage of this behavior to eliminate one instruction from the Fibonacci machine of section 5.1.4 (Figure 5-12).
- (restore y) puts into y the last value saved on the stack, but only if that value was saved from y; otherwise, it signals an error. Modify the simulator to behave this way. You will have to change save to put the register name on the stack along with the value.
- (restore y) puts into y the last value saved from y regardless of what other registers were saved after y and not restored. Modify the simulator to behave this way. You will have to associate a separate stack with each register. You should make the initialize-stack operation initialize all the register stacks.
Exercise 5.12: The simulator can be used to help determine the data paths required for implementing a machine with a given controller. Extend the assembler to store the following information in the machine model:
- a list of all instructions, with duplicates removed, sorted by instruction type (assign, goto, and so on);
- a list (without duplicates) of the registers used to hold entry points (these are the registers referenced by goto instructions);
- a list (without duplicates) of the registers that are saved or restored;
- for each register, a list (without duplicates) of the sources from which it is assigned (for example, the sources for register val in the factorial machine of Figure 5-11 are (const 1) and ((op *) (reg n) (reg val))).
Extend the message-passing interface to the machine to provide access to this new information. To test your analyzer, define the Fibonacci machine from Figure 5-12 and examine the lists you constructed.
Exercise 5.13: Modify the simulator so that it uses the controller sequence to determine what registers the machine has rather than requiring a list of registers as an argument to make-machine. Instead of pre-allocating the registers in make-machine, you can allocate them one at a time when they are first seen during assembly of the instructions.
5.2.4 Monitoring Machine Performance
Simulation is useful not only for verifying the correctness of a proposed machine design but also for measuring the machine's performance. For example, we can install in our simulation program a ``meter'' that measures the number of stack operations used in a computation. To do this, we modify our simulated stack to keep track of the number of times registers are saved on the stack and the maximum depth reached by the stack, and add a message to the stack's interface that prints the statistics, as shown below. We also add an operation to the basic machine model to print the stack statistics, by initializing the-ops in make-new-machine to
Here is the new version of make-stack:
Exercise 5-15 through Exercise 5-19 describe other useful monitoring and debugging features that can be added to the register-machine simulator.
Exercise 5.14: Measure the number of pushes and the maximum stack depth required to compute n! for various small values of n using the factorial machine shown in Figure 5-11. From your data determine formulas in terms of n for the total number of push operations and the maximum stack depth used in computing n! for any n > 1. Note that each of these is a linear function of n and is thus determined by two constants. In order to get the statistics printed, you will have to augment the factorial machine with instructions to initialize the stack and print the statistics. You may want to also modify the machine so that it repeatedly reads a value for n, computes the factorial, and prints the result (as we did for the GCD machine in Figure 5-4), so that you will not have to repeatedly invoke get-register-contents, set-register-contents!, and start.
Exercise 5.15: Add instruction counting to the register machine simulation. That is, have the machine model keep track of the number of instructions executed. Extend the machine model's interface to accept a new message that prints the value of the instruction count and resets the count to zero.
Exercise 5.16: Augment the simulator to provide for instruction tracing . That is, before each instruction is executed, the simulator should print the text of the instruction. Make the machine model accept trace-on and trace-off messages to turn tracing on and off.
Exercise 5.17: Extend the instruction tracing of Exercise 5-16 so that before printing an instruction, the simulator prints any labels that immediately precede that instruction in the controller sequence. Be careful to do this in a way that does not interfere with instruction counting (Exercise 5-15). You will have to make the simulator retain the necessary label information.
Exercise 5.18: Modify the make-register procedure of section 5.2.1 so that registers can be traced. Registers should accept messages that turn tracing on and off. When a register is traced, assigning a value to the register should print the name of the register, the old contents of the register, and the new contents being assigned. Extend the interface to the machine model to permit you to turn tracing on and off for designated machine registers.
Exercise 5.19: Alyssa P. Hacker wants a breakpoint feature in the simulator to help her debug her machine designs. You have been hired to install this feature for her. She wants to be able to specify a place in the controller sequence where the simulator will stop and allow her to examine the state of the machine. You are to implement a procedure
that sets a breakpoint just before the nth instruction after the given label. For example,
installs a breakpoint in gcd-machine just before the assignment to register a. When the simulator reaches the breakpoint it should print the label and the offset of the breakpoint and stop executing instructions. Alyssa can then use get-register-contents and set-register-contents! to manipulate the state of the simulated machine. She should then be able to continue execution by saying
She should also be able to remove a specific breakpoint by means of
or to remove all breakpoints by means of
Notes
Based on Structure and Interpretation of Computer Programs, a work at https://mitpress.mit.edu/sicp/.