Sunday, January 11, 2026

Nand2Tetris: The Virtual Machine Translator

Over the winter break, I decided to resume my work with Nand2Tetris. It's a fantastic book and set of projects by Noam Nisan and Shimon Schoken that covers the full stack of personal computing. Readers start by designing logic gates (including NAND gates) and end with programming a video game (like Tetris). Topics include digital circuits, micro-architecture, machine language, assembly, virtual machines, and compilers. The book is well written with many clear illustrations that are very helpful for explaining details like the processor design and stack machine. The projects are designed so that they each have a limited scope and can each be easily completed within a few days. They also build on top of each other seamlessly. It's very satisfying to see exactly how one layer of the computing stack contributes to the next layer. The software tools provided for testing and debugging work reasonably well. I only noticed a minor bug in the CPU emulator (shown below) when animations are enabled. I would encourage anyone with only basic programming knowledge and an interest in computer architecture to try it out.

Screenshot of the Nand2Tetris CPU emulator. It includes listings of instruction and data memory and the test script.

  Personally, I've just finished reading about and implementing the virtual machine (VM) translator (chapter 8). This is the back-end part of a high-level compiler that translates VM programs into assembly. It's responsible for managing the different memory segments (local, static, pointers, and arguments) and moving data between them and the global stack. Function calls save state to the stack. State is restored when the function returns. Basic logical, arithmetic, and branch operations are also supported. Because of my experience with Chisel, I decided to write the translator in Scala, which is the underlying language of Chisel. Scala is meant for functional code, so most of the variables are immutable and recursion is heavily used. All functions are tail-recursive so that they can be optimized by the compiler to not use too much stack space. The assembly code emitted by the translator is annotated with plenty of comments to help with debugging. Comments include original VM commands, PC values, and notes for function call and return steps. The code for the translator is here.

  One difficulty I had with the translator was the bootstrap code. This code is inserted at the very beginning of the output assembly (at address 0, the reset vector) to initialize the global stack pointer and jump to the Sys.init() function. Initially, my code used an unconditional branch instruction (JMP) to start executing Sys.init(). However, several of the translator tests would fail because the stack pointer value was lower than expected, and the final return values were therefore at the wrong addresses. If I changed the bootstrap program to instead call Sys.init() like a normal function, then the tests would pass. This is because a function call saves some state to the stack before executing the function. This additional state increases the stack pointer value. This seems unnecessary to me, because it not expected that Sys.init() will ever return. Essentially, there is wasted space at the bottom of the stack. A simple jump would be more efficient (see below).

  I also sometimes wonder if more industry standards can be leveraged instead of the bespoke formats and languages used only for Nand2Tetris. Gate-level Verilog could be used to describe the hardware, the processor architecture could be a basic integer RISC-V implementation, and the high-level language could be a subset of C, C++, Java, or Scala. Using popular standards would increase the value of knowledge gained from reading the book and working on the projects. These skills would be more easily applied to an actual job in the field. A wider range of software and hardware tools could also be used for design, simulation, and implementation of the projects. The stated reason why standards aren't used is that the design would be much more complicated. However, I'm inclined to investigate further to see if the significant potential benefits of standards can be leveraged without making the material too difficult for beginners.

No comments:

Post a Comment