Posted Thu Dec 23, 2021 •  Reading time: 10 minutes

The LISP in Our Go

Developer: Mmmm! Goddamn, gc! This is some serious gourmet shit! Usually, me and Vince would be happy with some LLVM right, but he springs this serious GOURMET shit on us! What compiler backend is this?

Go Compiler : Knock it off, devie.

Developer: [pause] What?

Go Compiler : I don’t need you to tell me how f-ing good my internal is, okay? I’m the one who maintains it. I know how good it is. I code the Go stuff because when I build it I want to optimize it. But you know what’s on my mind right now? It AIN’T the Go in my internals, it’s the Lisp in my backend.

Developer: Oh, gc, don’t even worry about that…

Go Compiler : [interrupting] No, No, No, No, let me ask you a question. When you came pulling in here, did you notice a sign out in front of my house that said “Lisp Interpreter”?

Developer: gc, you know I ain’t seen no…

Go Compiler : [cutting him off again; getting angry] Did you notice a sign out in front of my house that said “Lisp Interpreter”?

Developer: [pause] No. I didn’t.

Go Compiler : You know WHY you didn’t see that sign?

Developer: Why?

Go Compiler : ‘Cause it ain’t there, ‘cause interpreting Lisp ain’t my f-ing business, that’s why!

~ Pulp Fiction, reportedly

Deep inside the Go git repository there is a collection of files having the .rules extension with LISP-like content. They are located in src/cmd/compile/internal/ssa/gen. What is SSA? Why is it in our compiler? Why does it need LISP? Are LISP-ers taking over the Go compiler? Should gophers be worried?

What is SSA?

The initialism SSA stands for Static Single Assignment. SSA is an intermediate representation (IR) of the program that stands between the high-level, human-readable code and the machine code. A bird's-eye view of the Go compiler phases shows the source code transformation in this manner:

(source)-lex/parse->(AST)-convert->(SSA)-optimize/generate->(machine code)

As the name implies, SSA ensures every variable is assigned exactly once by giving each variable a unique name at its definition. This allows the compiler to track how variables are used across the codebase. Once we know the journey of our variables across the codebase, we know how to implement various forms of optimization. The compiler then runs the SSA representation of the program through a series of passes, each optimizing the program further. The passes run in a particular order dictated by the dependencies between their operations.

As an admittedly naive example, consider the following code snippet:

func main() {
    var int_16 int16 = 0
    int_16 = 6
    var float_64_1 float64 = 1
    var float_64_a float64 = float64(int_16)
    var float_64_b = float_64_a + float_64_1
    print(float_64_b)
}

The SSA form would look roughly like this (ignoring types and leaving the print call untouched):

v_0 = 0 # int_16
v_1 = 6 # int_16
v_2 = 1 # float_64_1
v_3 = v_1 # float_64_a
v_4 = v_3 + v_2 # float_64_b
print(v_4)

The compiler can now optimize the code by taking the code through multi-stage operations. N.B. the stages in this particular example are simplified and fictitious for the sake of the post.

Stage 1

It is now easier to see that the value of v_0 is never used: the constant value of v_1 is used directly as the value of v_3, and v_2 and v_3 are used directly for v_4. The compiler now knows the operation assigning v_0 is useless, because the original variable was overwritten before ever being read. It transforms the code to:

v_1 = 6 # int_16
v_2 = 1 # float_64_1
v_3 = v_1 # float_64_a
v_4 = v_3 + v_2 # float_64_b
print(v_4)

Stage 2

The compiler then sees that the value of v_3 comes directly from v_1, which is a constant, so it collapses the two-step assignment into a single step:

v_2 = 1 # float_64_1
v_3 = 6 # float_64_a
v_4 = v_3 + v_2 # float_64_b
print(v_4)

Stage 3

At this stage, the compiler sees that v_4 is built from constant values propagated through the variables v_2 and v_3. It therefore shortcuts the operations to:

v_4 = 6 + 1 # float_64_b

Stage 4

Finally, the compiler sees that the variable v_4 is assigned the sum of two constants. Compilers know how to add numbers, so it does, reducing the code further to:

v_4 = 7 # float_64_b

Note how five lines of operations were reduced to a single operation by applying a few rules. Remember the .rules files? This is where they come in handy.

Optimization Rules

We can distill these transformations into a set of abstract rules to be applied mechanically, and still arrive at performant, optimized machine code. The optimizations act on the operations, not on the variables; the variable names are placeholders for humans, so we can focus on the operations as functions taking values and on how they interact with each other. Thus the first SSA form of the code snippet from above

v_0 = 0 # int_16
v_1 = 6 # int_16
v_2 = 1 # float_64_1
v_3 = v_1 # float_64_a
v_4 = v_3 + v_2 # float_64_b
v_5 = print(v_4)

is reformulated as (leaving the print function as-is for simplicity's sake):

v_0 = Constant(0) # int_16
v_1 = Constant(6) # int_16
v_2 = Constant(1) # float_64_1
# for simplicity's sake, the value of v_1 is considered constant from v_3's perspective
v_3 = Constant(v_1) # float_64_a
v_4 = Addition(v_3, v_2) # float_64_b
v_5 = print(v_4)

The SSA form is still code, and the operations are akin to function calls. The goal of the compiler is to manipulate the code, reshaping it into a more optimized form. The compiler thus needs to recognize particular patterns in the code at hand and subsequently manipulate it accordingly. In other words, the compiler needs to treat the code as data and act on that data to reduce it to a more succinct form.

The notion of code and data being the same thing is known as homoiconicity. LISP doubles down on the code-is-data notion by using s-expressions: an s-expression is either an atomic value or a parenthesized list of s-expressions. Function calls take the same form, with the function name as the first item inside the parentheses. Rearranging our function calls into s-expressions turns our code snippet into this form:

v_0 = (Constant 0) # int_16
v_1 = (Constant 6) # int_16
v_2 = (Constant 1) # float_64_1
v_3 = (Constant v_1) # float_64_a; store v_1 in v_3
v_4 = (Addition v_3 v_2) # float_64_b
v_5 = (print v_4)

Let’s ponder these lines. Wherever a variable is used, we can fold the expression assigned to it into its place of use. Taking a single folding step, folding v_1 into the expression for v_3, delivers this form:

v_0 = (Constant 0) # int_16
v_2 = (Constant 1) # float_64_1
v_3 = (Constant (Constant 6)) # folding v_1 into the v_3 expression
v_4 = (Addition v_3 v_2) # float_64_b
v_5 = (print v_4)

We can now derive the first optimization rule based on a pattern of the code. The rule says:

If a Constant operation has another Constant operation as its argument, then the argument of the inner Constant operation is used as the argument of the outer Constant operation, and the inner Constant may be disposed of.

We may illustrate the transformation of the code as such:

(Constant (Constant [x])) -> (Constant x)
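
What would it take to teach a compiler this rule? Here is a minimal Go sketch of recognizing and applying this one pattern. It uses a made-up toy Expr type, not anything from the real Go compiler:

package main

import "fmt"

// Expr is a toy s-expression node: an operation name, an optional constant
// payload, and sub-expressions. Everything here is illustrative only.
type Expr struct {
	Op   string
	Num  float64
	Args []*Expr
}

// foldConstOfConst applies (Constant (Constant [x])) -> (Constant x) to a
// single node and reports whether the pattern matched.
func foldConstOfConst(e *Expr) bool {
	if e.Op != "Constant" || len(e.Args) != 1 {
		return false
	}
	inner := e.Args[0]
	if inner.Op != "Constant" || len(inner.Args) != 0 {
		return false
	}
	// hoist the inner constant's value into the outer node
	e.Num, e.Args = inner.Num, nil
	return true
}

func main() {
	// v_3 = (Constant (Constant 6))
	v3 := &Expr{Op: "Constant", Args: []*Expr{{Op: "Constant", Num: 6}}}
	fmt.Println(foldConstOfConst(v3), v3.Op, v3.Num) // true Constant 6
}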

Once the compiler knows this rule, it can look for the pattern in our code and apply it judiciously. Applying the rule to our code results in:

v_0 = (Constant 0) # int_16
v_2 = (Constant 1) # float_64_1
v_3 = (Constant 6) # the nested Constant collapsed by the rule
v_4 = (Addition v_3 v_2) # float_64_b
v_5 = (print v_4)

Applying the earlier variable folding to the current version of the code reduces our program to just three lines:

v_0 = (Constant 0)
v_4 = (Addition (Constant 6) (Constant 1))
v_5 = (print v_4)

We can now derive a new optimization rule that says

If an Addition operation takes a Constant operation as either of its arguments, then the argument of that Constant operation may be substituted directly in its place as the argument of Addition.

In other words:

(Addition (Constant [x]) (Constant [y])) -> (Addition [x] [y])

Applying this rule to our code simplifies it to

v_0 = (Constant 0)
v_4 = (Addition 6 1)
v_5 = (print v_4)

We can derive one more optimization rule by observing that the arguments of Addition are both constant values, so the whole Addition may be replaced by a Constant operation whose argument is the sum of the Addition's arguments. The transformation in s-expression format looks like this

(Addition [x] [y]) -> (Constant [x+y])

Of course, this requires the rule engine itself to be able to add x and y and plug the resulting value back into the SSA form of the code. Once our smart compiler applies the rule, our code is now

v_0 = (Constant 0)
v_4 = (Constant 7)
v_5 = (print v_4)

Our smart compiler will also have a dead-code elimination pass, in which it will see that v_0 is not used anywhere and can be removed, turning our code into this form

v_4 = (Constant 7)
v_5 = (print v_4)

I hope it is clear by now how using SSA, treating code as data, and manipulating that data based on known patterns enables more efficient programs; in our example it reduced a six-line program to two lines. Our inventory of optimization rules now consists of the following three rules:

(Constant (Constant [x])) -> (Constant x)
(Addition (Constant [x]) (Constant [y])) -> (Addition [x] [y])
(Addition [x] [y]) -> (Constant [x+y])
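
To see the whole inventory working together, here is a minimal sketch of a rewriter that walks an expression bottom-up and keeps applying the rules until nothing changes. It reuses the same hypothetical toy Expr shape as the earlier sketch, and, because a bare value in this toy representation is always a Constant leaf, the second and third rules collapse into a single step:

package main

import "fmt"

// Expr is the same toy s-expression node as before: an operation name, an
// optional constant payload, and sub-expressions.
type Expr struct {
	Op   string
	Num  float64
	Args []*Expr
}

// rewrite applies the optimization rules to e bottom-up and reports whether
// anything changed.
func rewrite(e *Expr) bool {
	changed := false
	for _, a := range e.Args {
		if rewrite(a) {
			changed = true
		}
	}
	// (Constant (Constant [x])) -> (Constant x)
	if e.Op == "Constant" && len(e.Args) == 1 &&
		e.Args[0].Op == "Constant" && len(e.Args[0].Args) == 0 {
		e.Num, e.Args = e.Args[0].Num, nil
		return true
	}
	// (Addition (Constant [x]) (Constant [y])) -> (Constant [x+y])
	// (rules two and three applied in one step)
	if e.Op == "Addition" && len(e.Args) == 2 &&
		e.Args[0].Op == "Constant" && e.Args[1].Op == "Constant" {
		e.Op, e.Num, e.Args = "Constant", e.Args[0].Num+e.Args[1].Num, nil
		return true
	}
	return changed
}

func main() {
	// v_4 = (Addition (Constant (Constant 6)) (Constant 1))
	v4 := &Expr{Op: "Addition", Args: []*Expr{
		{Op: "Constant", Args: []*Expr{{Op: "Constant", Num: 6}}},
		{Op: "Constant", Num: 1},
	}}
	for rewrite(v4) {
		// keep rewriting until a pass makes no further change
	}
	fmt.Println(v4.Op, v4.Num) // Constant 7
}

The real compiler operates on a much richer value graph and has many more rules, but the underlying idea of repeatedly matching patterns and rewriting them is the same.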

The .rules Files

The rules we derived in the earlier section were generic and applicable to any codebase. The Go source tree has a collection of files with the .rules extension that contain rules similar in essence but in a more complex format. The format used for the Go compiler accommodates more powerful pattern recognition and code manipulation. The syntax is defined in $GOROOT/src/cmd/compile/internal/ssa/gen/rulegen.go. The parser extends the syntax by allowing Go expressions to be embedded in the s-expressions, provided the Go expression evaluates to a boolean.

You might recognize that the file names are patterned after CPU architectures, except for generic.rules, dec.rules, dec64.rules, and decArgs.rules. The latter three contain rules that handle the decomposition of Go's built-in types and of compound argument values. The generic.rules file contains, well, generic optimization rules that apply regardless of the CPU architecture. The rule we derived earlier, which converts the addition of constant values into a Constant operation holding their sum, is present in generic.rules in a more elaborate form:

(Add8   (Const8 [c])   (Const8 [d]))   => (Const8  [c+d])
(Add16  (Const16 [c])  (Const16 [d]))  => (Const16 [c+d])
(Add32  (Const32 [c])  (Const32 [d]))  => (Const32 [c+d])
(Add64  (Const64 [c])  (Const64 [d]))  => (Const64 [c+d])
(Add32F (Const32F [c]) (Const32F [d])) && c+d == c+d => (Const32F [c+d])
(Add64F (Const64F [c]) (Const64F [d])) && c+d == c+d => (Const64F [c+d])
(AddPtr <t> x (Const64 [c])) => (OffPtr <t> x [c])
(AddPtr <t> x (Const32 [c])) => (OffPtr <t> x [int64(c)])

Note the difference in the names of the operations. Our example used dummy operation names, while the operation names in these files correspond to the compiler's generic operations, which are later mapped to CPU instructions. The per-architecture .rules files contain rules relying on specific features of the target architecture. For instance, some CPUs have a single fused multiply-add instruction rather than carrying out the two operations in separate steps; such an optimization rule is included in the respective {arch}.rules file.

Optimization in Action

The go command has a facility to export an HTML file showing the optimization rules in action and how they transform the code in its SSA form through the various compiler passes. Run GOSSAFUNC=<func name> go build to have an ssa.html file generated in the current working directory. Dave Cheney discusses this particular feature in his blog post titled How to dump the GOSSAFUNC graph for a method. Our example code was in fact extracted from my local experiment file:

package main

func main() {
	var int_16 int16 = 0
	int_16 = 6
	var float_64_1 float64 = 1
	var float_64_a float64 = float64(int_16)
	var float_64_b = float_64_a + float_64_1
	print(float_64_b)
}

You can find the generated HTML file by clicking here or in the Go SSA Playground.

So Does the Go Compiler Have a LISP Interpreter?

No, rest assured, it does not. The .rules files are accompanied by .go files that contain metadata about the operations, and this conglomerate of files is used to generate .go files in the parent directory named rewrite{arch}.go (again, with generic, dec, dec64, and decArgs getting files despite not being CPU architectures). The generated files contain the code manipulations defined by the s-expressions listed in the .rules files.

Therefore, the Go source tree contains a parser for a LISP-like (s-expression) DSL used to generate code-rewrite instructions for the compiler, but that parser is not embedded in the Go compiler itself. The s-expression files and the parser are used in an intermediate step external to the compiler binary.
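
To give a rough feel for the flavor of that generated code, here is a simplified, hand-written sketch of what a rewrite function for the Add64 rule above could look like. The Value type and everything else here are stand-ins invented for illustration; the real functions in rewrite{arch}.go are machine-generated and considerably more involved:

package main

import "fmt"

// Value is a stand-in for an SSA value: an operation, an integer payload,
// and its arguments. It is not the compiler's actual type.
type Value struct {
	Op     string
	AuxInt int64
	Args   []*Value
}

// rewriteAdd64 sketches the kind of function a rule generator could emit for
// (Add64 (Const64 [c]) (Const64 [d])) => (Const64 [c+d]). It reports whether
// the rewrite fired.
func rewriteAdd64(v *Value) bool {
	// match: (Add64 (Const64 [c]) (Const64 [d]))
	if v.Op != "Add64" || len(v.Args) != 2 {
		return false
	}
	x, y := v.Args[0], v.Args[1]
	if x.Op != "Const64" || y.Op != "Const64" {
		return false
	}
	// result: (Const64 [c+d])
	v.Op = "Const64"
	v.AuxInt = x.AuxInt + y.AuxInt
	v.Args = nil
	return true
}

func main() {
	v := &Value{Op: "Add64", Args: []*Value{
		{Op: "Const64", AuxInt: 6},
		{Op: "Const64", AuxInt: 1},
	}}
	fmt.Println(rewriteAdd64(v), v.Op, v.AuxInt) // true Const64 7
}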

Learn More