Compiler optimizations

Table of contents

  1. Introduction
  2. Intermediate representation
    1. Linear IRs
      1. Three-address code
      2. Stack-machine code
    2. Graphical IRs
      1. Control-flow graphs
      2. Data-dependence graphs
    3. Static single-assignment form
  3. Basic block
  4. Optimizations
    1. Local optimizations
      1. Common subexpression elimination
      2. Copy propagation
      3. Constant propagation
      4. Dead code elimination
      5. Constant folding
    2. Peephole optimization
    3. Global dataflow analysis
      1. Reaching definition analysis
      2. Liveness analysis
  5. References

Introduction

Optimizing compilers perform optimizations to improve a program’s resource utilization. Generally the resource being optimized for is CPU time, but specialist compilers exist that optimize for other resources (e.g. code size, memory usage, disk accesses, etc.).

Optimization involves many subproblems that are computationally intractable. Therefore, heuristics are often used during optimization.

Typically, optimizations are run on an IR (Intermediate Representation) before the IR is passed to the code generator [1, P. 505].

Intermediate representation

IR (Intermediate Representation) is a language between the source code and the target language. It provides a layer of abstraction that:

  1. Contains more details than the source
  2. Contains fewer details than the target

An IR is designed to make processing of a program easier (e.g. optimization and translation). Some compilers translate through a series of intermediate languages during the compilation pipeline.

Linear IRs

Linear IRs consist of sequentially-executed instructions. Linear IRs often resemble assembly code for an abstract machine.

Example LLVM IR:

define i32 @main() #0 {
  %1 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  ret i32 3
}

Three-address code

3AC (Three-Address Code) is a linear IR. In 3AC there is at most one operator on the right side of an instruction.

A source-language expression like x - y * z might be translated into a sequence of 3AC instructions:

t1 = y * z
t2 = x - t1

[1, P. 363]

3AC includes addresses and instructions.

A 3AC address can be one of:

  • A name (source-code names)
  • A constant
  • A compiler-generated temporary

Operands are either constants or registers.

Some common 3AC instruction forms:

  • Assignment (binary: x = y op z, unary: x = op y)
  • Copy instructions (x = y)
  • Unconditional jump (goto L, where L labels a 3AC instruction)
  • Conditional jump (if x goto L)
  • Conditional jump with relational operator (if x relop y goto L, relational operators: <, ==, etc.)
  • Indexed copy instructions (x = y[i], y[i] = x)
  • Address and pointer assignments (x = &y, x = *y, *x = y)
  • Procedure calls (implemented using param x for each parameter, followed by call p, n)

[1, Pp. 364-5]

3AC instructions can be represented as quadruples <op, arg_1, arg_2, result> [1, P. 366].
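
A minimal sketch (in Python, with illustrative names) of 3AC instructions stored as quadruples, using the earlier x - y * z example:

from typing import NamedTuple, Optional

class Quad(NamedTuple):
    op: str                # the operator
    arg1: str              # first operand address
    arg2: Optional[str]    # second operand address (None for unary operators and copies)
    result: str            # destination address

code = [
    Quad("*", "y", "z", "t1"),   # t1 = y * z
    Quad("-", "x", "t1", "t2"),  # t2 = x - t1
]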

Stack-machine code

Stack-machine code is a linear IR that models the behavior of a stack machine. It is a form of one-address code.

Most operations take their operands from the stack. For example, a multiply instruction pops the top two values from the stack, multiplies them, and pushes the result back onto the stack.

Example stack machine code:

push 3
push b
multiply
push a
subtract

Many interpreters execute stack-machine bytecode on a virtual stack machine (e.g. CPython, JVM).
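
A minimal sketch of a stack-machine evaluator for the example above. The run function, the instruction names, and the left-minus-right operand order are assumptions made for illustration, not a real VM:

def run(program, env):
    stack = []
    for instr, *args in program:
        if instr == "push":
            value = args[0]
            # Push either a literal or the value of a variable from the environment.
            stack.append(env.get(value, value))
        else:
            # Binary operators pop the right operand first (it was pushed last).
            right = stack.pop()
            left = stack.pop()
            if instr == "multiply":
                stack.append(left * right)
            elif instr == "subtract":
                stack.append(left - right)
    return stack.pop()

# With b = 5 and a = 2, the example evaluates 3 * b - a = 13 under this operand order.
print(run([("push", 3), ("push", "b"), ("multiply",),
           ("push", "a"), ("subtract",)], {"a": 2, "b": 5}))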

Graphical IRs

Graphical IRs represent source code as a graph.

Graphical IRs can be trees:

  • Parse trees—a graphical representation of a derivation
  • ASTs—parse trees with extraneous nodes removed
  • DAGs—ASTs where nodes can have multiple parents and identical subtrees are reused

[2, Pp. 226-9]

Control-flow graphs

A control-flow graph represents the possible paths through basic blocks, where a basic block is a sequence of operations that always execute together (unless an operation raises an exception).

Each node represents a basic block. A directed edge represents a possible transfer of control from one basic block to another [2, P. 231].

Control-flow graphs are often used with another IR where the control-flow graph represents the relationships between blocks, and the operations within blocks are represented with another IR (e.g. a linear IR) [2, Pp. 231-2].

Data-dependence graphs

In compilers, a data-dependence graph represents the dependencies between individual instructions.

A data-dependence graph node represents an operation. An edge connects the operation that defines a value to an operation that uses that value [2, P. 233].

Data dependence graphs are often supplementary data structures built from the definitive IR and discarded after use. They are used for instruction scheduling [2, P. 234].

The edges in a data-dependence graph represent hard constraints: an operation that uses a value cannot run before the operation that defines it. The graph specifies only a partial order, so there are often many instruction sequences that preserve the data dependencies of the graph. This property is exploited by out-of-order processors in order to schedule instructions efficiently [2, P. 233].

Static single-assignment form

SSA (Static single-assignment form) is a property of an IR which requires that each variable is assigned exactly once and that every variable is defined before its use.

The process for transforming ordinary code into SSA involves replacing the target of each assignment with a new variable, and replacing each use of a variable with the version of the variable reaching that point.

Since control flow can’t be predicted in advance, there can be cases where a use might be reached by multiple versions of a variable. In this case, a variadic Φ (Phi) function is inserted at the join point. You can read y = Φ(y1, y2, …, yn) as “y is one of either y1, y2, …, or yn”.
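
A minimal sketch of the renaming step for straight-line code, where no Φ functions are needed because there is no control flow; the instruction format and function name are illustrative:

from collections import defaultdict

def to_ssa(instructions):
    version = defaultdict(int)   # next version number for each source variable
    current = {}                 # source variable -> current SSA name

    def use(name):
        # Replace a use with the version reaching this point.
        return current.get(name, name)   # unseen names are treated as inputs

    ssa = []
    for target, op, operands in instructions:
        operands = [use(o) for o in operands]
        version[target] += 1
        new_name = f"{target}{version[target]}"
        current[target] = new_name
        ssa.append((new_name, op, operands))
    return ssa

# y = x + 1; y = y * 2; z = y + x  becomes  y1 = x + 1; y2 = y1 * 2; z1 = y2 + x
print(to_ssa([("y", "+", ["x", "1"]),
              ("y", "*", ["y", "2"]),
              ("z", "+", ["y", "x"])]))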

Basic block

A basic block is a sequence of code with no branches in, except to its entry, and no branches out, except at its exit.

A control-flow graph is a directed graph where the nodes are basic blocks and an edge from block B1 to block B2 exists if execution can pass from the last instruction in B1 to the first instruction in B2.

The body of a method (or procedure) can always be represented as a control-flow graph. There is one initial node, and all return nodes are terminal.
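
A minimal sketch of partitioning a linear list of instructions into basic blocks using the classic leader rules: the first instruction is a leader, every jump target is a leader, and every instruction following a jump is a leader. The dict-based instruction format (with jump targets given as instruction indices) is an assumption made for illustration:

def basic_blocks(instructions):
    leaders = {0}
    for i, instr in enumerate(instructions):
        if instr["op"] in ("goto", "if"):
            leaders.add(instr["target"])       # the jump target starts a new block
            if i + 1 < len(instructions):
                leaders.add(i + 1)             # so does the instruction after the jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(instructions)]
    return [instructions[s:e] for s, e in zip(starts, ends)]

The control-flow graph is then built by adding an edge from each block to every block that its final instruction can transfer control to.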

Optimizations

For languages like C, optimizations are applied at three scopes:

  • Local optimizations—applied to a basic block.
  • Global optimizations—applied to a single function, optimized across all basic blocks of that function.
  • Inter-procedural optimizations—applied across method boundaries.

Commonly, applying optimizations enables new optimizations (e.g. running a copy propagation optimization enables dead code elimination). Optimizing compilers repeat optimizations until no new optimizations are found (or a limit is reached).

Local optimizations

Local optimizations only consider information local to a single basic block.

Common subexpression elimination

Common subexpression elimination is where identical expressions are replaced with a single variable holding the computed value. This optimization is easy to do when the IR is in SSA.
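
A minimal sketch of local common subexpression elimination over quadruple-style instructions (op, arg1, arg2, result). It assumes every instruction is a side-effect-free binary operation; the copy pseudo-instruction is introduced here for illustration:

def local_cse(block):
    available = {}   # (op, arg1, arg2) -> variable that currently holds the value
    out = []
    for op, a1, a2, result in block:
        key = (op, a1, a2)
        if key in available:
            # Reuse the previously computed value instead of recomputing it.
            out.append(("copy", available[key], None, result))
        else:
            out.append((op, a1, a2, result))
        # Redefining `result` invalidates every expression that reads it or is held in it.
        available = {k: v for k, v in available.items()
                     if result not in (k[1], k[2], v)}
        if result not in (a1, a2) and key not in available:
            available[key] = result
    return out

# a = b + c; d = b + c  becomes  a = b + c; d = a
print(local_cse([("+", "b", "c", "a"), ("+", "b", "c", "d")]))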

Copy propagation

Copy propagation is where occurrences of direct assignments are replaced with their values. e.g. if y = x then z = 3 + y can become z = 3 + x. Copy propagation can enable dead code elimination and constant folding.

Constant propagation

Similar to copy propagation, constant propagation is the process of substituting values of known constants in expressions. e.g. if x = 14 then y = 3 + x becomes y = 3 + 14.
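
A minimal sketch covering both copy propagation and constant propagation within a single basic block; the (target, op, operands) instruction format and the copy marker are assumptions made for illustration:

def propagate(block):
    known = {}   # variable -> the constant or variable it currently equals
    out = []
    for target, op, operands in block:
        # Replace each use with its known constant or copy source.
        operands = [known.get(o, o) for o in operands]
        out.append((target, op, operands))
        if op == "copy":
            known[target] = operands[0]
        else:
            known.pop(target, None)
        # Assigning to `target` invalidates copies that referred to its old value.
        known = {v: s for v, s in known.items() if s != target}
    return out

# x = 14; y = x; z = y + x  becomes  x = 14; y = 14; z = 14 + 14
print(propagate([("x", "copy", [14]),
                 ("y", "copy", ["x"]),
                 ("z", "+", ["y", "x"])]))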

Dead code elimination

Dead code elimination is the process of removing dead code, where dead code includes unreachable code and dead variables (variables written to but never read).
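
A minimal sketch of dead code elimination within a basic block, assuming every instruction is a side-effect-free assignment. The block is walked backwards, deleting any assignment whose target is not read later in the block and is not in live_out (the set of variables live on exit, which a global liveness analysis would provide):

def eliminate_dead_code(block, live_out):
    live = set(live_out)
    kept = []
    for target, op, operands in reversed(block):
        if target not in live:
            continue   # the assigned value is never read: drop the instruction
        kept.append((target, op, operands))
        live.discard(target)
        live.update(o for o in operands if isinstance(o, str))
    return list(reversed(kept))

# y := 3 is dead because y is overwritten before it is ever read.
print(eliminate_dead_code([("y", "copy", [3]),
                           ("y", "copy", [4]),
                           ("x", "copy", ["y"])], live_out={"x"}))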

Constant folding

Constant folding is when the compiler reorganizes and evaluates constant expressions at compile time.

Example: x := 2 + 2 becomes x := 4.
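
A minimal sketch of constant folding over the same illustrative (target, op, operands) format, handling only +, - and *:

import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(block):
    out = []
    for target, op, operands in block:
        if op in FOLDABLE and all(isinstance(o, int) for o in operands):
            # Both operands are compile-time constants: evaluate the operation now.
            out.append((target, "copy", [FOLDABLE[op](*operands)]))
        else:
            out.append((target, op, operands))
    return out

# x := 2 + 2  becomes  x := 4
print(fold_constants([("x", "+", [2, 2])]))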

Peephole optimization

Peephole optimization is an optimization technique applied to a short sequence of (usually contiguous) target language instructions. The sequence is known as the peephole.

The optimizer replaces the sequence with another sequence that produces the equivalent result but is faster.

Peephole optimizations are generally written as replacement rules of the form i1, …, in → j1, …, jm, meaning that the instruction sequence i1, …, in can be replaced with the equivalent but cheaper sequence j1, …, jm.
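
A minimal sketch of a peephole optimizer that makes a single forward pass with a two-instruction window. The MIPS-like instruction names and the two rules are illustrative, not taken from a real backend:

def match(window):
    # move $a, $b ; move $b, $a  ->  move $a, $b
    if (len(window) >= 2 and window[0][0] == "move" and window[1][0] == "move"
            and window[0][1:] == (window[1][2], window[1][1])):
        return 2, [window[0]]
    # addiu $r, $r, 0 adds zero to a register in place, so it can be deleted.
    if window[0][0] == "addiu" and window[0][1] == window[0][2] and window[0][3] == 0:
        return 1, []
    return 0, None

def peephole(code):
    out, i = [], 0
    while i < len(code):
        consumed, replacement = match(tuple(code[i:i + 2]))
        if consumed:
            out.extend(replacement)
        else:
            out.append(code[i])
        i += consumed or 1
    return out

# The redundant second move and the no-op addiu are removed.
print(peephole([("move", "$a", "$b"), ("move", "$b", "$a"),
                ("addiu", "$t0", "$t0", 0)]))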

Global dataflow analysis

Dataflow analysis is a variety of techniques that derive information about the flow of data along program execution paths. This enables global optimizations, e.g. global constant propagation [1, P. 597].

In dataflow analysis, a program point is a point in the program that is either before an instruction (the input state of an instruction) or after an instruction (the output state of an instruction). Dataflow analysis must consider all possible paths through program points that a program can take [1, P. 597].

Dataflow analysis can be run on individual program points, but it is often run on the boundaries of basic blocks instead, which requires less computation.

There are two main forms of dataflow analysis:

  • Forward flow analysis
  • Backward flow analysis

In forward flow analysis, the exit state of a program point is a function of the program point’s entry state. In backward flow analysis, the entry state of a program point is a function of the program point’s exit state.

In forward flow analysis you initialize an entry point before running the analysis. In backward flow analysis, you initialize exit points.

In forward flow analysis, the value of a block’s exit state (out_b) is:

out_b = trans_b(in_b)

Where trans_b is the transfer function of the block b.

And the value of a block’s entry state (in_b) is:

in_b = join_{p ∈ pred_b}(out_p)

Where the join operation combines the exit states of the predecessors pred_b of b, yielding the entry state of b.

Each dataflow analysis has its own transfer function and join operation.

Backward flow analysis is the inverse:

in_b = trans_b(out_b)
out_b = join_{s ∈ succ_b}(in_s)
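
A minimal sketch of the iterative worklist algorithm for a forward analysis, using set union as the join operation; blocks, preds, and transfer are placeholders supplied by the caller:

def forward_analysis(blocks, preds, transfer, entry_state=frozenset()):
    in_state = {b: frozenset() for b in blocks}
    out_state = {b: frozenset() for b in blocks}
    worklist = list(blocks)
    while worklist:
        b = worklist.pop()
        if preds[b]:
            # Join (set union) of the exit states of b's predecessors.
            in_state[b] = frozenset().union(*(out_state[p] for p in preds[b]))
        else:
            in_state[b] = entry_state
        new_out = transfer(b, in_state[b])
        if new_out != out_state[b]:
            out_state[b] = new_out
            # Successors of b may need to be recomputed.
            worklist.extend(s for s in blocks if b in preds[s])
    return in_state, out_state

Reaching definition analysis (below) is an instance of this framework, with gen/kill transfer functions and union as the join operation.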

Reaching definition analysis

Reaching definition analysis is a forward flow analysis that statically determines which definitions may reach a certain point.

In the following example, d2 is a reaching definition for d3 but d1 is not:

d1 : y := 3
d2 : y := 4
d3 : x := y

Reaching definition analysis can be defined as:

out_b = gen_b ∪ (in_b − kill_b)
in_b = ⋃_{p ∈ pred_b} out_p

Where gen_b is the set of all definitions introduced by b, and kill_b is the set of all definitions that are overwritten by b.
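
A minimal sketch applying out_b = gen_b ∪ (in_b − kill_b) one definition at a time over straight-line code, so no join is needed; the labels and the (label, variable) definition format are illustrative:

def reaching_definitions(defs):
    all_defs = set(defs)
    reaching = set()               # definitions reaching the current point
    result = {}
    for label, var in defs:
        result[label] = set(reaching)             # the in-state of this definition
        gen = {(label, var)}
        kill = {d for d in all_defs if d[1] == var} - gen
        reaching = gen | (reaching - kill)        # out = gen ∪ (in − kill)
    return result

# d2 reaches d3 but d1 does not, because d2 overwrites y.
print(reaching_definitions([("d1", "y"), ("d2", "y"), ("d3", "x")]))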

Liveness analysis

Liveness analysis is a backward dataflow analysis used to calculate whether variables are live at each point in the program.

Liveness analysis can be used during register allocation to determine which registers should be favored [1, Pp. 608-9].

A variable x is live at statement s if:

  • There exists a statement s' that uses x.
  • There is a path from s to s'.
  • The path has no intervening assignment to x.

A dead variable is one that is not live. A statement x = ... is dead code if x is dead, and can therefore be deleted (provided its right-hand side has no side effects).

Liveness analysis can be defined as:

in_b = use_b ∪ (out_b − def_b)
out_b = ⋃_{s ∈ succ_b} in_s

Where def_b is the set of variables defined in b prior to any use of that variable in b, and use_b is the set of variables whose values may be used in b prior to any definition of the variable.
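
A minimal sketch applying in_b = use_b ∪ (out_b − def_b) backwards over straight-line statements, so no join is needed; the (label, target, uses) statement format is illustrative:

def liveness(statements, live_out):
    live = set(live_out)
    live_before = {}
    for label, target, uses in reversed(statements):
        live = set(uses) | (live - {target})   # in = use ∪ (out − def)
        live_before[label] = set(live)
    return live_before

# y is live immediately before d3 but dead before d2, so the assignment at d1 is dead code.
print(liveness([("d1", "y", []),       # y := 3
                ("d2", "y", []),       # y := 4
                ("d3", "x", ["y"])],   # x := y
               live_out={"x"}))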

References

  1. [1] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools (2nd Edition). USA: Addison-Wesley Longman Publishing Co., Inc., 2006.
  2. [2] K. D. Cooper and L. Torczon, Engineering a Compiler, 2nd ed. Morgan Kaufmann Publishers, 2012.