Chapter 9 Optimization of the BOL representation
Most optimizations in the Moby compiler are performed on the BOL
representation described in Chapter 6.
Currently, the optimizer performs the following phases:
-
Initial census
- First contraction
- Eta splitting
- Second contraction
- Useless variable elimination
- Small constant copying
- Third contraction
- Cluster conversion
- Unit contraction
- Allocation checks
9.1 Variable substitutions
9.2 Census
The Census module provides code to compute
information about the number of uses of variables.
This information is used by the various transformation phases.
Specifically, the census computes a use count for every
BOL variable, which reflects the number of non-binding
occurances of the variable, and an application count
for every variable that is bound as a function or continuation.
The Census module also provides functions for
adjusting the census data when code is eliminated or
replicated.
9.3 Denesting
We often find it useful to introduce extra levels of block (or let) nesting
when transforming BOL.
Unfortunately, such extra structure hides data dependencies from simple
analysis, such as that used by the contraction phase (see
Section 9.4).
For example, consider the following BOL fragment:
{
let a0 = {
let b1 = "blah"
let x3 = {
let y4 = { let z5 = True in (z5) }
in y4
}
in x3
}
in a0
}
Without data-flow analysis, the fact that a0 is True is
obscured by the nesting of blocks.
The denesting transformation flattens unnecessary block-nesting.1
For example, the above fragment is reduced to the following by
the denesting phase:
{
let b1 = "blah"
let z5 = True
let y4 = z5
let x3 = y4
let a0 = x3
in a0
}
The contraction phase can easily reduce this fragment to
{
let z5 = True
in z5
}
in one pass.
The denesting transformation is implemented in the BOLUtil module
(file bol-util.sml) and takes time linear in the size of the expression.
It can be either applied to an entire expression or just to the top of
an expression.
For the Mandelbrot program (about 50 lines of Moby code), applying the
denesting transformation prior to contraction eliminated 80 let-float
transformations and reduced the number of contraction iterations over
two phases from 9+3 to 2+2.
9.4 Contraction
The Contract module implements a collection of transformations
that are guaranteed to shrink the size of the program.
9.5 Cross-module inlining
9.6 Eta splitting
The EtaSplit module applies h-expansion
to escaping functions that have known call sites.
This transformation is useful, because it turns known calls to
escaping functions into known calls to known functions, which
allows useless variable elimination and specializing of calling
conventions.
The h-expansion transformation is only applied to those
escaping functions that have known call sites.
Recall that escaping functions are those that either are marked
as exported or have a use count that is greater than their
application count.
If a function has a non-zero application count and is escaping, then
we split it into two functions.
9.7 Useless varible elimination
The Useless structure is responsible for removing variables
that do not contribute to the computation (even though they may be
used).
This phase proceeds in two steps.
First, we compute the set of useful variables of the program.
A variable is defined to be useful if it satisfies one of the
following conditions:
-
If it is an applied function or thrown continuation.
- If it is the argument to an unknown or C function call.
- If it is the argument to a primitive operation that has a side
effect.
- If it is the argument to a FieldPut operation.
- If it is returned from a function.
- If it is the argument to a right-hand side expression that
computes a useful result.
- If it is the result of a right-hand side expression and the
corresponding left-hand side variable is useful.
- If is the argument to a known function or continuation, and the
corresponding parameter is useful.
The last three of these conditions propagate usefulness backwards through
the dataflow and require that we iterate the analysis until a fixed point
is reached.
Once the set of useful variables has been computed, we can transform the
program to eliminate useless variable (i.e., those not in the set).
This transformation is implemented by walking the tree looking for binding
occurances of useless variables and eliminating them.
9.8 Small constant copying
Because the BOL representation does requires that all operands be variables,
normal optimizations may result in sharing of small constants (i.e., binding
a BOL variable to a small constant that is used in multiple contexts).
This situation can result in poor code, since shared small constants will not
be used as immediate operands in the generated machine instructions.
A related problem is when a small constant appears as a free variable in
a function.
To address these problems, the constant copying phase replicates variables that
are bound to small constants at the site of their use.
For example, consider the following BOL code fragment:
let x : int = 1
fun f (y : int) {
let t = y + x
ret t
}
let z = f(x)
...
The constant copying phase will rewrite this fragment as follows:
fun f (y : int) {
let x1 : int = 1
let t = y + x1
ret t
}
let x2 : int = 1
let z = f(x2)
...
9.9 Cluster conversion
The cluster phase takes care of both closure conversion and cluster merging,
as well as the LCPS transformation [].
It works by first performing a series of analyses to determine where
CPS splits are needed, how to map functions to clusters, closure representation,
etc..
It then uses the information collected during these analyses to drive a single
transformation phase that produces the BOL cluster representation.
- 1
-
It is essentially a restricted form of the let-floating transformation
described by Peyton Jones, Paqrtain and Santos [PPS96].