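First of all, the simplifier runs between almost all of the phases. This makes writing many optimizations much easier. For example, when implementing an optimization, you can often just create rewrite rules to propagate the changes instead of doing it manually. The simplifier encompasses a number of simple optimizations, including inlining and fusion. The main limitations I know of are that GHC refuses to inline recursive functions, and that things have to be named correctly for fusion to work.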
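To give a rough idea of what a rewrite rule looks like, here is a minimal sketch of a user-written RULES pragma. The function `myMap`, the wrapper `doubleThenInc`, and the rule name `"myMap/myMap"` are made up for this illustration; GHC's own list fusion uses different rules, and whether a rule actually fires depends on the optimisation level and on what has been inlined by then.

```haskell
-- Hypothetical example: myMap and the rule name are illustrative only.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs
-- Make it explicit that myMap should stay as a call the rule can match.
{-# NOINLINE myMap #-}

-- Fuse two traversals into one. GHC does not check that both sides are
-- equal; that is the programmer's responsibility.
{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}

-- With optimisation on, the simplifier may rewrite this to one traversal.
doubleThenInc :: [Int] -> [Int]
doubleThenInc xs = myMap (+ 1) (myMap (* 2) xs)
```

The result is the same whether or not the rule fires; only the number of list traversals changes.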
Specialise
The basic idea of specialization is to remove polymorphism and overloading by identifying places where the function is called and creating versions of the function that aren't polymorphic - they are specific to the types they are called with. You can also tell the compiler to do this with the SPECIALISE pragma. As an example, take a factorial function:
fac :: (Num a, Eq a) => a -> a
fac 0 = 1
fac n = n * fac (n - 1)
As the compiler doesn't know any properties of the multiplication that is to be used, it cannot optimize this at all. If, however, it sees that the function is used on an Int, it can create a new version, differing only in the type:
fac_Int :: Int -> Int
fac_Int 0 = 1
fac_Int n = n * fac_Int (n - 1)
Next, rules mentioned below can fire, and you end up with something working on unboxed Ints, which is much faster than the original. Another way to look at specialisation is partial application on type class dictionaries and type variables.
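Requesting the specialisation by hand might look like the sketch below. It reuses the fac function from above; `facInt` is just an illustrative name for a monomorphic use site, not something GHC generates under that name.

```haskell
-- The overloaded factorial from the text.
fac :: (Num a, Eq a) => a -> a
fac 0 = 1
fac n = n * fac (n - 1)
-- Ask GHC to also compile a copy of fac fixed at Int -> Int, so calls at
-- that type need not go through the Num and Eq dictionaries.
{-# SPECIALISE fac :: Int -> Int #-}

-- A monomorphic use site (illustrative name) that can use the Int copy.
facInt :: Int -> Int
facInt = fac
```

The pragma does not change results; `fac` still works at every other numeric type through the ordinary overloaded version.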
The source here has a load of notes in it.
Float out
EDIT: I apparently misunderstood this before. My explanation has completely changed.
The basic idea of this is to move computations that shouldn't be repeated out of functions. For example, suppose we had this:
\x -> let y = expensive in x+y
In the above lambda, every time the function is called, y is recomputed. A better function, which floating out produces, is
let y = expensive in \x -> x+y
To facilitate the process, other transformations may be applied. For example, this happens:
\x -> x + f 2
\x -> x + let f_2 = f 2 in f_2
\x -> let f_2 = f 2 in x + f_2
let f_2 = f 2 in \x -> x + f_2
Again, repeated computation is saved.
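The same idea written out by hand, as a small sketch: `addSumTo` and `addSumToFloated` are hypothetical names, and the sum stands in for the expensive computation. With optimisation on, GHC may well perform this float itself, so the difference is mostly visible in unoptimised code.

```haskell
-- Before: y is bound inside the lambda, so each application of the
-- returned function may recompute the sum.
addSumTo :: Int -> (Int -> Int)
addSumTo n = \x -> let y = sum [1 .. n] in x + y

-- After floating out by hand: y is bound outside the lambda, so one
-- partial application shares a single y across all later calls.
addSumToFloated :: Int -> (Int -> Int)
addSumToFloated n = let y = sum [1 .. n] in \x -> x + y
```

For example, in `map (addSumToFloated 1000000) xs` the sum is computed at most once, however long `xs` is.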
The source is very readable in this case.
At the moment bindings between two adjacent lambdas are not floated. For example, this does not happen:
\x y -> let t = x+x in ...
going to
\x -> let t = x+x in \y -> ...
Float inwards
Quoting the source code,
The main purpose of floatInwards is floating into branches of a case, so that we don't allocate things, save them on the stack, and then discover that they aren't needed in the chosen branch.
As an example, suppose we had this expression:
let x = big in
    case v of
        True -> x + 1
        False -> 0
If v evaluates to False, then by allocating x, which is presumably some big thunk, we have wasted time and space. Floating inwards fixes this, producing this:
case v of
    True -> let x = big in x + 1
    False -> let x = big in 0
, which is subsequently replaced by the simplifier with
case v of
    True -> big + 1
    False -> 0
This paper, although covering other topics, gives a fairly clear introduction. Note that despite their names, floating in and floating out don't get in an infinite loop for two reasons:
- Float in floats lets into case statements, while float out deals with functions.
- There is a fixed order of passes, so they shouldn't be alternating infinitely.
Demand analysis
Demand analysis, or strictness analysis, is less of a transformation and more, like the name suggests, of an information gathering pass. The compiler finds functions that always evaluate their arguments (or at least some of them), and passes those arguments using call-by-value instead of call-by-need. Since you get to evade the overheads of thunks, this is often much faster. Many performance problems in Haskell arise from either this pass failing, or code simply not being strict enough. A simple example is the difference between using foldr, foldl, and foldl' to sum a large list of integers: the first can overflow the stack, the second builds a long chain of thunks on the heap before anything is added, and the last runs fine, because of strictness. This is probably the easiest to understand and best documented of all of these. I believe that polymorphism and CPS code often defeat this.
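The three sums mentioned above can be sketched as follows. The function names are illustrative. The comments describe unoptimised behaviour; with optimisation on, demand analysis may make the foldl version strict as well, and the actual stack and heap limits depend on the runtime settings.

```haskell
import Data.List (foldl')

-- Nests the additions to the right: 1 + (2 + (3 + ...)). Nothing can be
-- added until the end of the list is reached.
sumRight :: [Integer] -> Integer
sumRight = foldr (+) 0

-- Lazy accumulator: builds (((0 + 1) + 2) + 3) ... as unevaluated
-- thunks and only forces the chain at the very end.
sumLeftLazy :: [Integer] -> Integer
sumLeftLazy = foldl (+) 0

-- Strict accumulator: forces the running total at every step, so it
-- runs in constant space.
sumLeftStrict :: [Integer] -> Integer
sumLeftStrict = foldl' (+) 0
```

All three return the same value on a finite list; they differ only in how much memory they need on the way.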
Worker Wrapper binds
The basic idea of the worker/wrapper transformation is to do a tight loop on a simple structure, converting to and from that structure at the ends. For example, take this function, which calculates the factorial of a number.
factorial :: Int -> Int
factorial 0 = 1
factorial n = n * factorial (n - 1)
Using the definition of Int in GHC, we have
factorial :: Int -> Int
factorial (I# 0#) = I# 1#
factorial (I# n#) = I# (n# *# case factorial (I# (n# -# 1#)) of
    I# down# -> down#)
Notice how the code is covered in I#s? We can remove them by doing this:
factorial :: Int -> Int
factorial (I# n#) = I# (factorial# n#)
factorial# :: Int# -> Int#
factorial# 0# = 1#
factorial# n# = n# *# factorial# (n# -# 1#)
Although this specific example could have also been done by SpecConstr, the worker/wrapper transformation is very general in the things it can do.
Common sub-expression
This is another really simple optimization that is very effective, like strictness analysis. The basic idea is that if you have two expressions that are the same, they will have the same value. For example, if fib is a Fibonacci number calculator, CSE will transform
fib x + fib x
into
let fib_x = fib x in fib_x + fib_x
which cuts the computation in half. Unfortunately, this can occasionally get in the way of other optimizations. Another problem is that the two expressions have to be in the same place and that they have to be syntactically the same, not the same by value. For example, CSE won't fire in the following code without a bunch of inlining:
x = (1 + (2 + 3)) + ((1 + 2) + 3)
y = f x
z = g (f x) y
However, if you compile via LLVM, you may get some of this combined, due to its Global Value Numbering pass.
Liberate case
This seems to be a terribly documented transformation, besides the fact that it can cause code explosion. Here is a reformatted (and slightly rewritten) version of the little documentation I found:
This module walks over Core, and looks for case on free variables. The criterion is: if there is a case on a free variable on the route to the recursive call, then the recursive call is replaced with an unfolding. For example, in
f = \ t -> case v of V a b -> a : f t
the inner f is replaced, to make
f = \ t -> case v of V a b -> a : (letrec f = \ t -> case v of V a b -> a : f t in f) t
Note the need for shadowing. Simplifying, we get
f = \ t -> case v of V a b -> a : (letrec f = \ t -> a : f t in f t)
This is better code, because a is free inside the inner letrec, rather than needing projection from v. Note that this deals with free variables, unlike SpecConstr, which deals with arguments that are of known form.
See below for more information about SpecConstr.
SpecConstr - this transforms programs like
f (Left x) y = somethingComplicated1
f (Right x) y = somethingComplicated2
into
f_Left x y = somethingComplicated1
f_Right x y = somethingComplicated2
{-# INLINE f #-}
f (Left x) = f_Left x
f (Right x) = f_Right x
As an extended example, take this definition of last:
last [] = error "last: empty list"
last (x:[]) = x
last (x:x2:xs) = last (x2:xs)
We first transform it to
last_nil = error "last: empty list"
last_cons x [] = x
last_cons x (x2:xs) = last (x2:xs)
{-# INLINE last #-}
last [] = last_nil
last (x : xs) = last_cons x xs
Next, the simplifier runs, and we have
last_nil = error "last: empty list"
last_cons x [] = x
last_cons x (x2:xs) = last_cons x2 xs
{-# INLINE last #-}
last [] = last_nil
last (x : xs) = last_cons x xs
Note that the program is now faster, as we are not repeatedly boxing and unboxing the front of the list. Also note that the inlining is crucial, as it allows the new, more efficient definitions to actually be used, as well as making recursive definitions better.
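For reference, here is a runnable sketch of the before and after versions, written by hand. The names `lastOrig`, `lastNil`, `lastCons`, and `lastSpec` are chosen here to avoid clashing with the Prelude's last; the names GHC generates for specialised copies are different.

```haskell
-- The original definition, renamed to avoid clashing with Prelude.last.
lastOrig :: [a] -> a
lastOrig []            = error "last: empty list"
lastOrig (x : [])      = x
lastOrig (_ : x2 : xs) = lastOrig (x2 : xs)

-- Specialised copy for the empty-list pattern.
lastNil :: a
lastNil = error "last: empty list"

-- Specialised copy for the cons pattern: head and tail arrive as
-- separate arguments, so the loop never rebuilds a cons cell just to
-- take it apart again.
lastCons :: a -> [a] -> a
lastCons x []        = x
lastCons _ (x2 : xs) = lastCons x2 xs

-- Thin wrapper that dispatches to the specialised copies.
lastSpec :: [a] -> a
lastSpec []       = lastNil
lastSpec (x : xs) = lastCons x xs
{-# INLINE lastSpec #-}
```

Both versions agree on every non-empty list; the second just makes the constructor-specialised loop explicit.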
SpecConstr is controlled by a number of heuristics. The ones mentioned in the paper are as follows:
- The lambdas are explicit and the arity is a.
- The right hand side is "sufficiently small," something controlled by a flag.
- The function is recursive, and the specializable call is used in the right hand side.
- All of the arguments to the function are present.
- At least one of the arguments is a constructor application.
- That argument is case-analysed somewhere in the function.
However, the heuristics have almost certainly changed. In fact, the paper mentions an alternative sixth heuristic:
Specialise on an argument x only if x is only scrutinised by a case, and is not passed to an ordinary function, or returned as part of the result.