Techniques for maximizing parallelism and locality within a single loop nest have been presented in the literature[20,25,36]. A number of researchers have also looked at the specific problem of mapping a single loop nest onto parallel machines[15,23,24]. We refer to such loop-level techniques as local analysis. The global analysis is responsible for optimizing parallelism and locality across multiple loop nests.
First, our compiler normalizes the loops and performs loop distribution before executing the decomposition algorithms. The compiler runs a loop fusion pass after decomposition to regroup compatible loop nests[5,10].
Our compiler uses the algorithm developed by Wolf and Lam[25,36] to apply unimodular transforms to find the coarsest granularity of parallelism within a loop nest. This pass leaves the loop nests in a canonical form consisting of a nest of fully permutable loop nests. The nests are as large as possible, starting from the outermost loops. A loop nest is fully permutable if any arbitrary permutation of the loops within the nest is legal. A fully permutable loop nest of depth j can be transformed to get j-1 degrees of parallelism. Our compiler positions the loops in each fully permutable nest such that any parallel loops are outermost. For example, given the following code:
The local analysis would produce the code shown at the top of Figure 1. The keyword forall indicates that the iterations of the loop can be executed in parallel.
For a loop nest of depth l, let be the outermost parallelizable loop (the first loop in the outermost fully permutable loop nest of size greater than 1). Loops are sequential (degenerate fully permutable nests of size 1), thus there must be dependences between iterations of these loops. The local phase is responsible for finding transformations that minimize communication of loops with respect to the outer sequential loops. Parallelizable loops are allocated such that any neighboring loops in the iteration space are neighbors when mapped onto the processor space.