Swm256 is a 500-line program from the SPEC92 benchmark suite. It performs a two-dimensional stencil computation that applies finite-difference methods to solve the shallow-water equations. The speedups for swm256 are shown in Figure 12.
Swm256 is highly data-parallel. Our base compiler achieves good speedups by parallelizing the outermost parallel loop in each of the frequently executed loop nests. The decomposition phase discovers that it can, in fact, parallelize both loops in the program's 2-deep loop nests without any major data reorganization. To minimize the ratio of communication to computation, the compiler exploits parallelism in both dimensions simultaneously, so the computation decomposition algorithm assigns a two-dimensional block of each array to each processor. However, with the arrays' original layout, the data accessed by each processor are scattered across many short, noncontiguous segments of memory, which causes poor cache performance. When we apply both the computation and the data decomposition algorithms, the data are reorganized, the program regains the lost performance, and its speedups become slightly better than those obtained with the base compiler.
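The trade-off above can be made concrete with a small sketch. It assumes an n × n grid, a nearest-neighbour stencil with a unit-width halo, a square processor grid, and Fortran's column-major array layout. The sizes (n = 256, p = 16), the processor coordinates, and the `block_contiguous` layout are hypothetical illustrations, not the compiler's actual output. The sketch counts the boundary elements an interior processor exchanges under 1-D strips versus 2-D blocks, and the number of separate contiguous memory runs that one processor's block occupies before and after the data are reorganized.

```python
import math


def halo_elements(n: int, p: int, two_d: bool) -> int:
    """Elements an interior processor exchanges per sweep of a nearest-neighbour
    stencil on an n x n grid (unit-width halo), for n*n/p elements of work."""
    if two_d:
        side = math.isqrt(p)                      # processor grid is side x side
        assert side * side == p and n % side == 0
        return 4 * (n // side)                    # four edges of an (n/side)^2 block
    assert n % p == 0
    return 2 * n                                  # two length-n edges of an n x (n/p) strip


def column_major(i: int, j: int, n: int, b: int) -> int:
    """Offset of A(i, j) in the original Fortran-style column-major layout."""
    return i + j * n


def block_contiguous(i: int, j: int, n: int, b: int) -> int:
    """Offset of A(i, j) in a hypothetical restructured layout in which each
    b x b block is stored contiguously (column-major within and across blocks)."""
    nb = n // b
    block_id = (i // b) + (j // b) * nb
    return block_id * b * b + (i % b) + (j % b) * b


def contiguous_runs(offsets) -> int:
    """Number of maximal runs of consecutive memory offsets."""
    s = sorted(offsets)
    return 1 + sum(1 for a, c in zip(s, s[1:]) if c != a + 1)


def block_runs(layout, n: int, p: int, pi: int, pj: int) -> int:
    """Contiguous runs touched by processor (pi, pj), which owns one 2-D block."""
    b = n // math.isqrt(p)
    offsets = [layout(i, j, n, b)
               for j in range(pj * b, (pj + 1) * b)
               for i in range(pi * b, (pi + 1) * b)]
    return contiguous_runs(offsets)


if __name__ == "__main__":
    n, p = 256, 16                                # hypothetical sizes
    print("halo, 1-D strips:", halo_elements(n, p, two_d=False))
    print("halo, 2-D blocks:", halo_elements(n, p, two_d=True))
    print("runs, column-major:", block_runs(column_major, n, p, 1, 2))
    print("runs, block-contiguous:", block_runs(block_contiguous, n, p, 1, 2))
```

With these sizes, 2-D blocks halve the boundary traffic (256 versus 512 elements for the same 4,096 elements of work per processor), but in column-major order each processor's block is spread over 64 separate runs; storing each block contiguously reduces this to a single run.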
Figure 12: Swm256 Speedups