Erlebacher is a 600-line FORTRAN benchmark from ICASE that performs three-dimensional tridiagonal solves. It includes a number of fully parallel computations, interleaved with multi-dimensional reductions and computational wavefronts in all three dimensions caused by forward and backward substitutions. Partial derivatives are computed in all three dimensions with three-dimensional arrays. Figure 11 shows the resulting speedups for each version of Erlebacher.

**Figure 11:** Erlebacher Speedups

The base-line version always parallelizes the outermost parallel loop.
This strategy yields local accesses in the first two phases of
Erlebacher when computing partial derivatives in the **X** and **Y**
dimensions, but ends up causing non-local accesses in the **Z** dimension.

The computation decomposition algorithm improves the performance of
Erlebacher slightly over the base-line version. It finds a
computation decomposition so that no non-local accesses are needed in
the **Z** dimension. The major data structures in the program are the
input array and ` DUX`, ` DUY` and ` DUZ` which are used to
store the partial derivatives in the **X**, **Y** and **Z** dimensions,
respectively. Since it is only written once, the input array is
replicated. Each processor accesses a block of columns for arrays
` DUX` and ` DUY`, and a block of rows for array ` DUZ`.
Thus in this version of the program, ` DUZ` has poor spatial
locality. The data transformation phase of the compiler restructures
` DUZ` so that local references are contiguous in memory. Because
two-thirds of the program is perfectly parallel with all local
accesses, the optimizations only realize a modest performance
improvement.

Fri Apr 7 11:22:17 PDT 1995