SUIF is a fully functional compiler that accepts both Fortran and C as input languages. (For this experiment, we consider Fortran programs only.) The parallelized code is output as an SPMD (Single Program Multiple Data) parallel C version of the program that can be compiled by native C compilers on a variety of architectures. The resulting C program is linked to a parallel run-time system that currently runs on several bus-based shared-memory architectures (SGI Challenge and Power Challenge, and Digital 8400 multiprocessors) and scalable shared-memory architectures (Stanford DASH and Kendall Square KSR-1).
There are two major components to automatic parallelization in SUIF. First, the analysis component locates the available parallelism in the code. This component encompasses all the interprocedural parallelization analyses presented in this paper. (In addition, SUIF includes C pointer analysis to support parallelization of C programs, but this is outside the scope of this paper.) The second major component is parallel code optimization and generation. Specifically, the full SUIF system incorporates data and loop transformations to increase the granularity of parallelism and to improve the memory behavior of the programs [1,27], as well as optimizations to eliminate unnecessary synchronization.
In this paper, however, we adopt a very simple parallel code generation strategy that does not include these optimizations, in order to focus on the effects of the parallelization analysis. The compiler parallelizes only the outermost loop that the analysis has proven to be parallelizable. Our compiler suppresses parallelization of array reductions if the overheads involved are expected to overwhelm the benefits. In addition, the run-time system estimates the amount of computation in each parallelizable loop from the iteration count, which is known at run time, and runs the loop sequentially if it is considered too fine-grained to benefit from parallel execution. The iterations of a parallel loop are divided evenly among the processors at the time the parallel loop is spawned.
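The two run-time decisions described above, the granularity check and the even division of iterations, can be illustrated with a minimal C sketch. It is not the SUIF run-time system's actual code. The names `worth_parallelizing`, `block_partition`, `iter_range`, and the threshold `MIN_PARALLEL_WORK` are hypothetical. The sketch also assumes that the work estimate is the iteration count times a compiler-supplied per-iteration cost, and that "evenly divided" means contiguous blocks whose sizes differ by at most one iteration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold: minimum estimated work (in arbitrary cost
 * units) below which spawning a parallel loop is assumed not to pay off. */
#define MIN_PARALLEL_WORK 1000L

/* Half-open iteration range [lo, hi) assigned to one processor. */
typedef struct {
    long lo;
    long hi;
} iter_range;

/* Granularity check (assumed form): estimated work is the run-time
 * iteration count times an estimated cost per iteration. */
bool worth_parallelizing(long iter_count, long est_cost_per_iter) {
    return iter_count > 0 &&
           iter_count * est_cost_per_iter >= MIN_PARALLEL_WORK;
}

/* Even division into contiguous blocks (assumed scheme): each of the
 * nprocs processors gets n / nprocs iterations, and the first
 * n % nprocs processors get one extra. */
iter_range block_partition(long n, int nprocs, int pid) {
    assert(n >= 0 && nprocs > 0 && pid >= 0 && pid < nprocs);
    long base = n / nprocs;
    long extra = n % nprocs;
    iter_range r;
    r.lo = pid * base + (pid < extra ? pid : extra);
    r.hi = r.lo + base + (pid < extra ? 1 : 0);
    return r;
}
```

In an SPMD program, every processor would call `block_partition` with its own `pid` when the loop is spawned and execute only the iterations in its range. If `worth_parallelizing` returns false, a single processor runs the full range `[0, n)` instead.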