
Compiling Parallel Programs

To invoke the parallelization phases of the SUIF compiler, use the -parallel flag of the compiler driver `scc'. The passes that `scc' runs for parallelization are as follows:

Constant Propagation (`porky -const-prop').
Propagate the values of constants.

Scalarize Array Accesses (`porky -scalarize').
Turn local array variables into collections of element variables when all uses of the array are loads or stores of known elements.

Forward Propagation (`porky -forward-prop').
Propagate the calculation of local variables into the bound and step expressions of TREE_FORs and the index expressions of array reference instructions when possible.

Normalization (`predep -normalize').
Normalize all TREE_FORs by modifying all array references and loop definitions so that the step is always 1 and the test is always less-than-or-equal.

Induction Variable Detection (`porky -ivar', `porky -know-bounds').
Recognize auxiliary induction variables and replace the uses within loops by expressions of the loop index. The closed form for the value is assigned to the auxiliary induction variables outside the loop.

Constant Folding (`porky -fold').
Fold constants in expressions.

Scalar Privatization Analysis (`moo -Psce').
Find privatizable scalar variables within each loop. A variable is privatizable if every iteration of the loop can have its own copy of the variable.

Reduction Recognition (`reduction').
Recognize reductions in loops. A reduction is an associative computation accumulated into one location. Such loops can then be parallelized with simple synchronization.

Dependence Preprocessing (`predep -presc').
Preprocess the code so that dependence analysis can handle simple non-linear array access functions. The SUIF dependence analyzer works only on affine (linear) expressions, so it cannot analyze non-linear array access functions directly. This pass preprocesses the code for the case where symbolic coefficients are used in array access functions.

Parallelism and Locality Optimizer (`skweel -T').
Analyze and transform the code to optimize loop-level parallelism and locality. This pass performs data dependence analysis using the `dependence' library. It uses unimodular loop transformations (such as loop interchange, reversal and skewing) to expose coarse-grain parallelism in the code.

Parallel Code Generator (`pgen').
Generate parallel code for shared address space multiprocessors. This pass restructures the SUIF code and inserts calls to the parallel run-time library.
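
The effect of several of the passes above can be pictured on a small loop. The following is a hand-written sketch in C, not actual output of `porky', `predep', `moo' or `reduction'; the function names `loop_before' and `loop_after' are hypothetical, and the sketch assumes that the normalized loop starts at 0. It shows a loop with a step of 2, an auxiliary induction variable k, a temporary t and a sum into s, next to a version that is normalized (step 1, less-than-or-equal test), uses a closed form for k, keeps t private to each iteration, and leaves s as the only value carried between iterations.

```c
/* Hand-written illustration; NOT actual SUIF output. */

/* Before: step 2, auxiliary induction variable k, shared temporary t. */
double loop_before(const double *a, int n, int *k_out)
{
    double s = 0.0;
    double t;
    int k = 0;
    int i;
    for (i = 0; i < n; i += 2) {
        t = a[i] + k;   /* t is written before it is read in every iteration */
        s += t;         /* associative accumulation into one location */
        k += 3;         /* k advances in lockstep with i */
    }
    *k_out = k;
    return s;
}

/* After: the same computation in the form the passes aim for. */
double loop_after(const double *a, int n, int *k_out)
{
    double s = 0.0;
    int trips = n > 0 ? (n + 1) / 2 : 0;   /* number of iterations */
    int j;
    for (j = 0; j <= trips - 1; j++) {     /* step 1, test is <= */
        double t = a[2 * j] + 3 * j;       /* private t; k replaced by 3*j */
        s += t;                            /* sum reduction into s */
    }
    *k_out = 3 * trips;                    /* closed form for k, outside the loop */
    return s;
}
```

Both functions return the same sum and leave the same final value in k. In the second form the iterations are independent of one another apart from the accumulation into s.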

After running the parallelization passes, `scc' can generate C code or MIPS code. The current default is for `scc' to run the MIPS code generator. To run the SUIF-to-C converter, use the -s2c flag with `scc'. For example, to run the parallelizer on a FORTRAN source file and then generate a C output file, use the command:

        scc -parallel -s2c -.out.c myprog.f

This tells `scc' to run the parallelization passes, followed by `s2c'. The resulting C program is in the file myprog.out.c.
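
The C file produced this way contains the code as restructured by `pgen', including calls to the parallel run-time library. As a rough picture of the idea only, the sketch below divides the iterations of a parallel loop into contiguous blocks, one per processor. It is not `pgen' output and does not use the SUIF run-time library interface: the names `doall_body' and `run_doall' are hypothetical, and the blocks are simply executed one after another on a single processor.

```c
/* Hypothetical sketch; NOT pgen output and NOT the SUIF run-time API. */

/* Loop body outlined into a function that executes only the
   iterations assigned to processor my_id out of nproc. */
void doall_body(int my_id, int nproc, int lb, int ub, double *x)
{
    int total = ub - lb + 1;                  /* iterations lb..ub inclusive */
    int chunk = (total + nproc - 1) / nproc;  /* block size, rounded up */
    int lo = lb + my_id * chunk;              /* first iteration of this block */
    int hi = lo + chunk - 1;                  /* last iteration of this block */
    int i;
    if (hi > ub)
        hi = ub;                              /* the last block may be short */
    for (i = lo; i <= hi; i++)
        x[i] = 2.0 * x[i];
}

/* Stand-in for a run-time library call: runs every processor's
   block in turn instead of starting real parallel tasks. */
void run_doall(int nproc, int lb, int ub, double *x)
{
    int p;
    for (p = 0; p < nproc; p++)
        doall_body(p, nproc, lb, ub, x);
}
```

With 10 iterations and 4 processors, this partitioning gives blocks of 3, 3, 3 and 1 iterations, and every iteration is executed exactly once.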

Any program that is run through the parallel code generator `pgen' must link in the run-time library for the target machine. The run-time libraries for SGI and DASH machines are `runtime_sgi' and `runtime_dash', respectively. The uniprocessor library is called `runtime_seq'. In addition, FORTRAN programs must link in the `F77_doall' library, which replaces the `F77' library used for sequential FORTRAN programs. These libraries are linked in automatically by `scc'. For example, to generate a parallel executable on an SGI machine starting from the original FORTRAN source, use the command:

        scc -parallel -o myprog myprog.f

Or, to compile the C file myprog.out.c generated by `s2c' above:

        cc -o myprog myprog.out.c \
            $SUIFHOME/$MACHINE/lib/libruntime_sgi.a \
            $SUIFHOME/$MACHINE/lib/libF77_doall.a \
            $SUIFHOME/$MACHINE/lib/libI77.a -lm -lmpc

where -lm and -lmpc link the math library and the SGI multiprocessing library, respectively.
