Man page for pgen.1
Table of Contents

NAME

pgen - generate code for shared address space multiprocessors

SYNOPSIS

pgen [ options ] infile outfile

DESCRIPTION

The pgen program transforms SUIF code to run in parallel on shared address space multiprocessors. It restructures the code and inserts calls to the parallel runtime library. Pgen reads annotations (created by previous passes) from the SUIF code that tell it where and how to modify the code. These annotations fall into three categories: parallelization and scheduling, synchronization, and variable scoping. If none of these annotations is found in the code, the input file is read and written back without modification.

This pass expects two file specifications on the command line, first for the input file, then for the output file.

OPTIONS

-check-work
Generate code that checks if the amount of work in a parallel loop is above certain thresholds before running it.
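The kind of check that -check-work adds can be pictured with the following minimal sketch. It is not pgen's generated code: the function name, the work estimate (trip count times a per-iteration cost) and the threshold value are all assumptions made for illustration, since this page does not say how the thresholds are defined.

```python
def choose_execution(trip_count, work_per_iteration, min_work):
    """Decide how to run a loop from a simple work estimate.

    All three parameters are hypothetical; the real thresholds used by
    the generated code are not documented here.
    """
    estimated_work = trip_count * work_per_iteration
    # Run in parallel only when the estimate is above the threshold.
    return "parallel" if estimated_work > min_work else "sequential"
```

A small loop such as choose_execution(10, 10, 5000) falls back to sequential execution, which avoids paying the parallel startup cost for too little work.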

-fortran-form
Use stub procedures to generate cleaner code for the task procedures. The runtime library calls the stub routine, which in turn calls the cleaned-up version of the task procedure. The task procedure now includes the original task code and uses call-by-reference parameters when possible to give additional information to later optimization passes. In particular, the code in the task procedures can be converted into Fortran. This option also generates calls to the Fortran-compatible versions of the runtime library routines.

-l
Print out the procedure and source line number of parallelized regions of code.

-no-array-reductions
Do not parallelize loops with reductions over array variables.

-p n
Set the number of processors to the integer n. Setting the number of processors to 0 means the number of processors is unknown at compile-time, and will be determined at run-time. The default is 0.

-reduction-guards
Generate guard statements before calls to the reduction routines in the runtime library. The guard statements avoid calling the reduction routines if the current processor makes no contribution to the final value of the reduction. This happens when the local value of the reduction variable is the identity for the given reduction type (e.g. 0 for sum reductions and 1 for product reductions).
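The effect of a reduction guard can be sketched as follows. This is an illustration only: guarded_reduce and the REDUCTIONS table are hypothetical names, and the in-line combine step stands in for the runtime library's reduction routine, whose interface is not described on this page.

```python
import operator

# Identity element and combining operator for each reduction type.
REDUCTIONS = {
    "sum": (0, operator.add),
    "product": (1, operator.mul),
}

def guarded_reduce(kind, local_values):
    """Combine per-processor values, skipping those equal to the identity.

    Returns (result, calls), where calls counts how often the stand-in
    reduction routine was actually invoked.
    """
    identity, combine = REDUCTIONS[kind]
    result, calls = identity, 0
    for local in local_values:
        if local != identity:                # the guard statement
            result = combine(result, local)  # stand-in for the runtime call
            calls += 1
    return result, calls
```

With local values [0, 5, 0, 7] and a sum reduction, only two of the four processors reach the reduction routine, and the result is unchanged.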

-sf2c
This flag must be set if the infile was generated by sf2c.

ANNOTATIONS READ

Parallelization and Scheduling:

begin_parallel_region
end_parallel_region
These annotations can be placed on any tree_node. All tree_nodes between (and including) the "begin_parallel_region" and "end_parallel_region" annotations will be run in parallel on different processors.

doall
This annotation can only be placed on TREE_FORs. The "doall" annotation is shorthand for a parallel region consisting of a single TREE_FOR. The iterations of the loop are statically scheduled across the processors in a blocked fashion. Any "doall" annotations on TREE_FORs nested within this TREE_FOR will be stripped off (i.e. pgen will not generate nested parallel regions). This annotation is created by skweel. Only one of the annotations "doall", "comp_decomp" and "loop_cyclic" is allowed within a parallel region.
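Blocked static scheduling can be illustrated with a short sketch. The exact bounds computation that pgen emits is not given on this page; the version below assumes equal blocks of ceil(N/P) iterations, with the last processors receiving a short or empty block, and block_bounds is a hypothetical helper name.

```python
def block_bounds(lower, upper, nprocs, proc_id):
    """Return the half-open range [start, stop) owned by proc_id.

    Assumes the loop runs over [lower, upper) with unit stride and that
    nprocs is positive. The block size is ceil(total / nprocs).
    """
    total = upper - lower
    block = -(-total // nprocs)  # ceiling division
    start = min(lower + proc_id * block, upper)
    stop = min(start + block, upper)
    return start, stop
```

For 10 iterations on 4 processors this gives the blocks [0, 3), [3, 6), [6, 9) and [9, 10), so each processor works on one contiguous chunk of the iteration space.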

comp_decomp named_symcoeff_ineq
This annotation can only be placed on TREE_FORs. Schedule iterations of the TREE_FOR across the processors. The named_symcoeff_ineq specifies the mapping from the iterations to processors. The dependence library is called to generate the new bounds of the TREE_FOR for the given inequality. Only one of the annotations "doall", "comp_decomp" and "loop_cyclic" is allowed within a parallel region.

loop_cyclic dimension offset
This annotation can only be placed on TREE_FORs. Schedule the loop using a cyclic mapping in the processor dimension given by the integer dimension. Currently, only dimensions of 0, 1 or 2 are accepted. The integer offset gives the starting offset of the loop. Only one of the annotations "doall", "comp_decomp" and "loop_cyclic" is allowed within a parallel region.
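For contrast with blocked scheduling, a cyclic mapping in a single processor dimension might look like the sketch below. The interpretation of offset (shifting which iteration lands on processor 0) is an assumption, as is the helper name cyclic_iterations; the page only says that offset gives the starting offset of the loop.

```python
def cyclic_iterations(lower, upper, nprocs, proc_id, offset=0):
    """List the iterations of [lower, upper) assigned to proc_id.

    Iteration i goes to processor (i - offset) mod nprocs. This is one
    plausible reading of the "offset" argument, not a documented rule.
    """
    return [i for i in range(lower, upper)
            if (i - offset) % nprocs == proc_id]
```

With 10 iterations on 4 processors and offset 0, processor 1 receives iterations 1, 5 and 9, so consecutive iterations are dealt out round-robin instead of in contiguous chunks.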

Synchronization:

guard var+
This annotation can only be placed on TREE_FORs. Put an IF statement around the TREE_FOR so that the loop is only executed if the var_syms var are equal to the processor that owns the data written within the loop. This annotation can only be used in conjunction with the "comp_decomp" annote.

doacross dimension direction type
This annotation can only be placed on TREE_FORs. Generate counter synchronization around this loop. The integer dimension specifies the processor dimension in which to place the synchronization. Currently, only dimensions of 0, 1 or 2 are accepted. The integer direction gives the offset of the processor to wait on. The string type gives the scheduling type of the loop. Currently, the only kind accepted is "block".

tile_loops depth tripsize+
This annotation can only be placed on TREE_FORs. Tile the loop nest (the outermost loop in the nest is the loop with the annotation). An integer tripsize must be given for each loop in the nest, and specifies the size of the tile for that loop. A tripsize of 1 in the outer loop will cause the tile to be coalesced. For a loop nest of depth n, standard tiling creates 2n loops. If the tile is coalesced, then 2n-1 loops are generated.
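The loop count for standard tiling can be seen in a sketch of a depth-2 nest, which becomes 2n = 4 loops. The function tiled_order and its parameters are hypothetical; the code only records the order in which iterations are visited and is not the code pgen generates.

```python
def tiled_order(n, m, tile_i, tile_j):
    """Visit every (i, j) of an n-by-m nest using four loops."""
    order = []
    for ii in range(0, n, tile_i):                        # tile loop over i
        for jj in range(0, m, tile_j):                    # tile loop over j
            for i in range(ii, min(ii + tile_i, n)):      # within-tile loop over i
                for j in range(jj, min(jj + tile_j, m)):  # within-tile loop over j
                    order.append((i, j))
    return order
```

When tile_i is 1, the within-tile loop over i executes exactly one iteration per tile, which is presumably why the coalesced form needs only 2n-1 loops.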

lock locknum
unlock locknum
These annotations can be placed on any tree_node. Generate a lock or unlock statement after the tree_node with the annote. The integer locknum is the number of the lock variable.

global_barrier
This annotation can be placed on any tree_node. Generate a barrier statement after the tree_node with the annote. This generates a global barrier that makes all the processors wait at the barrier. Unique barriers are generated for each barrier annotation within a parallel region.
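The behavior of a global barrier can be demonstrated with Python's standard threading.Barrier, used here as a stand-in for the barrier provided by the parallel runtime library. The function run_with_barrier and its two-phase structure are assumptions made for illustration.

```python
import threading

def run_with_barrier(nprocs):
    """Run nprocs workers that write, wait at a barrier, then read."""
    barrier = threading.Barrier(nprocs)
    phase1 = [0] * nprocs      # one slot written by each worker
    seen = [None] * nprocs     # what each worker observes after the barrier

    def worker(pid):
        phase1[pid] = pid + 1  # work before the barrier
        barrier.wait()         # all workers wait here
        seen[pid] = sum(phase1)  # safe: every write above has completed

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Because no worker passes the barrier until all have arrived, every worker observes the complete set of writes from the first phase.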

Variable Scoping:

reduced type var
This annotation can only be placed on TREE_FORs. Create a private copy of the var_sym var on each processor. After the loop, perform a global reduction of the kind given by the string type. Currently supported reduction types include sum, product, max and min. Multiple "reduced" annotations on a single TREE_FOR are allowed. This annotation is created by skweel.
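The private-copy-then-combine pattern behind "reduced" can be sketched sequentially as below. The names parallel_reduce and REDUCTIONS are hypothetical, the processors are simulated one after another, and the cyclic split of the data is an arbitrary choice for the example; the actual reduction is performed by the runtime library.

```python
import operator
from functools import reduce

# Identity element and combining operator for each supported type.
REDUCTIONS = {
    "sum": (0, operator.add),
    "product": (1, operator.mul),
    "max": (float("-inf"), max),
    "min": (float("inf"), min),
}

def parallel_reduce(kind, data, nprocs):
    """Reduce data using one private accumulator per simulated processor."""
    identity, combine = REDUCTIONS[kind]
    privates = []
    for pid in range(nprocs):
        private = identity               # private copy of the variable
        for x in data[pid::nprocs]:      # this processor's iterations
            private = combine(private, x)
        privates.append(private)
    # After the loop, a global reduction merges the private copies.
    return reduce(combine, privates, identity)
```

Each accumulator starts at the identity for its reduction type, so a processor that receives no iterations leaves the final result unchanged.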

privatized var+
This annotation can only be placed on TREE_FORs. Create a private copy of the var_syms var on each processor. Multiple "privatized" annotations on a single TREE_FOR are allowed. This annotation is created by skweel.

ANNOTATIONS WRITTEN

None.

SEE ALSO

skweel(1), scc(1)

NOTES

After running SUIF code through pgen, the resulting program must link in the runtime library for the target machine. In addition, FORTRAN programs must link in the F77_doall and I77_doall libraries. The F77_doall and I77_doall libraries replace the F77 and I77 libraries, respectively, used for sequential FORTRAN programs.

HISTORY

The pgen program was written by Jennifer Anderson.

