Go to the previous section.

SUIF Limitations

In the SUIF vs. KAP tables (see section Performance Results), a number of the KAP only loops are due to SUIF parallelizing an outer loop while KAP is parallelizing an inner loop. However, there are a number of limitations in the current SUIF compiler that cause it to miss some loops that KAP is able to parallelize.

Loop Distribution

SUIF does not perform loop distribution, whereas the KAP compiler does. This accounts for more of the KAP only loops than any of the other limitations listed below. For example, the following loop from the Perfect club benchmark APS.f performs I/O in the loop and thus is not parallelizable:

      DO 30 K=1,NZ
          WRITE(6,40) K,ZET(K),UG(K),VG(K),TM(K),DKM(K),UM(K),VM(K)
          WRITE(8,40) K,ZET(K),UG(K),VG(K),TM(K),DKM(K),UM(K),VM(K)

However, KAP is able to parallelize some of the statements in the loop by applying loop distribution:

      DO 2 K=1,NZ

      DO 3 K=1,NZ
          WRITE(6,40) K,ZET(K),UG(K),VG(K),TM(K),DKM(K),UM(K),VM(K)
          WRITE(8,40) K,ZET(K),UG(K),VG(K),TM(K),DKM(K),UM(K),VM(K)

The DO 2 K loop can now be parallelized, and the DO 3 K loop runs sequentially.

FORTRAN Equivalences

Another reason KAP is able to parallelize loops that SUIF cannot is due to the way SUIF handles FORTRAN equivalences. SUIF carries no information about which variables could potentially refer to the same memory location. Accesses to equivalenced variables are represented as offsets from a base memory location. The parallelizer, upon seeing these offsets, assumes that any location in memory could potentially be touched, and thus cannot parallelize the loop. This problem shows up in the SPEC92 benchmark wave5.f, for example. SUIF's handling of equivalences is an artifact of our FORTRAN front end being based on the FORTRAN-to-C converter `f2c'.

Each of the remaining SUIF limitations listed below have a lesser impact than loop distribution and equivalences. They each account for only a small number of the KAP only loops.

Normalization with Symbolic Loop Steps

SUIF currently does not normalize loops with symbolic step sizes. If a loop is not normalized, the parallelizer will not consider parallelizing the loop.

Non-Linear Loop Bounds

SUIF's dependence analysis is based on linear affine functions. It will not take the loop bounds into account if they are non-linear. For example, in the following loop from the SPEC92 benchmark wave5.f:

      NS2 = (N+1)/2
      NP2 = N+2
      DO 102 K=2,NS2
         KC = NP2-K
         XH(K) = W(K-1)*X(KC)+W(KC-1)*X(K)
         XH(KC) = W(K-1)*X(K)-W(KC-1)*X(KC)

SUIF replaces NS2 with (N+1)/2 and KC with N+2-K, leaving the following code:

      DO 102 K=2,(N+1)/2
         KC = NP2-K
         XH(K) = W(K-1)*X(N+2-K)+W(N-K+1)*X(K)
         XH(N+2-K) = W(K-1)*X(K)-W(N-K+1)*X(N+2-K)

However, since SUIF cannot determine whether N+1 is even, the function (N+1)/2 is non-linear. The dependence library then does not use the bounds to determine that the accesses to XH(K) and X(N+2-K) are independent.

Multiple Symbolic Coefficients

We have extended SUIF's dependence analysis to handle simple non-linear array index functions. However, this analysis will not work when there are multiple symbolic coefficients. For example in the following loop from the Perfect club benchmark APS.f:

     DO 30 K=1,NZTOP
       DO 20 J=1,NY
         DO 10 I=1,NX
 10      CONTINUE

The auxiliary induction variable L is replaced with a function of the loop indices, and the resulting accesses to the array DCDX become DCDX(NY * NX * (K-1) + NX * (J-1) + I - 1). Because of the multiple symbolic coefficients NY and NX, the dependence library is not able to analyze the expression.

Symbolic Analysis

SUIF's scalar analysis uses only localized information, and does not perform symbolic analysis. For example, symbolic analysis is needed for the following loop from the Perfect benchmark WSS.f:

 14  KBOT = KK - 1
     KTOP = KK

     ... code containing IF statements deleted ...

     DO 18 K=KK,KTOP
 18  Q(K) = Q(KBOT)

KAP is able replace KBOT with KK-1 and can thus determine that the two accesses to array Q are independent.

Scalarized Array Accesses

KAP is able to copy references to array elements into scalars if they are only accessed by constant indices within a loop. It is then able to privatize the scalars, and parallelize the loop. SUIF's porky -scalarize pass will only turn array elements into scalars if they are accessed by constant indices throughout the entire program.

FORTRAN Intrinsics

SUIF's FORTRAN front-end, `sf2c', converts some FORTRAN intrinsics into procedures that are not pure functions. Thus any loops containing calls to these procedures are not parallelized. For example, the intrinsic EXP (on complex values) gets translated to a procedure with two arguments in the `F77' library.

Imperfect Reductions

KAP is able to handle some forms of reductions that SUIF cannot yet handle. For example, consider the following loop from the Perfect club benchmark OCS.f:

      DO 30 I=2,N2P
 30   WORK(1) = MAX(WORK(1), WORK(I))

Elements WORK(2) through WORK(N2P) are reduced into WORK(1). Currently, SUIF will not find reductions of arrays into a subsection of the array.

Reductions over Complex Variables

SUIF is not be able to recognize some reductions over complex variables. This is due to the fact that SUIF represents complex variables as arrays of size 2 -- one element for the complex part and one for the real part. Also, the front-end often generates extra temporaries for these complex variables. The array references to complex variables are turned into scalars by the porky -scalarize pass. However, the reduction recognition pass cannot handle the indirect reductions through the temporaries.

Conditional Parallelization

Finally, KAP performs conditional parallelization, whereas SUIF does not. KAP will duplicate a loop to create a parallel version and a sequential version. For example, consider the following loop from the SPEC92 benchmark su2cor.f:

      DO 40 I=0,249

This loop is sequential only when LVEC <= 249, and thus the second loop below can run in parallel:

      IF (LVEC .LE. 249) THEN

C Sequential
      DO 40 I=0,249


C Parallel
      DO 2 I=0,249
          IREG(I) = IREG(I+LVEC)
      END IF

Go to the previous section.