In the SUIF vs. KAP tables (see section Performance Results), a number of the KAP-only loops arise because SUIF parallelizes an outer loop while KAP parallelizes an inner loop. However, several limitations in the current SUIF compiler cause it to miss some loops that KAP is able to parallelize.
      DO 30 K=1,NZ
        UM(K)=REAL(WM(K))
        VM(K)=AIMAG(WM(K))
        WRITE(6,40) K,ZET(K),UG(K),VG(K),TM(K),DKM(K),UM(K),VM(K)
        WRITE(8,40) K,ZET(K),UG(K),VG(K),TM(K),DKM(K),UM(K),VM(K)
   30 CONTINUE
However, KAP is able to parallelize some of the statements in the loop by applying loop distribution:
      DO 2 K=1,NZ
        UM(K)=REAL(WM(K))
        VM(K)=AIMAG(WM(K))
    2 CONTINUE
      DO 3 K=1,NZ
        WRITE(6,40) K,ZET(K),UG(K),VG(K),TM(K),DKM(K),UM(K),VM(K)
        WRITE(8,40) K,ZET(K),UG(K),VG(K),TM(K),DKM(K),UM(K),VM(K)
    3 CONTINUE
The DO 2 K loop can now be parallelized, and the DO 3 K loop
runs sequentially.
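The effect of loop distribution can be checked with a small sketch. It is written in Python purely for illustration and is not how KAP or SUIF implement the transformation. The function names are hypothetical, and appending to a list stands in for the WRITE statements. The compute loop is deliberately run in reverse to mimic an arbitrary parallel schedule.

```python
def fused(wm):
    """Original form: computation and output interleaved in one loop."""
    n = len(wm)
    um, vm, out = [0.0] * n, [0.0] * n, []
    for k in range(n):
        um[k] = wm[k].real
        vm[k] = wm[k].imag
        out.append((k + 1, um[k], vm[k]))  # stands in for the WRITE statements
    return out


def distributed(wm):
    """After loop distribution: a compute loop followed by an output loop."""
    n = len(wm)
    um, vm, out = [0.0] * n, [0.0] * n, []
    # First loop: iterations are independent, so any order is valid.
    for k in reversed(range(n)):
        um[k] = wm[k].real
        vm[k] = wm[k].imag
    # Second loop: output order matters, so it stays sequential.
    for k in range(n):
        out.append((k + 1, um[k], vm[k]))
    return out


wm = [complex(1, 2), complex(3, -4), complex(0, 5)]
print(fused(wm) == distributed(wm))
```

Both versions produce the same output records, which is the property that makes the distribution legal here.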
Each of the remaining SUIF limitations listed below has a lesser impact than loop distribution and equivalences. Each accounts for only a small number of the KAP-only loops.
      NS2 = (N+1)/2
      NP2 = N+2
      ...
      DO 102 K=2,NS2
        KC = NP2-K
        XH(K) = W(K-1)*X(KC)+W(KC-1)*X(K)
        XH(KC) = W(K-1)*X(K)-W(KC-1)*X(KC)
  102 CONTINUE
SUIF replaces NS2 with (N+1)/2 and KC with
N+2-K, leaving the following code:
      DO 102 K=2,(N+1)/2
        KC = NP2-K
        XH(K) = W(K-1)*X(N+2-K)+W(N-K+1)*X(K)
        XH(N+2-K) = W(K-1)*X(K)-W(N-K+1)*X(N+2-K)
  102 CONTINUE
However, since SUIF cannot determine whether N+1 is even, it treats the
function (N+1)/2 as non-linear. As a result, the dependence library
does not use the loop bounds to determine that the accesses to
XH(K) and XH(N+2-K) are independent.
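The independence that the loop bounds imply can be confirmed by brute force. The following Python sketch is illustrative only: the function names are hypothetical, N is assumed to be at least 1, and `//` stands in for Fortran integer division. It enumerates the two sets of XH elements written by the loop and checks that they never overlap.

```python
def xh_indices(n):
    """Return the XH indices written by DO 102 for a given N."""
    ns2 = (n + 1) // 2                      # loop upper bound NS2
    direct = set(range(2, ns2 + 1))         # writes to XH(K)
    mirrored = {n + 2 - k for k in direct}  # writes to XH(N+2-K)
    return direct, mirrored


def writes_are_disjoint(n):
    """True when no element of XH is written twice by the loop."""
    direct, mirrored = xh_indices(n)
    return direct.isdisjoint(mirrored) and len(mirrored) == len(direct)


print(all(writes_are_disjoint(n) for n in range(1, 200)))
```

The check succeeds because K never exceeds (N+1)/2, so K + K' is at most N+1 for any two iterations, while a collision would require K + K' = N+2.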
      DO 30 K=1,NZTOP
        DO 20 J=1,NY
          DO 10 I=1,NX
            L=L+1
            DCDX(L)=-(UX(L)+UM(K))*DCDX(L)-(VY(L)+VM(K))*DCDY(L)+Q(L)
   10     CONTINUE
   20   CONTINUE
   30 CONTINUE
The auxiliary induction variable L is replaced with a function of
the loop indices, and the resulting accesses to the array DCDX
become DCDX(NY * NX * (K-1) + NX * (J-1) + I - 1). Because
of the multiple symbolic coefficients NY and NX, the
dependence library is not able to analyze the expression.
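To make the substitution concrete, the sketch below (Python, for illustration only) compares the running counter with a closed form in the loop indices. The function names and the parameter `l0`, the value of L before the loop nest, are assumptions: the excerpt does not show how L is initialized, so the constant offset in the closed form depends on that value.

```python
def indices_by_counter(nx, ny, nztop, l0=0):
    """Values of L used as subscripts, obtained by incrementing a counter."""
    used, l = [], l0
    for k in range(1, nztop + 1):
        for j in range(1, ny + 1):
            for i in range(1, nx + 1):
                l += 1
                used.append(l)
    return used


def indices_by_formula(nx, ny, nztop, l0=0):
    """The same subscripts written as a function of K, J and I."""
    return [
        l0 + ny * nx * (k - 1) + nx * (j - 1) + i
        for k in range(1, nztop + 1)
        for j in range(1, ny + 1)
        for i in range(1, nx + 1)
    ]


same = indices_by_counter(3, 4, 5) == indices_by_formula(3, 4, 5)
distinct = len(set(indices_by_formula(3, 4, 5))) == 3 * 4 * 5
print(same, distinct)
```

Every iteration touches a different element, so the nest carries no dependence. The difficulty described above is that proving this symbolically requires reasoning about the products NY * NX and NX * (J-1) with unknown NX and NY.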
   14 KBOT = KK - 1
      KTOP = KK
      ... code containing IF statements deleted ...
      DO 18 K=KK,KTOP
   18 Q(K) = Q(KBOT)
KAP is able to replace KBOT with KK-1 and can thus determine
that the two accesses to array Q are independent.
The porky -scalarize pass will only turn array elements into scalars if they are
accessed by constant indices throughout the entire program.
EXP (on complex values) gets translated to a procedure
with two arguments in the `F77' library.
      DO 30 I=2,N2P
   30 WORK(1) = MAX(WORK(1), WORK(I))
Elements WORK(2) through WORK(N2P) are reduced into
WORK(1). Currently, SUIF will not find reductions of arrays into
a subsection of the array.
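The loop is a MAX reduction, which is why it can be parallelized once it is recognized as such. The following Python sketch is illustrative only, with hypothetical function names and a non-empty input assumed. It compares the sequential form with a chunked form in which each chunk could be handled by a separate processor. Index 0 in Python corresponds to WORK(1).

```python
def sequential_max(work):
    """Direct transcription: fold WORK(2..N2P) into WORK(1)."""
    work = list(work)
    for i in range(1, len(work)):
        work[0] = max(work[0], work[i])
    return work[0]


def chunked_max(work, chunks=4):
    """Reduction form: one partial maximum per chunk, then a final combine."""
    size = max(1, -(-len(work) // chunks))  # ceiling division
    partials = [max(work[s:s + size]) for s in range(0, len(work), size)]
    return max(partials)


work = [3.0, 9.5, -1.0, 7.25, 9.0, 2.0]
print(sequential_max(work) == chunked_max(work))
```

Because MAX is associative and commutative, the partial results can be combined in any order without changing the answer.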
porky -scalarize pass. However, the reduction
recognition pass cannot handle the indirect reductions through the
temporaries.
      DO 40 I=0,249
        IREG(I)=IREG(I+LVEC)
   40 CONTINUE
This loop is sequential only when LVEC <= 249, and
thus the second loop below can run in parallel:
      IF (LVEC .LE. 249) THEN
C       Sequential
        DO 40 I=0,249
          IREG(I)=IREG(I+LVEC)
   40   CONTINUE
      ELSE
C       Parallel
        DO 2 I=0,249
          IREG(I) = IREG(I+LVEC)
    2   CONTINUE
      END IF
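The two-version scheme above can be mimicked in a short Python sketch, given here only as an illustration. The function names are hypothetical, LVEC is assumed to be non-negative, and the list is assumed to hold at least 250 + LVEC elements. Running the iterations in reverse stands in for an arbitrary parallel schedule.

```python
def shift_in_order(ireg, lvec):
    """Sequential version: iterations run in their original order."""
    a = list(ireg)
    for i in range(250):
        a[i] = a[i + lvec]
    return a


def shift_any_order(ireg, lvec):
    """Stand-in for a parallel schedule: iterations run in reverse."""
    a = list(ireg)
    for i in reversed(range(250)):
        a[i] = a[i + lvec]
    return a


def shift_versioned(ireg, lvec):
    """Run-time test selects the version, as in the IF/ELSE above."""
    if lvec <= 249:
        return shift_in_order(ireg, lvec)
    return shift_any_order(ireg, lvec)


data = list(range(600))
print(shift_in_order(data, 300) == shift_any_order(data, 300))
print(shift_in_order(data, 100) == shift_any_order(data, 100))
```

With LVEC = 300 the elements read and the elements written do not overlap, so iteration order is irrelevant. With LVEC = 100 they overlap, and reordering the iterations changes the result.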