
Performance Results

We ran the SUIF parallelizer on the Perfect, NAS and SPEC92 benchmark suites. We first generated parallelized C code with the command scc -parallel -s2c -.out.c, then compiled the resulting program with the native compiler on the target architecture.

The parallelized programs were run on DASH and/or SGI targets. All of the parallelized Perfect benchmarks return the same results as the versions built with our uniprocessor compiler. CSS.f contains non-standard Fortran code that writes to aliased procedure parameters; the affected loops are marked non-concurrent with compiler directives. Since SUIF does not read directives, we had to modify the code by hand to keep these loops from being parallelized. Also, SMS.f must be compiled with the -no PREDEP flag. All of the NAS and SPEC92 benchmarks validate.

To evaluate the effectiveness of the SUIF parallelizer, we compared it with the KAP compiler, a commercial parallelizing compiler from Kuck and Associates, Incorporated. The KAP compiler has a number of analyses and optimizations that the current version of SUIF does not perform. In addition, the SUIF parallelizer is designed to exploit maximum parallelism: it parallelizes every loop where doing so is legal, without taking parallelism overhead into account. We ran the KAP compiler (version 3.10.1) using pfa -mc=1 -o=4 -roundoff=2 -so=1. These flags were chosen so that the functionality of KAP would be as close to that of SUIF as possible. Setting -mc=1 encourages KAP to parallelize all loops where legal. The -o=4 flag turns on advanced data dependence analysis but stops short of enabling loop fusion. The -roundoff=2 flag allows reduction recognition, and -so=1 enables basic scalar optimizations.

We ran both the SUIF and KAP compilers on our benchmark suites and counted the parallel loops found by each compiler. The results are presented below:



  Perfect            Parallelized Loops       KAP loops
  Club           KAP  SUIF     By  KAP  SUIF    found 
                total total   Both only only   by SUIF 
------------------------------------------------------
  ADM    (APS)   141   134    132    9    2      94 %
  SPICE  (CSS)     8    30      8    0   22     100 %
  QCD    (LGS)    85    85     84    1    1      99 %
  MDG    (LWS)    29    26     24    5    2      83 %
  TRACK  (MTS)    48    48     48    0    0     100 % 
  BDNA   (NAS)   103   105     94    9   11      95 %
  OCEAN  (OCS)    65    60     59    6    1      91 %
  DYFESM (SDS)    90    93     86    4    7     100 %
  MG3D   (SMS)    38    33     33    5    0      87 %
  ARC2D  (SRS)   136   136    132    4    4     100 %
  FLO52  (TFS)    64    64     64    0    0     100 %
  TRFD   (TIS)    22    25     22    0    3     100 %
  SPEC77 (WSS)   202   195    193    9    2      96 %
------------------------------------------------------
  Total         1031  1034    979   52   55      96 %
  Average       79.3  79.5   75.3  4.0  4.2      96 %


  NAS                Parallelized Loops       KAP loops
  Benchmarks     KAP  SUIF     By  KAP  SUIF    found
                total total   Both only only   by SUIF
------------------------------------------------------
  appbt           92    90     89    3    1      97 %
  applu           67    72     62    5   10      93 %
  appsp           80    87     74    6   13      93 %
  buk              2     4      2    0    2     100 %
  cgm             14    16     14    0    2     100 %
  embar            1     5      1    0    4     100 %
  fftpde          13    14     10    3    4      77 %
  mgrid           12    15     11    1    4      92 %
------------------------------------------------------
  Total          281   303    263   18   40      94 %
  Average       35.1  37.9   32.9  2.2  5.0      94 %


  SPEC 92           Parallelized Loops        KAP loops
  Benchmarks     KAP  SUIF     By  KAP  SUIF    found
                total total   Both only only   by SUIF
------------------------------------------------------
   doduc         218   212    207   11    5      95 %
   fpppp          11    11     10    1    1      91 %
 hydro2d          92    85     85    7    0      92 %
 mdljdp2          18    17     15    3    2      83 %
 mdljsp2          17    16     14    3    2      82 %
   nasa7          47    40     39    8    1      83 %
     ora           3     3      2    1    1      67 %
  su2cor          60    57     56    4    1      93 %
  swm256          16    16     16    0    0     100 %
 tomcatv          10     9      9    1    0      90 %
   wave5         207   196    190   17    6      92 %
------------------------------------------------------
  Total          699   662    643   56   19      92 %
  Average       63.5  60.2   58.5  5.1  1.7      88 %

The first two columns in these tables (KAP total and SUIF total) give the number of parallel loops found by each compiler. The By Both column gives the number of loops parallelized by both compilers. The KAP only and SUIF only columns give the number of loops parallelized only by KAP and only by SUIF, respectively. Finally, the KAP loops found by SUIF column gives the percentage of loops parallelized by KAP that are either parallelized by SUIF directly or enclosed in loops parallelized by SUIF.
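
The column definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used to produce the tables: the function name compare_loops, the loop identifiers, and the parent map describing loop nesting are all hypothetical.

```python
def compare_loops(kap, suif, parent):
    """Compute one table row from two sets of parallelized loop ids.

    kap, suif: sets of loop ids parallelized by each compiler (hypothetical ids).
    parent:    dict mapping a loop id to the id of the loop that immediately
               encloses it; outermost loops do not appear as keys.
    """

    def covered(loop):
        # Walk outward through the loop nest: a KAP loop counts as found
        # if it, or any loop enclosing it, was parallelized by SUIF.
        while loop is not None:
            if loop in suif:
                return True
            loop = parent.get(loop)
        return False

    found = sum(1 for loop in kap if covered(loop))
    return {
        "kap_total": len(kap),
        "suif_total": len(suif),
        "both": len(kap & suif),
        "kap_only": len(kap - suif),
        "suif_only": len(suif - kap),
        "kap_found_pct": round(100 * found / len(kap)) if kap else None,
    }


# Hypothetical program in which loop "inner" is nested inside loop "outer".
row = compare_loops(
    kap={"a", "b", "inner", "c"},
    suif={"a", "b", "outer"},
    parent={"inner": "outer"},
)
print(row)
```

In this made-up example, "inner" is counted as found even though SUIF did not parallelize it directly, because SUIF parallelized the enclosing loop "outer". This is why the last column can exceed the ratio of By Both to KAP total.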

As the results show, the SUIF parallelizer compares quite favorably with KAP. Across the three suites it finds 92 to 96 percent of the loops parallelized by KAP, and it parallelizes a fair number of loops that KAP misses (the SUIF only column).
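
As a rough cross-check, the share of KAP loops that SUIF parallelizes directly can be recomputed from the Total rows of the three tables. The sketch below uses only the KAP total and By Both figures. Since the reported percentage also counts KAP loops enclosed in SUIF-parallelized loops, the direct ratio is a lower bound on it. The helper name direct_match_pct is ours.

```python
# (KAP total, By Both) taken from the Total rows of the three tables.
totals = {
    "Perfect": (1031, 979),
    "NAS": (281, 263),
    "SPEC92": (699, 643),
}


def direct_match_pct(kap_total, both):
    """Percentage of KAP-parallelized loops that SUIF also parallelizes directly."""
    return 100.0 * both / kap_total


for suite, (kap_total, both) in totals.items():
    print(f"{suite}: {direct_match_pct(kap_total, both):.1f}% matched directly")
```

The direct ratios come out at or slightly below the reported 96, 94 and 92 percent, as expected when enclosed loops are also counted as found.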
