We also measure the dynamic impact of each of the advanced array analyses. We measure the contribution of each analysis component by recording the specific array analyses that apply to each parallelized loop and by instrumenting the sequential code to determine the execution time of each of these loops. Figure 5(C) presents these execution times as percentages of the total computation time. The measurements were taken by running the programs on a single processor of a 200 MHz SGI Challenge; because the results are reported in relative terms, they apply to a large class of processors. Note that even when interprocedural analysis is used to parallelize, say, 100% of the computation, this does not mean that a parallelizer without interprocedural analysis will find no parallelism at all, since it may still parallelize an inner loop.
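A minimal sketch of this attribution step, assuming each parallelized loop is recorded with its sequential execution time and the set of analyses its parallelization required. The `LoopRecord` type, the `time_by_analysis` function, the analysis labels, and the timings are hypothetical illustrations, not SUIF's actual instrumentation. A loop that needs several analyses is counted toward each of them, so the percentages need not sum to 100.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LoopRecord:
    name: str
    seconds: float        # measured sequential execution time of the loop
    analyses: frozenset   # analyses required to parallelize this loop (hypothetical labels)


def time_by_analysis(loops, total_seconds):
    """Percentage of total sequential time spent in parallelized loops that
    required each analysis; a loop needing several analyses counts toward each."""
    totals = defaultdict(float)
    for loop in loops:
        for analysis in loop.analyses:
            totals[analysis] += loop.seconds
    return {a: 100.0 * t / total_seconds for a, t in totals.items()}


# Hypothetical measurements for a program whose total run time is 100 s.
loops = [
    LoopRecord("loop_a", 30.0, frozenset({"interprocedural reduction"})),
    LoopRecord("loop_b", 20.0, frozenset({"interprocedural reduction",
                                          "interprocedural privatization"})),
    LoopRecord("loop_c", 15.0, frozenset({"array privatization"})),
]
print(time_by_analysis(loops, total_seconds=100.0))
```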
We term the overall percentage of time spent in parallelized regions as the parallelism coverage. Overall, we observe rather good coverage (above 80%) for 8 of the 10 programs in SPEC92FP, 7 of the 8 NAS programs and 6 of the 12 PERFECT benchmarks. A third of the programs spend more than 50% of their execution time in loops requiring advanced array analysis techniques.
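Parallelism coverage can be sketched as follows, assuming inclusive per-loop times and the loop nesting structure are available. In this sketch only the outermost parallelized loop in a nest counts toward coverage, so time spent in a parallelized inner loop is not counted twice, while a parallelized loop nested inside a sequential one still contributes (the inner-loop case mentioned above). The `Loop` type, the helper names, and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Loop:
    name: str
    seconds: float          # inclusive sequential time (includes nested loops)
    parallel: bool          # was this loop parallelized?
    parent: Optional[str]   # enclosing loop, or None for an outermost loop


def coverage(loops, total_seconds):
    """Percentage of total time spent in parallelized regions."""
    by_name = {loop.name: loop for loop in loops}

    def has_parallel_ancestor(loop):
        p = loop.parent
        while p is not None:
            if by_name[p].parallel:
                return True
            p = by_name[p].parent
        return False

    covered = sum(loop.seconds for loop in loops
                  if loop.parallel and not has_parallel_ancestor(loop))
    return 100.0 * covered / total_seconds


def well_covered(program_coverage, threshold=80.0):
    """Count programs whose coverage is strictly above the threshold."""
    return sum(1 for cov in program_coverage.values() if cov > threshold)


# Hypothetical loop nest for a program that runs 100 s in total.
nest = [
    Loop("outer", 60.0, True, None),
    Loop("outer_inner", 50.0, True, "outer"),  # already covered by "outer"
    Loop("seq", 25.0, False, None),
    Loop("seq_inner", 20.0, True, "seq"),      # inner loop of a sequential loop
]
print(coverage(nest, total_seconds=100.0))
```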
This graph also demonstrates how important it can be to parallelize even a single loop that requires one of the advanced analysis techniques. For example, the program mdljdp2 contains just two loops requiring interprocedural reduction, but the program spends 78% of its time in those two loops.
Not only do some of these SUIF-parallelized loops execute for a long time, but they can also be very large. The largest loop SUIF parallelizes is from spec77; it consists of 1002 lines of code from the original loop and the procedures it invokes. The loop contains 60 subroutine calls to 13 different procedures. Within this loop, there are 48 interprocedural privatizable arrays, 5 interprocedural reduction arrays and 27 other arrays accessed independently. Such a loop illustrates the advantage of interprocedural analysis over inlining for parallelizing large programs: had this loop been fully inlined, it would have contained nearly 11,000 lines of code.
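The gap between analyzing each procedure once and fully inlining the loop can be sketched with a small calculation over a call graph. Interprocedural analysis examines each reachable procedure once, whereas full inlining copies a callee's (already inlined) body at every call site, so code size multiplies along call chains. The call graph, procedure names, and line counts below are hypothetical, and the sketch assumes the call graph is acyclic.

```python
from functools import lru_cache

# Hypothetical call graph: procedure -> (lines of its own code,
#                                        [(callee, number of call sites)])
CALL_GRAPH = {
    "loop_body": (100, [("a", 3), ("b", 2)]),
    "a": (50, [("c", 4)]),
    "b": (80, [("c", 1)]),
    "c": (20, []),
}


@lru_cache(maxsize=None)
def inlined_lines(proc):
    """Size after fully inlining: each call site receives a full copy of the callee."""
    own, calls = CALL_GRAPH[proc]
    return own + sum(n * inlined_lines(callee) for callee, n in calls)


def reachable(proc, seen=None):
    """All procedures reachable from proc, each counted once."""
    seen = set() if seen is None else seen
    if proc not in seen:
        seen.add(proc)
        for callee, _ in CALL_GRAPH[proc][1]:
            reachable(callee, seen)
    return seen


def analyzed_lines(proc):
    """Lines an interprocedural analysis examines: each procedure once."""
    return sum(CALL_GRAPH[p][0] for p in reachable(proc))


print(analyzed_lines("loop_body"), inlined_lines("loop_body"))
```

Here the same four procedures total 250 lines when analyzed once but 690 lines after full inlining, because `c` is copied into every inlined instance of `a` and `b`.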
Figure 5: Dynamic Measurements of SUIF and the Baseline Compiler