
Metrics and Results

While parallel speedups measure the overall effectiveness of a parallel system, they are also highly machine dependent. Not only do speedups depend on the number of processors, but they are also sensitive to many aspects of the architecture, such as the cost of synchronization, the interconnect bandwidth, and the memory subsystem. Furthermore, speedups measure the effectiveness of the entire compiler system, not just the parallelization analysis that is the focus of this paper. For example, techniques to improve data locality and minimize synchronization can greatly improve the speedups obtained. Thus, to capture more precisely how well the parallelization analysis performs, we use the following two metrics:

Parallelism Coverage. Coverage, as introduced in Section 6.3.2, is an important metric for measuring the effectiveness of parallelization analysis. By Amdahl's law, programs with low coverage cannot achieve good parallel speedup. For example, a program with 80% coverage has an ideal speedup of only 2.5 on 4 processors. High coverage indicates that the compiler analysis is locating significant amounts of parallelism in the computation.
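The 2.5 figure follows directly from Amdahl's law. Writing c for the parallelism coverage (the fraction of sequential execution time that is parallelized) and P for the number of processors, with both symbols introduced here for illustration, the ideal speedup is

\[
S_{\mathrm{ideal}} = \frac{1}{(1-c) + c/P}
                   = \frac{1}{(1-0.8) + 0.8/4}
                   = \frac{1}{0.4}
                   = 2.5
\qquad (c = 0.8,\; P = 4).
\]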

Granularity of Parallelism. A program with high coverage is not guaranteed to achieve parallel speedup due to a number of factors. The granularity of parallelism extracted is a particularly important factor, as frequent synchronizations can slow down, rather than speed up, a fine-grain parallel computation. To quantify this property, we define a program's granularity as the average execution time of its parallel regions.
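This definition can be stated more formally; the notation below is ours, as the text defines granularity only in prose. If a run of the program executes N dynamic instances of parallel regions, with the i-th instance taking time t_i, then the granularity G is

\[
G = \frac{1}{N} \sum_{i=1}^{N} t_i .
\]

A small G means the program enters and leaves parallel regions frequently, so the fixed cost of synchronization at region boundaries is amortized over very little useful work.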

Figures 5(B) and (C) compare the parallelism coverage and granularity achieved by the SUIF compiler and the baseline compiler.

For the sake of completeness, we also present a set of speedup measurements. The programs in the benchmark suite have relatively short execution times as well as fine granularities of parallelism, as shown in Figure 5(C); most of them cannot utilize a large number of processors effectively. For our experiment, we run all the programs on a 4-processor 200MHz SGI Challenge. Speedups are calculated as the ratio of the execution time of the original sequential program to the parallel execution time. The results are shown in Figure 5(D).
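In symbols, with T_seq and T_par denoting the measured sequential and parallel execution times (notation introduced here), each speedup reported in Figure 5(D) is

\[
S = \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}} .
\]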


