We have implemented our algorithm as part of the SUIF (Stanford University Intermediate Format) compiler system. SUIF is a flexible infrastructure for compiler research. It includes a full ANSI C front end, so we are able to evaluate our analysis on large, realistic application programs. The SUIF system also includes many components that can benefit from points-to information; we have only just begun to take advantage of them. To illustrate the potential, we report some preliminary results of using the points-to information to parallelize numeric C programs.
We present preliminary results for a number of benchmarks, including several from the SPEC benchmark suites. Two of the SPECfp92 floating-point programs, alvinn and ear, are written in C, and we include both here. Of the SPECint92 integer programs, we present results for compress and eqntott. Most of the other SPECint92 benchmarks contain setjmp/longjmp calls or asynchronous signal handlers. We eventually plan to support setjmp/longjmp calls in a conservative fashion, but we know of no practical way to analyze programs that use asynchronous signal handlers, except for simple cases where the handlers exit the program. We have also analyzed two Unix utilities, grep and diff, and a number of the benchmarks used by Landi and Ryder.
Since the focus of our work has been making pointer analysis efficient, we are primarily concerned with the times required to analyze the various benchmarks. Table 2 shows the benchmark programs sorted by size. The first two columns list the number of source lines and the number of procedures encountered during the analysis. The third column shows the analysis times in seconds for our algorithm running on a DECstation 5000/260. These times do not include the overhead for reading the procedures from the input files, building flow graphs, and computing dominance frontiers. Neither do they include the time required to write the results to the SUIF output files.
Our pointer analysis is clearly fast enough to be practical for these programs: in every case, only a few seconds are required. The analysis time can vary significantly, though, depending not only on the overall size of the input program but also on other characteristics of the code. For example, even though it is quite a bit larger than eqntott, ear is much easier to analyze. In general, floating-point applications appear to be easy targets for pointer analysis; more complex programs require somewhat more analysis time.
Besides the overall analysis times, we also measured some statistics to evaluate our use of PTFs, and the results are very encouraging. The last column of Table 2 shows the average number of PTFs per procedure; these averages are all close to one. Moreover, most of the cases where a procedure has more than one PTF are due only to differences in the offsets and strides of the initial points-to functions. Combining PTFs in those situations could improve efficiency with only a small loss of context sensitivity.
The compiler program is an interesting case to examine in more detail. This program is a small compiler that uses a recursive descent parser. It contains many procedure calls, a large fraction of them recursive. Together these factors cause the invocation graph described by Emami et al. to blow up to more than 700,000 nodes, for a program with only 37 procedures! For some larger applications, the exponential size of the invocation graph would be completely unreasonable to handle. Fortunately, our results show that it is unnecessary to reanalyze each invocation graph node.
Points-to information is useful for many different compiler passes, but it is crucial for parallelization. Our first use of the pointer analysis is to show that static analysis can be used to parallelize loops in numerical C programs. The current SUIF parallelizer uses our points-to information to determine whether formal parameters can be aliased. It then detects parallel loops and generates SPMD (Single Program Multiple Data) code for multiprocessors. The parallelizer includes many of the standard analyses for recognizing loop-level parallelism: constant propagation, induction variable recognition, and data dependence analysis. It also includes a few passes specific to parallelizing C programs: rewriting while loops as for loops where possible, and rewriting pointer increments as array index calculations. After parallelization, the compiler generates a C output file that contains calls to our run-time library.
We ran our parallelizer over alvinn and ear. We instrumented our run-time system to measure the sequential execution time spent in the parallelized portions of the code. This measurement is shown as a percentage in the first column of Table 3. We also measured the sequential time spent in each invocation of each parallelized loop to determine the granularity of the parallelism. The averages of these times are shown in the second column of Table 3.
For both programs, the compiler managed to parallelize all the major loops. We ran the generated code on two and four processors of an SGI 4D/380; the speedups are shown in Table 3. The parallelized alvinn achieves very good speedups. The ear program performs reasonably well on two processors but does not speed up much more with four. This is not surprising, given how little computation each parallelized loop performs and how much false sharing the parallelized program suffers. Here we have shown only the use of pointer analysis in a loop parallelizer for multiprocessors. The same pointer information could be used to generate parallel code for superscalar and VLIW processors, on which both alvinn and ear would perform very well.