This paper has described the analyses in a fully interprocedural automatic parallelization system. An extensive experiment with this system demonstrated that interprocedural data-flow analysis, array privatization, and reduction recognition are key technologies that greatly improve a parallelizing compiler's ability to locate coarse-grain parallel loops. Through our work, we discovered that the effectiveness of an interprocedural parallelization system depends on the strength of each individual analysis and on the ability of the analyses to work together in an integrated fashion. This comprehensive approach to parallelization analysis is why our system has been much more effective at automatic parallelization than previous interprocedural systems and commercially available compilers.
For some programs, our analysis is sufficient to find the available parallelism. For other programs, it seems impossible or unlikely that a purely static analysis could discover the parallelism, either because correct parallelization requires dynamic information not available at compile time or because the program is too difficult to analyze. In such cases, we might benefit from some support for run-time parallelization or user interaction. The aggressive static parallelizer we have built will provide a good starting point for investigating these techniques.