Global Optimizations for Parallelism and Locality on Scalable Parallel Machines

Jennifer M. Anderson and Monica S. Lam
Computer Systems Laboratory
Stanford University, Stanford, CA 94305

This research was supported in part by DARPA contract N00039-91-C-0138, an NSF Young Investigator Award and a fellowship from Digital Equipment Corporation's Western Research Lab.

In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation (PLDI), Albuquerque, New Mexico, June 23--25, 1993


Data locality is critical to achieving high performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus the mapping, or decomposition, of the computation and data onto the processors of a scalable parallel machine is a key issue in compiling programs for these architectures.

This paper describes a compiler algorithm that automatically finds computation and data decompositions that optimize both parallelism and locality. This algorithm is designed for use with both distributed and shared address space machines. The scope of our algorithm is dense matrix computations where the array accesses are affine functions of the loop indices. Our algorithm can handle programs with general nestings of parallel and sequential loops.

We present a mathematical framework that enables us to derive the decompositions systematically. Our algorithm can exploit parallelism both in fully parallelizable loops and in loops that require explicit synchronization. The algorithm will trade off extra degrees of parallelism to eliminate communication. If communication is needed, the algorithm will try to introduce the least expensive forms of communication into those parts of the program that are least frequently executed.