Affine Transformations for Optimizing Parallelism and Locality
An Affine Partitioning Theory
It is well known that many loop transformations can improve both
parallelism and memory-subsystem performance on uniprocessor and
multiprocessor systems. Many such transformations have been proposed,
including unimodular transformations (interchange, skewing, and
reversal), fusion, fission, reindexing, scaling, and statement
reordering. We have developed a new transformation framework, called
affine partitioning, that unifies all of these transformations.
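As a rough illustration of why a single framework can subsume these transformations, the sketch below treats each statement's new schedule as an affine map phi(i) = C*i + c over its loop-index vector i, with transformed instances executed in lexicographic order of phi(i). The helper `affine` and every map name are hypothetical, and this is a simplified encoding chosen for illustration, not the exact formulation used in affine partitioning.

```python
from typing import Callable, Sequence, Tuple

Vector = Tuple[int, ...]


def affine(C: Sequence[Sequence[int]], c: Sequence[int]) -> Callable[[Sequence[int]], Vector]:
    """Hypothetical helper: return the map i -> C*i + c on integer vectors."""
    def phi(i: Sequence[int]) -> Vector:
        return tuple(sum(a * x for a, x in zip(row, i)) + off
                     for row, off in zip(C, c))
    return phi


# Unimodular transformations of a perfect 2-deep nest over (i, j): c = 0.
interchange = affine([[0, 1], [1, 0]], [0, 0])   # (i, j) -> (j, i)
skew = affine([[1, 0], [1, 1]], [0, 0])          # (i, j) -> (i, i + j)
reversal = affine([[-1, 0], [0, 1]], [0, 0])     # (i, j) -> (-i, j)

# Per-statement maps for two statements s1, s2, each in its own loop over i.
fused = affine([[1]], [0])           # s1 and s2 both use this map: same iteration
reindexed_s2 = affine([[1]], [1])    # s2 instance i runs at iteration i + 1
scaled_s2 = affine([[2]], [0])       # s2 instance i runs at iteration 2i

# A constant outer row separates statements into consecutive loops (fission).
fission_s1 = affine([[0], [1]], [0, 0])   # i -> (0, i)
fission_s2 = affine([[0], [1]], [1, 0])   # i -> (1, i)

# A constant inner row orders statements within one iteration (reordering).
reorder_s1 = affine([[1], [0]], [0, 1])   # i -> (i, 1)
reorder_s2 = affine([[1], [0]], [0, 0])   # i -> (i, 0): s2 now runs before s1

if __name__ == "__main__":
    # Fission: every s1 instance is ordered before every s2 instance.
    print(fission_s1((5,)) < fission_s2((0,)))  # True
```

The point of the encoding is that choosing the matrix C selects among unimodular loop transformations, while per-statement offsets and constant rows in c express fusion, fission, reindexing, and reordering for imperfectly nested code.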
Affine Partitioning Based Algorithms
- Developed an algorithm that finds the optimal affine partitioning,
  one that maximizes the degree of parallelism while minimizing the
  degree of synchronization.
- Developed an algorithm that maximizes parallelism and minimizes
  communication between processors.
- Developed an algorithm that uses affine partitioning to improve
  the data locality of uniprocessor and multiprocessor programs.
- Introduced the concept of distributed blocking, a generalization
  of blocking from perfect loop nests to arbitrarily nested loops.
- Developed an algorithm that integrates affine partitioning with
  blocking.
- Developed an algorithm that uses affine partitioning to maximize
  opportunities for array contraction.
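The constraint behind the first two algorithms can be made concrete with a small brute-force sketch. Assuming a space partition assigns instance i of statement s to processor phi_s(i), the partition needs no synchronization only if every pair of instances that access the same array element, at least one of them writing it, lands on the same processor. The two-loop program, the function `is_sync_free`, and the partition names are hypothetical; enumerating a small iteration space stands in for the symbolic solving of these constraints for the affine coefficients that a real compiler would perform.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# (statement, array, index function, is_write) for a 1-D loop i = 1..n
Access = Tuple[str, str, Callable[[int], int], bool]
Partition = Dict[str, Callable[[int], int]]


def is_sync_free(accesses: List[Access], partition: Partition, n: int) -> bool:
    """Brute-force check: any two instances that touch the same array element,
    at least one of them a write, must be assigned to the same processor."""
    touches = [(arr, f(i), write, partition[stmt](i))
               for stmt, arr, f, write in accesses
               for i in range(1, n + 1)]
    for (a1, e1, w1, p1), (a2, e2, w2, p2) in product(touches, repeat=2):
        if a1 == a2 and e1 == e2 and (w1 or w2) and p1 != p2:
            return False
    return True


# Hypothetical program:
#   for i = 1..n: s1: A[i] = ...
#   for i = 1..n: s2: B[i] = A[i-1]
ACCESSES: List[Access] = [
    ("s1", "A", lambda i: i, True),
    ("s2", "A", lambda i: i - 1, False),
    ("s2", "B", lambda i: i, True),
]

aligned: Partition = {"s1": lambda i: i, "s2": lambda i: i - 1}  # offset aligns s2 with its producer
naive: Partition = {"s1": lambda i: i, "s2": lambda i: i}        # s2 reads data owned by a neighbor
serial: Partition = {"s1": lambda i: 0, "s2": lambda i: 0}       # everything on one processor
```

Here `aligned` is synchronization-free while `naive` is not. Note that `serial` also passes the check trivially, which is why the partitioning must additionally maximize the degree of parallelism rather than merely satisfy the constraints.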
Experimental Results
The affine partitioning algorithm has been prototyped in the SUIF2
Compiler Infrastructure. Data locality optimizations have been shown
to triple uniprocessor performance on some programs. Combining affine
transformations for parallelism and locality has yielded a speedup of
20 on a 32-processor system, more than a threefold improvement over
the best previously reported results.
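For scale, the reported figures can be restated as a parallel efficiency. This is simple arithmetic on the numbers above, reading "threefold improvement" as a ratio of speedups on the same number of processors:

```latex
E = \frac{S}{P} = \frac{20}{32} = 0.625,
\qquad
\frac{20}{S_{\text{prev}}} > 3 \;\Longrightarrow\; S_{\text{prev}} < \frac{20}{3} \approx 6.7
```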
Publications

Improving Parallelism And Data Locality With Affine Partitioning
A. W. Lim
Ph.D. thesis, Stanford University, Computer Science Department, September
2001.

Blocking and Array Contraction Across Arbitrarily Nested Loops Using
Affine Partitioning
A. W. Lim, S. W. Liao and M. S. Lam
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice
of Parallel Programming, June, 2001.

Cache Optimizations With Affine Partitioning
A. W. Lim and M. S. Lam
Proceedings of the Tenth SIAM Conference on Parallel Processing
for Scientific Computing, Portsmouth, Virginia, March, 2001.

An Affine Partitioning Algorithm to Maximize
Parallelism and Minimize Communication
A. W. Lim, G. I. Cheong and M. S. Lam
Proceedings of the 13th ACM SIGARCH International Conference on
Supercomputing, June, 1999.

Maximizing Parallelism and Minimizing Synchronization
with Affine Partitions
A. W. Lim and M. S. Lam
Parallel Computing, Vol. 24, Issues 3-4, May 1998, pages 445-475.

Maximizing Parallelism and Minimizing
Synchronization with Affine Transforms
A. W. Lim and M. S. Lam
Conference Record of the 24th Annual ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages, January, 1997.

Finding Affine Partitions that Maximize Parallelism and Minimize
Synchronization
A. W. Lim and M. S. Lam
Proceedings of the Sixth Workshop on Compilers for Parallel Computers, Aachen,
Germany, December, 1996.

Communication-Free Parallelization via Affine
Transformations
A. W. Lim and M. S. Lam
Proceedings of the 7th Workshop on Languages and Compilers for
Parallel Computing, August, 1994.