Affine Transformations

for Optimizing Parallelism and Locality

An Affine Partitioning Theory

It is well known that many loop transformations can be used to improve parallelization as well as the memory subsystem performance for both uniprocessor and multiprocessor systems. Many transformations have been proposed in the past including unimodular transformations (interchange, skew and reversal), fusion, fission, reindexing, scaling, and statement reordering. We have developed a new transformation framework called affine partitioning that unifies all the above transformations.

Affine Partitioning Based Algorithms

Experimental Results

The affine partitioning algorithm has been prototyped in the SUIF2 Compiler Infrastructure. Data locality optimizations have shown to triple uniprocessor performance on some programs. A combination of affine transformations to improve parallelization and locality has allowed a speed up of 20 times on a 32-processor, which represents over a three-fold improvement over the previously best results.