The decomposition problem is very complex, as there are many inter-related issues that must be addressed. This paper addresses the full problem of automatically calculating data and computation decompositions for programs in a systematic way.
Our algorithms are based on the mathematical model of decompositions as affine functions. This framework is general enough to handle a broad class of array access patterns. Using the affine model we structure decompositions into three components: partition, orientation and displacement. Since equivalent decompositions have the same partition, we solve for the partition first and can therefore evaluate many possible decomposition designs simultaneously.
To maximize parallelism, our algorithm exploits forall parallelism, as well as doacross parallelism using tiling. To minimize communication, the algorithm tries to find a static decomposition that exploits the maximum degree of parallelism available in the program such that there is no reorganization nor pipeline communication. The algorithm will trade off extra degrees of parallelism to eliminate communication. If communication is needed, the algorithm will try to reduce expensive reorganization communication to inexpensive pipelined communication by tiling. Finally, any necessary data reorganization communication is inserted into the least frequently executed parts of the program.