To find partitions that maximize parallelism and have neither pipelined nor data reorganization communication, we find the minimum partitions that satisfy constraints -- . Constraint 1 (single loop) is used to initialize the computation partitions and constraint 2 (multiple arrays) is used to initialize the data partitions. An iterative algorithm is used to satisfy constraint 3 (data-computation relationship). An overview of this algorithm is shown in Figure 2.
Figure 2: Algorithm for calculating partitions.
The iterative algorithm calculates the effects of the loop nests on the arrays using Eqn. 5 and of the arrays on the loop nests using Eqn. 6. This continues until a stable partition is found. Informally, the partition algorithm trades off extra degrees of parallelism to eliminate communication. Going back to the simple example in Figure 1, the array index functions for arrays X and Z are . The index functions for array Y in the first and second loop nests are and , respectively. is initialized to and is initialized to . The data partitions are initialized to . The routine Update_Arrays is called with loop nest 2. Eqn. 5 is applied to arrays Y and Z, resulting in and . Next, routine Update_Loops is called with arrays Y and Z. Because of array Y, Eqn. 6 is applied to loop nest 1, resulting in . Finally, Update_Arrays is called again with loop nest 1 and .
Proof:The partition algorithm satisfies constraints 1 and 2 because the data and computation partitions are initialized with these constraints. The algorithm finds the minimum partition that satisfies the data-computation relation constraint 3 because the algorithm only ever increases the partitions in order to ensure that the constraint is satisfied. To prove termination, we use the fact that the spaces (data partitions) and (computation partitions) increase in size monotonically as the algorithm progresses. In the worst case, the partitions will span the entire space and the algorithm will terminate.
After a data partition has been found for each array and a computation partition for each loop nest, the next step is to determine the number of virtual processor dimensions. The number of virtual processor dimensions n is
Here is the total array space accessed, typically the entire array. This equation will yield a value of n such that all the parallelism found in the partition algorithm is exploited. In the example from Figure 1, n = 1.