To find partitions that maximize parallelism and
have neither pipelined nor data reorganization communication, we find
the minimum partitions that satisfy constraints
--
.
Constraint 1 (single loop) is used to initialize the computation partitions
and constraint 2 (multiple arrays) is used to initialize the data partitions.
An iterative algorithm is used to satisfy constraint 3 (data-computation
relationship).
An overview of this algorithm is shown in Figure 2.
Figure 2: Algorithm for calculating partitions.
The iterative algorithm calculates the effects of the loop nests on the arrays
using Eqn. 5 and
of the arrays on the loop nests using Eqn. 6.
This continues until a stable partition is found.
Informally, the partition algorithm trades off extra degrees of parallelism
to eliminate communication.
Going back to the simple example in Figure 1,
the array index functions for arrays X and Z
are
.
The index functions for array Y in the first and second loop nests
are
and
,
respectively.
is initialized to
and
is initialized
to
.
The data partitions
are initialized to
.
The routine Update_Arrays is called with loop nest 2.
Eqn. 5 is applied to arrays Y and Z, resulting
in
and
.
Next, routine Update_Loops is called with arrays Y and Z.
Because of array Y, Eqn. 6 is applied to loop nest 1,
resulting in
.
Finally, Update_Arrays is called again with loop nest 1 and
.
Proof:The partition algorithm
satisfies constraints 1 and 2 because the data and computation
partitions are initialized with these constraints.
The algorithm finds the minimum partition that satisfies
the data-computation relation constraint 3 because the algorithm only
ever increases
the partitions in order to ensure that the constraint is satisfied.
To prove termination, we use the fact that
the spaces
(data partitions)
and
(computation partitions)
increase in size monotonically as the algorithm progresses.
In the worst case, the partitions will span the entire space
and the algorithm will terminate.
After a data partition has been found for each array and a computation partition for each loop nest, the next step is to determine the number of virtual processor dimensions. The number of virtual processor dimensions n is

Here
is the total array space accessed, typically the entire array.
This equation will yield a value of n such that all the
parallelism found in the partition algorithm is exploited.
In the example from Figure 1, n = 1.