We can now formally state the dynamic decomposition problem. For a given communication graph we want to find the data decomposition of each array at each loop node, and the corresponding computation decomposition of the loops.
The computation decomposition determines the degree of parallelism in a loop nest. For each loop node, we use the computation decomposition and the loop node weight to estimate the benefit to the execution time as a result of the parallelism in the loop nest. Note that if there is tiling, the parallelism benefit of the loop takes into account the cost of pipeline communication within the loop. Data reorganization can occur if the decomposition for an array in one loop differs from the decomposition of the same array in another loop. Thus the data decompositions, together with the probabilities on the edges, are used to estimate the communication time. The value of the graph is the sum of the parallelism benefits of all the loop nodes minus the total communication cost. The goal is to label the arrays and loops with decompositions such that the overall value of the graph is maximized. For example, consider the following program fragment.
Figure 5: (a) A communication graph. (b) The components resulting from the dynamic decomposition. (c) The final decompositions.
Figure 5(a) shows the communication graph, assuming the expression is true of the time, and that both arrays are of size . The edges are labeled with the estimated communication time assuming that none of the decompositions match, and all the data must be reorganized between each loop nest. The value on the edge between nodes 1 and 4 is the sum of the communication estimates for arrays X and Y. From loop nest 1, the decomposition of X has a probability of reaching loop nest 4, and the decomposition of Y has a probablility. Figures 5(b) and (c) illustrate the final decompositions for this example, and are discussed in the next section.
Proof:We prove the dynamic decomposition problem NP-hard by transforming the known NP-hard problem, Colored Multiway Cut, into a subproblem of this problem. The Colored Multiway Cut problem is given a graph with weighted edges, and a partial k-coloring of the vertices, i.e., a subset and a function . Can f be extended to a total function such that the total weight of edges that have different colored endpoints is minimized? Consider the subproblem of the dynamic decomposition problem in which the program accesses only a single array X. If parallelized, each loop node has a value that is greater than the sum of the weights of all the edges; otherwise it has a value of 0.
We can reduce an instance of Colored Multiway Cut into an instance of our subproblem in polynomial time. We only give a brief overview of the reduction here. Each node in the original problem becomes a loop nest of depth k in our subproblem's input program, and the edge weights in G become branch probabilities. The array X is k-dimensional, where each dimension represents a color in G. We write the input program such that for each node of color i, only the ith loop is parallel in the loop nest representing v. For each node all loops in the corresponding loop nest are parallel.
After finding the dynamic decompositions, the edges that have communication correspond to the cutset of edges in the Colored Multiway Cut problem. If the edges are removed, then an array will have the same decomposition across all loop nests in a connected component of the communication graph.