So far we have only considered the base case of at most one outer sequential loop containing a number of perfectly nested loops. The Dynamic_Decomposition routine shown at the bottom of Figure 6 gives an overview of the decomposition algorithm in the general case.
Each nesting level is examined in a bottom-up order. This has the effect of pushing communication into the outermost loops as much as possible. The components are re-initialized at each level so that each loop nest is considered in the context of its sequential outer nest at the current level. The partitions found at each level are used to initialize the partitions for the next level. All references to loop indices that are outside the current nesting level are treated as symbolic constants when finding the partition constraints. In this manner, only the constraints for the current level are considered.
In the bipartite interference graph, there is an edge between each array node and each loop node that accesses the array. Each array node in the bipartite interference graph only has a single decomposition. Thus, if the dynamic algorithm discovers that an array's decomposition changes, the node corresponding to that array in the interference graph is split at all subsequent levels. The edges are adjusted so that loops using the reorganized decomposition now point to the proper array node.
Once the components have been formed, the algorithm finds the orientations and displacements. The orientations of the arrays and loops within a component are relative to one another; for example, no additional communication would result from transposing all the decompositions within a component. We use this observation to reduce the amount of communication when finding the orientations across components. Our compiler chooses an orientation for each component that matches as closely as possible to the other components that are connected by edges. Again, the compiler uses a greedy strategy based on the edge weights to decide which components to orient first. The orientations and displacements within a component are found using the algorithms described in Section 4.4 and Section 4.5, respectively.
The dynamic decomposition algorithm shown in Figure 6 is the driver algorithm for finding decompositions in the general case. It finds data and computation decompositions that maximize parallelism and minimize data reorganization and pipelined communication. The partitioning algorithms from the previous two sections are used as subroutines to the dynamic algorithm. In particular, if a static decomposition exists, then the dynamic algorithm will be able to successfully join all the loop nodes into a single component. In general, the algorithm reports a data decomposition for each array at each loop nest, and a computation decomposition for each loop nest.