Our decomposition model is easily extended to incorporate the concept of tiling. In general, tiling creates two sets of loops: the inner loops iterate within the block and the outer loops iterate across the blocks. The inner loops are allocated to the same processor, while the outer loops are distributed across the processors. In this way, we achieve locality within the block, and parallelism across the blocks.
Mathematically we have represented the computation that is allocated to the same processor as a vector space . Focusing now on the loops within an inner block, the iterations that are allocated to same processor also form a vector space, . The vector space is called the localized vector space in , where is used to represent tile iterations that have cache locality. In our model the localized vector space contains all dimensions of the iteration space that are local to a processor, be they completely local or blocked. Thus . Any dimension of the iteration space that is in is blocked. Only the iterations within a finite block are allocated to the same processor, not the entire dimension. The blocks themselves are then distributed across the processors.
Similarly, we define a vector space to characterize the array dimensions within a block that are allocated to the same processor. The relationship between the data partition and space is .
In the ADI example, the blocked computation partitions are and . Similarly, the blocked data partition is and .
Our algorithm finds the computation and data partitions and ; these spaces correspond to those dimensions that must be entirely mapped onto the same processor. If blocking is desired, the algorithm also finds and ; the iterations in , and the data in are distributed, but must be blocked.