Our decomposition model is easily extended to incorporate the concept of tiling. In general, tiling creates two sets of loops: the inner loops iterate within the block and the outer loops iterate across the blocks. The inner loops are allocated to the same processor, while the outer loops are distributed across the processors. In this way, we achieve locality within the block, and parallelism across the blocks.
Mathematically we have represented
the computation that is allocated to the same processor
as a vector space
.
Focusing now on the loops within an inner block,
the iterations that are allocated to same processor
also form a vector space,
.
The vector space
is called the
localized vector space in [37], where
is used
to represent tile iterations that have cache locality.
In our model the localized vector space
contains all dimensions of the
iteration space that are
local to a processor, be they completely local or blocked.
Thus
.
Any dimension of the iteration space that is in
is blocked. Only the iterations within a finite block
are allocated to the same processor, not the entire dimension.
The blocks themselves are then distributed across the processors.
Similarly, we define a vector space
to characterize the array
dimensions within a block that are allocated to the same processor.
The relationship between the data partition
and space
is
.
In the ADI example, the blocked computation partitions are
and
.
Similarly, the blocked data partition is
and
.
Our algorithm finds the computation and data partitions
and
; these spaces correspond to those
dimensions that must be entirely mapped onto the same processor.
If blocking is desired, the algorithm also finds
and
;
the iterations in
, and the data in
are
distributed, but must be blocked.