No communication will occur when the data is local to the processor that references that data. This relationship between data and computation is expressed by the following theorem.
Communication at the displacement level is inexpensive since
the amount of data transferred can be significantly reduced by blocking.
Thus the priority is in finding the best partitions and orientations,
and we first focus on the version of Eqn. 2 that
omits displacements.
Letting
,
Given the array index function matrices
for each array x in
each loop nest j,
a decomposition is free of reorganization communication if
the data decomposition matrix
and the computation decomposition
are such that Eqn. 3 is true.
A trivial solution that guarantees no communication is to execute everything
sequentially by
setting all computation decomposition matrices C = 0 and
all data decomposition matrices D = 0.
Therefore
would span the entire iteration space
and
would span the entire array space
.
However, maximizing parallelism means finding data and computation
decompositions such that the partition
, the nullspace of C,
for all loop nests is as small as possible.