Replication of read-only data is a common technique used to improve the performance of parallel machines. Our algorithms find the amount of read-only data replication needed to maintain the degree of parallelism inherent in the read-write data without introducing additional communication.

We consider two types of replication: * constant replication* and
* dimension replication*.
Constant replication occurs when there are multiple data decompositions
for an array.
Dimension replication means that all processors along a given
dimension have a copy of the same data.
The increase in the space requirements to accommodate constant replication is
a linear function of the array size, whereas dimension replication
can cause the space needed to grow by the number of processors.
Thus we focus on dimension replication.

To allow the necessary replication,
we first run the decomposition algorithm * without* taking into
account the read data in the program.
From the resulting computation partition, we can then find
the data partitions of the read-only
arrays using Eqn. 5.
We must now find the processor dimensions which contain replicated data.

Dimension replication is modeled using a reduced processor
space, which is then expanded into the full **n**-dimensional
processor space.
All processors in the expanded dimensions have copies
of the data that are on the corresponding processor in the reduced space.
Let **x** be a replicated **m**-dimensional array, with data partition
.
The dimensionality of the reduced processor space, ,
is ,
where
is the array space accessed.
The * degree* of replication for an array
(the number of processor dimensions along which
the data is copied) is .
The data decomposition matrix for array **x**
is an decomposition matrix , which
maps array elements onto the reduced processor space.

Let be the computation decomposition matrix
for a loop nest **j** that accesses array **x**.
maps iterations onto the full processor space.
To relate the full processor space to the reduced processor space,
we use an matrix .
Eqn 3 from section 4.1 is modified to
express the relationship between computation and data with
replication:

The matrix maps onto the reduced space, and the nullspace of , , corresponds to dimensions along which there is replication.

The algorithm uses the data partition to find . In section 4.4, we used Eqn. 3 to find the data and computation decomposition matrices. Similarly, we use Eqn. 7 to find the decomposition matrices with replication.

The computation and data decompositions are initially derived without any consideration for the amount of replication needed. As a result, the amount of replication called for could be much greater than is practical on the target machine. Thus, we also use the techniques in section 7.1 to limit the degree of replication by projecting the virtual processor space onto a smaller processor space.

Fri Apr 7 14:39:58 PDT 1995