Replication of read-only data is a common technique used to improve the performance of parallel machines. Our algorithms find the amount of read-only data replication needed to maintain the degree of parallelism inherent in the read-write data without introducing additional communication.
We consider two types of replication: constant replication and dimension replication. Constant replication occurs when there are multiple data decompositions for an array. Dimension replication means that all processors along a given dimension have a copy of the same data. The increase in the space requirements to accommodate constant replication is a linear function of the array size, whereas dimension replication can cause the space needed to grow by the number of processors. Thus we focus on dimension replication.
To allow the necessary replication, we first run the decomposition algorithm without taking into account the read data in the program. From the resulting computation partition, we can then find the data partitions of the read-only arrays using Eqn. 5. We must now find the processor dimensions which contain replicated data.
Dimension replication is modeled using a reduced processor space, which is then expanded into the full n-dimensional processor space. All processors in the expanded dimensions have copies of the data that are on the corresponding processor in the reduced space. Let x be a replicated m-dimensional array, with data partition . The dimensionality of the reduced processor space, , is , where is the array space accessed. The degree of replication for an array (the number of processor dimensions along which the data is copied) is . The data decomposition matrix for array x is an decomposition matrix , which maps array elements onto the reduced processor space.
Let be the computation decomposition matrix for a loop nest j that accesses array x. maps iterations onto the full processor space. To relate the full processor space to the reduced processor space, we use an matrix . Eqn 3 from section 4.1 is modified to express the relationship between computation and data with replication:
The matrix maps onto the reduced space, and the nullspace of , , corresponds to dimensions along which there is replication.
The algorithm uses the data partition to find . In section 4.4, we used Eqn. 3 to find the data and computation decomposition matrices. Similarly, we use Eqn. 7 to find the decomposition matrices with replication.
The computation and data decompositions are initially derived without any consideration for the amount of replication needed. As a result, the amount of replication called for could be much greater than is practical on the target machine. Thus, we also use the techniques in section 7.1 to limit the degree of replication by projecting the virtual processor space onto a smaller processor space.