We now use the code shown in Figure 1(a) to illustrate our algorithm. Because the DO 20 J loop carries a data dependence, all iterations of this loop must execute on the same processor if communication is to be avoided. Starting with this constraint and applying Equation 1, the compiler finds that the second dimension of arrays A, B, and C must also be allocated to the same processor. Applying Equation 1 again shows that the iterations of the DO 10 J loop must also run on the same processor. Finally, the folding function for this example is BLOCK, which is selected by default. The final data decomposition for each of the arrays is DISTRIBUTE(BLOCK, *).
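To make the effect of DISTRIBUTE(BLOCK, *) concrete, the following minimal Python sketch computes which processor owns each array element. It is an illustration only, not the compiler's algorithm: the array shape, the processor count, and the helper names `block_owner`, `owner`, and `row_is_local` are hypothetical, and the block size is assumed to be the ceiling of the row count divided by the processor count, as in HPF-style BLOCK distributions. Figure 1(a) and Equation 1 are not reproduced here.

```python
import math

# Hypothetical sizes chosen only for illustration.
N_ROWS, N_COLS, N_PROCS = 8, 6, 4


def block_owner(i, n_rows, n_procs):
    """Processor owning row i (0-based) when n_rows rows are split into
    contiguous blocks of ceil(n_rows / n_procs) rows each (assumed BLOCK rule)."""
    block = math.ceil(n_rows / n_procs)
    return i // block


def owner(i, j, n_rows=N_ROWS, n_procs=N_PROCS):
    """Owner of element (i, j) under DISTRIBUTE(BLOCK, *).
    The '*' means the second dimension is not distributed, so j is ignored."""
    return block_owner(i, n_rows, n_procs)


def row_is_local(i):
    """True if every element (i, j), for all j, lives on one processor."""
    return len({owner(i, j) for j in range(N_COLS)}) == 1


if __name__ == "__main__":
    for i in range(N_ROWS):
        print(i, [owner(i, j) for j in range(N_COLS)])
```

Because the owner depends only on the first index, a loop that sweeps the second index for a fixed first index touches data held by a single processor, which is the property a dependence carried along that index requires.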