So far, the short-term memories that store the
intermediate results of a folded array have been left undescribed.
They could be implemented using standard RAM with the address
determined by the particular phase of data flow, but a much more
useful arrangement is possible. The short-term memory can actually be
viewed as a set of delay lines, with one delay line for each of the
outputs of a processor. An example of this is shown in
Figure 2.8. This figure shows a single-processor
emulation of the 4x4 array we introduced in
Figure 2.1. For ease of discussion, the processor
in this figure does not have an intrinsic delay.
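
The delay-line view of the short-term memory can be sketched in a few lines of Python. This is only an illustrative model, not the implementation used in the figure: the class name DelayLine, its shift method, and the helper trace are hypothetical names introduced here, and the line is modelled as a fixed-length FIFO in which a value written at one step emerges exactly `length` steps later.

```python
from collections import deque


class DelayLine:
    """Fixed-length FIFO: a value shifted in emerges `length` steps later.

    Hypothetical helper for illustration; not taken from the original text.
    """

    def __init__(self, length, fill=None):
        if length < 1:
            raise ValueError("length must be at least 1")
        # Pre-fill the line so that the first `length` reads return `fill`.
        self._cells = deque([fill] * length, maxlen=length)

    def shift(self, value):
        """Advance one step: return the oldest value and store the new one."""
        oldest = self._cells[0]
        self._cells.append(value)  # maxlen discards the oldest entry
        return oldest


def trace(length, steps):
    """Shift the step index through a delay line and record what comes out."""
    line = DelayLine(length)
    return [line.shift(step) for step in range(steps)]
```

If the step index is read as the number of the PE being emulated at that step, then trace(4, 8) shows that the value written at step 0 is read back at step 4, the value from step 1 at step 5, and so on.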
The length of a given delay line is easy to determine. If the PEs in an origami array are numbered in raster-scan order, as shown in Figure 2.9, the length of the delay line between the output of one PE and the input of another is simply the difference between the numbers of the two PEs. Our sample array requires two delay lines, one of length 4 and one of length 5. The lengths of the delay lines are approximately proportional to the width of the array, which is consistent with our earlier observation that only widthwise folding requires extra memory.
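
The length rule can be checked with a short calculation. The sketch below rests on assumptions that are not stated in the text: PEs are numbered row by row starting from 0, and each PE sends one output to the PE directly below it and one to the PE below and to the right. That connection pattern is chosen only because it reproduces the lengths 4 and 5 quoted for the sample array; the actual topology is the one given in the figures. The function names are hypothetical.

```python
def raster_number(row, col, width):
    """Raster-scan (row-major) number of the PE at (row, col), counting from 0."""
    return row * width + col


def delay_length(src, dst, width):
    """Delay-line length from PE `src` to PE `dst`, each given as (row, col).

    The length is the difference of the two raster-scan numbers.
    """
    length = raster_number(*dst, width) - raster_number(*src, width)
    if length <= 0:
        raise ValueError("destination must come after source in raster order")
    return length


def delay_lengths(width, offsets):
    """Distinct delay-line lengths for a set of (d_row, d_col) connections.

    `offsets` is an assumed connection pattern, e.g. (1, 0) for "directly
    below" and (1, 1) for "below and to the right".
    """
    return sorted({d_row * width + d_col for d_row, d_col in offsets})
```

Under these assumptions, delay_lengths(4, [(1, 0), (1, 1)]) gives the two lengths 4 and 5 for a width of 4, and doubling the width to 8 gives 8 and 9, which illustrates why the lengths grow roughly in proportion to the array width.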