A sample origami array is shown in Figure 2.1. This example consists of a 4x4 array of PEs, each with two inputs and two outputs. Each PE is connected to the PE directly below it and to the PE directly below and to the right. Neither the computations that the PEs can perform (called flavors) nor the sizes of the inputs and outputs have been specified. Each PE could be a simple logic gate or a very complex logic function, and each input or output could be one bit wide, eight bits wide, or hundreds of bits wide. In fact, each PE could be a supercomputer, with connection channels carrying gigabits' worth of information. Within certain limitations, the properties discussed below hold for PEs and I/O channels of any size.
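The connectivity just described can be sketched functionally. This is a minimal sketch under stated assumptions, not the author's model. The names `Flavor` and `evaluate` are hypothetical. Output 0 of a PE feeds input 0 of the PE directly below, and output 1 feeds input 1 of the PE below and to the right. Because the text sets the left and right boundaries aside, the sketch simply wraps the rightmost column around to the leftmost, which is an assumption made only to keep the example closed. Timing is ignored here: one set of inputs is pushed through every row.

```python
from typing import Callable, List, Tuple

# A "flavor" maps a PE's two inputs to its two outputs. Values are left
# untyped because the text does not fix the channel width.
Flavor = Callable[[object, object], Tuple[object, object]]


def evaluate(top_inputs: List[Tuple[object, object]],
             flavors: List[List[Flavor]]) -> List[Tuple[object, object]]:
    """Push one set of inputs down the array, row by row (hypothetical helper).

    top_inputs[c] is the (in0, in1) pair entering PE (0, c); flavors[r][c] is
    the flavor of PE (r, c). Output 0 of PE (r, c) feeds input 0 of PE (r+1, c),
    directly below; output 1 feeds input 1 of PE (r+1, c+1), below and to the
    right. ASSUMPTION: column indices wrap modulo the array width.
    """
    cols = len(top_inputs)
    current = list(top_inputs)
    for row in flavors:
        # Every PE in this row computes its two outputs from its two inputs.
        outs = [row[c](*current[c]) for c in range(cols)]
        nxt = [[None, None] for _ in range(cols)]
        for c, (o0, o1) in enumerate(outs):
            nxt[c][0] = o0                 # link to the PE directly below
            nxt[(c + 1) % cols][1] = o1    # link to the PE below and to the right
        current = [tuple(pair) for pair in nxt]
    return current


# Usage: a 4x4 array of pass-through PEs. Input 0 stays in its column while
# input 1 moves one column right per row, so after four rows it wraps home.
passthrough = [lambda a, b: (a, b)] * 4
inputs = [(0, 10), (1, 11), (2, 12), (3, 13)]
print(evaluate(inputs, [passthrough]))       # one row: input 1 shifted right
print(evaluate(inputs, [passthrough] * 4))   # four rows: back to the inputs
```

Keeping each flavor as an arbitrary function mirrors the point above that a PE may be anything from a gate to a supercomputer.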
In the example in Figure 2.1, data flows ``down'' the array: all
inputs enter at the top, and all outputs exit at the bottom. We will
ignore the boundary conditions at the left and right sides for the
moment. The processor array is perfectly pipelined and synchronous.
That is, there is a pipeline delay between adjacent levels of the
array in the direction of data flow, and each PE is allocated exactly
one ``clock cycle'' to perform its computation. When a set of input
data arrives at the top of the array, it takes $4\tau$ seconds for the
result to arrive at the outputs, where $\tau$ is the time taken by a
single pipelined stage (in general, $h\tau$ for an array $h$ PEs
high). While the latency depends on the height of the array, the
throughput is constant at $1/\tau$: once the pipeline is full, a new
set of results leaves the array every $\tau$ seconds.
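The latency and throughput claims can be checked with a small cycle-level simulation. This is a minimal sketch under stated assumptions. Each row of the array is one pipeline stage with clock period `tau`, a new input set enters the top row at the start of every cycle, and results leave the bottom row at the end of a cycle. The function `simulate_pipeline` and its parameters are hypothetical names used only for illustration.

```python
from typing import List, Tuple


def simulate_pipeline(height: int, num_inputs: int,
                      tau: float) -> Tuple[List[float], List[float]]:
    """Return (latencies, gaps between successive results) for a pipelined array.

    stages[i] holds the tag of the input set that row i works on during the
    current cycle (None if that row is idle). Input set k enters at time k*tau.
    """
    stages: List = [None] * height
    entry_time, exit_time = {}, {}
    cycle = 0
    while len(exit_time) < num_inputs:
        # Start of the cycle: a new input set enters the top row, if any remain.
        if cycle < num_inputs:
            stages[0] = cycle
            entry_time[cycle] = cycle * tau
        # End of the cycle: the bottom row's result leaves the array ...
        done = stages[-1]
        if done is not None:
            exit_time[done] = (cycle + 1) * tau
        # ... and every other row passes its data to the row below.
        stages = [None] + stages[:-1]
        cycle += 1
    latencies = [exit_time[k] - entry_time[k] for k in range(num_inputs)]
    gaps = [exit_time[k + 1] - exit_time[k] for k in range(num_inputs - 1)]
    return latencies, gaps


# Usage: the 4-row array of Figure 2.1 with tau = 1.0 and three input sets.
latencies, gaps = simulate_pipeline(height=4, num_inputs=3, tau=1.0)
print(latencies)  # every input set takes 4 * tau to reach the outputs
print(gaps)       # consecutive results are tau apart: throughput 1 / tau
```

Changing `height` changes every latency, but the gaps between successive results remain `tau`. This matches the observation that latency grows with the height of the array while throughput does not.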