Next: Control Dependence Analysis Up: Limits of Control Previous: Speculative Execution

Evaluation and Analysis

The parallelism for each machine model is shown in Table 3. The following sections examine these results for the non-numeric benchmarks and discuss the implications. The results for the numeric benchmarks are presented in Section 5.3, and Section 5.4 discusses the effects of perfect loop unrolling.

The BASE machine provides a standard for comparison by determining the amount of parallelism when no special effort is made to reduce the control flow constraints. For the non-numeric benchmarks, the BASE machine has a harmonic mean parallelism of 2.14, which is larger than in other studies of similar machines because our assumptions are very different. First, the BASE machine allows some instruction scheduling across basic blocks. An instruction can execute as soon as the previous conditional branch is resolved, even if other instructions before the branch have not completed. In addition to this overlap, basic blocks that are not separated by conditional branches may also be executed in parallel. Second, whereas all operations in our machines execute in only one clock cycle, some previous studies used realistic operation latencies. Since non-unit latency operations consume some parallelism to fill pipeline bubbles, the reported speedups do not measure all of the parallelism. Finally, we do not include any limitations on fetching instructions [14].

At the other extreme is the upper bound of the ORACLE machine. As expected, the amount of parallelism is quite large and varies significantly between benchmarks. This reflects the different types of algorithms in the programs. For example, eqntott primarily executes a quicksort function which contains few data dependences. On the other hand, ccom and latex are composed of algorithms with much less inherent parallelism. Several factors cause our numbers to be considerably larger than in previous experiments with perfect branch prediction. Our unlimited scheduling window exposes parallelism across the entire program trace. Anti-dependences and output data dependences are not considered. Our simulation of procedure inlining also removes the instructions that adjust the stack pointer at the entry and exit of most procedures. This is significant in the case of the ORACLE machine because the stack pointer increments and decrements often lengthen the critical path of a program.

Next: Control Dependence Analysis Up: Limits of Control Previous: Speculative Execution

Bob Wilson