The parallelism for the SP machine ranges from 4.16 to 9.22, with a harmonic mean of 6.80. Figure 5 shows the parallelism for each benchmark compared to the BASE machine. These results are comparable to Wall's results for a similar machine . The differences can be attributed to procedure inlining, perfect loop unrolling, and the unlimited scheduling window in our simulator. The parallelism for the SP machine is fairly consistent across the different benchmarks. The following measurements offer an explanation for the consistency and suggest that this limit of parallelism will probably apply to many more non-numeric applications.
In the SP machine, a misprediction cancels the execution of all instructions following the branch, so mispredictions are barriers to instruction scheduling. Parallelism can only be found among the instructions between mispredicted branches. Therefore, the overall limit of parallelism for the SP machine is actually an average over many discrete segments of code separated by mispredicted branches. Each of these segments has two vital characteristics: the degree of parallelism and the misprediction distance, that is, the number of instructions in the segment. In our experiments, we recorded the number of occurrences of each misprediction distance. The cumulative distributions of these misprediction distances for each program are shown in Figure 6. These distributions are quite consistent across the different benchmarks, with over 80%of the mispredictions occurring within a distance of 100 instructions. We expect other non-numeric programs to have similar distributions.
We also recorded the degree of parallelism for each segment of code between two mispredicted branches and found that the relationship between the degree of parallelism and the misprediction distance is similar for all of the benchmarks. Figure 7 is a combination of the statistics for all of the programs. For each misprediction distance, we plot the harmonic mean of the parallelism for all segments of that size. To reflect differences in the significance of these numbers, the bars for frequently occurring misprediction distances are shaded darker. For short misprediction distances, there is little parallelism. Instructions within these short segments tend to be closely related and have many data dependences between them, and thus the parallelism is limited. For longer misprediction distances, there is a greater chance of having unrelated instructions within the segments and more parallelism can be found. However, as shown by the distributions, long misprediction distances do not occur very frequently. Therefore, non-numeric programs with predominantly short misprediction distances have limited parallelism on the SP machine due to the data dependence in short segments of instructions.
The SP-CD machine does not need to discard all instructions following a mispredicted branch, and thus it can exploit parallelism across mispredicted branches. As a result, the harmonic mean parallelism for this machine increases to 13.27. Figure 5 compares the parallelism for each benchmark to the parallelism for the SP machine. The branch constraint for this machine requires that a branch cannot execute before a preceding misprediction. This is much less restrictive than the branch constraint for the CD machine. The flow of control only changes when a branch is mispredicted, and since mispredictions are relatively infrequent, the branch constraint is not a bottleneck.
Finally, the parallelism for the SP-CD-MF machine is much larger. Figure 5 illustrates the parallelism for each benchmark. The eqntott and espresso programs have especially large amounts of parallelism, and there is a large increase for all of the benchmarks.
The SP-CD-MF model provides us with an interesting data point. It is more aggressive than the SP machine but also more realistic than the ORACLE machine. To achieve the performance of the ORACLE machine, instructions from alternate paths must be executed simultaneously, and only the instructions from one of the paths will be useful. On the other hand, as long as branches are correctly predicted, the SP-CD-MF machine does not have to cancel any instructions.