The CD machine exposes more parallelism than the BASE machine because basic blocks with the same control dependences can execute in parallel. However, its harmonic mean parallelism of 2.39 is only slightly higher than that of the BASE machine. Figure 4 compares the parallelism of each benchmark on the CD machine with that on the BASE machine. The CD machine is limited primarily by the constraint that branches execute in order. Because conditional branches occur frequently in our benchmarks, executing only one branch at a time is a serious bottleneck. Table 2 shows the average number of dynamic instructions between conditional branches in the program traces: for the non-numeric programs, a branch occurs roughly every six instructions. When all of these branches are ordered, little parallelism remains to be found.
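A rough way to see why in-order branches hurt: under the simplifying assumptions of unit-latency instructions and unlimited functional units (assumptions made here for illustration, not necessarily the exact simulation model), executing branches one at a time forces at least one cycle per conditional branch. Parallelism therefore cannot exceed the number of instructions per branch. The hypothetical helper `branch_serialization_bound` below computes that ceiling.

```python
def branch_serialization_bound(num_instructions: int, num_branches: int) -> float:
    """Upper bound on parallelism when conditional branches execute one at a time.

    Illustrative assumptions: unit-latency instructions and unlimited functional
    units, so the schedule needs at least one cycle per conditional branch.
    """
    if num_branches == 0:
        # No conditional branches: this particular constraint imposes no limit.
        return float("inf")
    # cycles >= num_branches, hence instructions / cycles <= instructions / branches
    return num_instructions / num_branches


if __name__ == "__main__":
    # Hypothetical trace with one conditional branch per six instructions.
    print(branch_serialization_bound(6000, 1000))
```

With a branch roughly every six instructions, as in the non-numeric traces, this ceiling is about 6. Measured parallelism sits well below it because branch conditions and the instructions they guard still wait on their data dependences.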
When the CD-MF machine removes the constraint on branch ordering, only the true control and data dependences must be observed. Figure 4 also compares the parallelism of each benchmark on the CD-MF machine with that on the CD machine. Parallelism increases for every program, most notably for gcc, irsim, and espresso. Even so, the amount of parallelism is still far from massive. This is not surprising given the kinds of benchmarks we are analyzing: individual components of these programs may contain some parallelism, but the overall algorithms are simply not very parallel.
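The difference between the CD and CD-MF constraints can be sketched with a small earliest-cycle scheduler over a dynamic trace. The sketch assumes unit latency, unlimited functional units, and that an instruction may not issue until the branch it is control dependent on has executed. The `Instr` record, the `schedule_length` function, and the toy trace are hypothetical illustrations, not the simulator used in this study. Setting `order_branches=True` approximates the CD machine; `False` approximates CD-MF.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instr:
    deps: List[int] = field(default_factory=list)  # trace indices of data producers
    ctrl: Optional[int] = None  # trace index of the branch this instr is control dependent on
    is_branch: bool = False


def schedule_length(trace: List[Instr], order_branches: bool) -> int:
    """Cycles needed to run the trace, scheduling each instruction as early as allowed."""
    cycle = [0] * len(trace)
    last_branch_cycle = 0
    for i, ins in enumerate(trace):
        earliest = 1
        # True data dependences: issue after every producer has executed.
        for d in ins.deps:
            earliest = max(earliest, cycle[d] + 1)
        # True control dependence: no speculation past an unresolved branch.
        if ins.ctrl is not None:
            earliest = max(earliest, cycle[ins.ctrl] + 1)
        # CD machine only: branches execute one at a time, in trace order.
        if ins.is_branch and order_branches:
            earliest = max(earliest, last_branch_cycle + 1)
            last_branch_cycle = earliest
        cycle[i] = earliest
    return max(cycle, default=0)


# Hypothetical toy trace: branch 3 ends a 3-deep dependence chain, while
# branch 5 depends only on instruction 4 and guards instruction 6.
TRACE = [
    Instr(),                         # 0
    Instr(deps=[0]),                 # 1
    Instr(deps=[1]),                 # 2
    Instr(deps=[2], is_branch=True), # 3
    Instr(),                         # 4
    Instr(deps=[4], is_branch=True), # 5
    Instr(deps=[4], ctrl=5),         # 6
]

if __name__ == "__main__":
    for name, ordered in (("CD", True), ("CD-MF", False)):
        cycles = schedule_length(TRACE, ordered)
        print(f"{name}: {cycles} cycles, parallelism {len(TRACE) / cycles:.2f}")
```

On this toy trace, branch 5 no longer waits for the long chain feeding branch 3 once branch ordering is dropped, so the schedule shrinks from 6 to 4 cycles and parallelism rises from about 1.17 to 1.75. Because the only constraints left in the unordered case are true data and control dependences, no non-speculative schedule of the same trace can be shorter.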
Because the CD-MF machine enforces only true data and control dependences, its parallelism is an upper bound for any system that does not execute speculatively. Dataflow architectures, for example, can execute programs subject only to these essential dependences. Since the available parallelism is modest, any machine that tries to exploit parallelism in non-numeric programs without speculative execution must have low overhead to be effective.