This paper shows that control flow in a program can severely limit the available parallelism. The control flow of many non-numeric programs and also some numeric programs is complex and highly data dependent. To increase the available parallelism beyond the current level, the constraints imposed by control flow must be relaxed.
This paper discusses three basic techniques for handling control flow: speculative execution, control dependence analysis, and following multiple flows of control. Through a study of abstract machines that utilize different combinations of these techniques, we have established the importance of each technique. These basic techniques also form a useful set of criteria with which to evaluate real architectures.
This study suggests that some of the current highly parallel architectures lack adequate support for control flow. For example, a VLIW (Very Long Instruction Word) machine can only follow one flow of control. It cannot find sufficient parallelism in programs whose control flow is highly data dependent. In contrast, a dataflow machine can execute from many different parts of a program simultaneously. However, our study shows that even if instructions are executed as soon as all their data and control dependences are satisfied, the parallelism is still quite limited. Speculation is necessary to find sufficient parallelism in these programs.
This study of abstract machines also helps to identify useful architectural features. The concept of boosting , which relies on software for scheduling and a small degree of hardware to support speculative execution, appears particularly promising. Another interesting concept is guarded instructions . A guarded instruction is conditionally executed based on a value stored in a general register. This allows a compiler to specify some amount of control dependence information, that only the action is control dependent on the guard. Furthermore, using guarded instructions, a basic block can contain code from different conditional statements by simply capturing their conditions in the guards. Guarded instructions are particularly interesting when combined with support for speculative execution, since they help increase the distance between mispredicted branches. Though guarded instructions give the processor some ability to execute from different regions of the source code, they are inefficient for following multiple complex flows of control simultaneously. If higher performance is desired, a small-scale multiprocessor system with guarded instructions and speculative execution support would be an interesting possibility.