To establish the fundamental limits of these three techniques for relaxing control flow constraints, we define a set of abstract machine models and analyze the parallelism for each machine under ideal conditions. Each machine model uses a different combination of the three techniques. Our approach is to examine a set of instruction traces from real programs and compute the available parallelism by simply enforcing true data dependence constraints and the control flow constraints associated with each abstract machine model. Other constraints due to imperfect memory disambiguation, reuse of variables, and limited resources are ignored.
We define seven abstract machines. The BASE machine uses none of the three techniques and provides a baseline for comparison. The ORACLE machine has perfect branch prediction, and its performance represents an upper bound of parallelism given the assumptions in our experiments. The other five machines use combinations of control dependence analysis (CD), following multiple flows of control (MF), and speculative execution with branch prediction (SP). There are only five interesting combinations because we cannot recognize independent flows of control without control dependence analysis.
Each machine is distinguished by its ability to handle control flow. We model these abilities by imposing the sequencing constraints shown in Figure 1 on the execution of instructions in dynamic traces. More aggressive machines have less restrictive constraints. Since the BASE machine uses none of the three techniques, its control flow constraint is the most severe; it prevents instructions from executing before any preceding branches. The CD machine has perfect control dependence information, and thus, instructions that are not dependent upon a branch need not wait for it to be resolved. To reflect the limitation of following one flow of control, we also impose an ordering on the branch instructions. To model the behavior of current compilers , this ordering requires all branches in the program to execute in the original sequential execution order. The CD-MF machine may follow an unlimited number of flows of control and does not require a branch ordering constraint.
All of the machines with speculative execution in this study only speculatively execute instructions on the most likely execution path. Simultaneously executing instructions on alternate paths would require that some instructions be cancelled regardless of the branch outcomes. However, in the various SP machines, all of the speculative instructions are potentially useful, making these machines more realistic than the ORACLE machine.
The SP machine can speculate on an infinite number of consecutive branch outcomes. This essentially creates an infinitely long path of predicted instructions through a program. Instructions from anywhere along this path may execute in parallel. However, when a branch is mispredicted, all of the instructions on the predicted path following the branch must be cancelled. Only those instructions that are not cancelled appear in the traces that we analyze. Therefore, the control flow constraint on an instruction in a trace is that it cannot execute until all of the preceding mispredicted branches are resolved. As long as the branch predictions are correct, the flow of control does not change and the branch instructions can execute in any order. A mispredicted branch requires that the flow of control transfer to the unpredicted branch path, and thus, only one mispredicted branch can execute in each cycle.
The SP-CD machine differs from the SP machine in its treatment of mispredicted branches. Instead of cancelling all of the instructions along the predicted path, only those instructions that are actually control dependent on the mispredicted branch are cancelled. Due to following a single flow of control, the mispredicted branches must still be executed in order. The SP-CD-MF machine can follow multiple flows of control simultaneously, and thus mispredicted branches can execute in parallel.
To illustrate the power of the different abstract machine models, consider the flow graph in Figure 2. Assume that there are no data dependences in this program. Each node in the graph consists of a single instruction. Next to each node is the set of branch instructions on which it is immediately control dependent. The edges represent the possible flow of control, with the more likely branch outcomes highlighted by bold arcs. One possible trace through this graph is also shown in Figure 2. Each instruction in the trace is identified by the instruction node number and a letter to distinguish the specific instance. Branch instructions are written in boldface and circled if they are mispredicted. The edges in the trace represent the control dependence relationships.
Since there are no data dependences in the program, the ORACLE machine simply executes all of the instructions in one cycle. The executions of the other machine models are shown in Figure 3, where the edges represent the dependences due to control flow and instructions at the same level execute at the same time.
The BASE machine executes all of the branches sequentially and each non-branch instruction one cycle after the preceding branch. The CD machine executes instructions 6 and 7 earlier because they are not control dependent on the immediately preceding branches. The CD-MF machine need not execute the branches in order; the edges are simply the control dependence edges from the trace in Figure 2. The SP machine executes all of the instructions between mispredicted branches in parallel. The SP-CD machine behaves similarly but executes instructions 3c, 6, and 7 earlier because they are not control dependent on the mispredicted branches. Finally, the SP-CD-MF machine executes all but one of the instructions in one cycle. Since instruction 4b is not on the predicted path, the machine does not speculatively execute the instruction. The instruction is therefore not executed until the processor discovers that it has mispredicted the branch 2b. In contrast, to achieve the performance of the ORACLE machine, both instructions 3 and 4 must be executed in parallel on every iteration. This illustrates the fundamental difference between SP-CD-MF and ORACLE: the SP-CD-MF machine does not pursue alternate paths simultaneously.