Compilers and Translators:

Software Verification Tools

Lecture 5: Data-Flow Analysis

September 18, 2007

- Review
- Introduction to data-flow analysis
- Data-flow schemas
- Reaching definitions
- Live-variable analysis
- Reading

- Flow graphs and their representation
- Loops
- Introduction to code optimization

- Data-flow analysis
- Data-flow analysis refers to a collection of techniques used to collect information about the flow of data along all possible execution paths of a program
- The execution of a program is viewed as a series of transformations on the state of the program. Each execution of an IR statement transforms an input state to an output state, where the input state is associated with the program point before the statement and the output state with the program point after the statement.
- In each application of data-flow analysis we associate with every program point a data-flow value that represents an abstraction of the set of all program states that can be observed for that point.
- We denote the data-flow values before and after each statement s by IN[s] and OUT[s], respectively.
- The data-flow problem is to find a solution for the IN and OUT sets for all statements subject to the constraints imposed by the semantics of the statements ("transfer functions") and those based on the flow of control.

- For forward data-flow problems
- For a basic block B, OUT[B] = f
_{B}(IN[B]), where f_{B}is the composition of the transfer functions for the individual statements in B. - The constraint for IN[B] is that IN[B] is the union of OUT[P] over all predecessors P of B.
- For backward data-flow problems
- IN[B] = f
_{B}(OUT[B]). - The constraint for OUT[B] is that OUT[B] is the union of IN[S] over all successors P of B.

- Reaching definitions is the prototypical forward data-flow analysis problem.
- A definition of a variable x is a statement that assigns, or may assign, a value to x. Because of procedure parameters, arrays, and indirect accesses it is not always easy to tell if a statement is referring to a particular variable x.
- A definition d reaches a point p if there is a path from the point immediately following d to p such that d is not killed along that path.
- We kill a definition of a variable x if there is another definition of x anywhere along the path (which may have loops in it).
- Example: the definition d: u = v+w
*generates*a definition d of variable u and*kills*all the other definitions in the program that define variable u. - Transfer equations for reaching definitions, pp. 603-605.
- Control-flow equations, p. 605.
- Iterative algorithm for reaching definitions: See Alg. 9.11, p. 606.

- Live-variable analysis is the prototypical backward data-flow analysis problem.
- In live-variable analysis we want to know for variable x and point p
whether the value of x at p could be used along some path in the flow graph
starting at p. If so, we say x is
*live*at p; otherwise, x is*dead*at p. - Live-variable analysis is useful for register allocation.
- We define def
_{B}as the set of variables defined in block B prior to any use of that variable in B. - We define use
_{B}as the set of variables whose values may be used in B prior to any definition of that variable. - Transfer equations and control-flow equations for live-variable analysis, pp. 608-610.
- Control-flow equations, p. 605.
- Iterative algorithm for live-variable analysis: See Alg. 9.14, p. 610.

- ALSU: section 9.2
- See http://mtc.epfl.ch/software-tools/blast/, http://www.cs.indiana.edu/~lepike/talks/sal.pdf and http://www.cs.cmu.edu/~chaki/magic/ for more suggestions of software verification tools.

aho@cs.columbia.edu