Compilers and Translators:

Software Verification Tools

Lecture 7: Data-Flow Analysis Frameworks

September 25, 2007

- Review
- Data-flow analysis frameworks
- Iterative algorithm for general frameworks
- Meaning of a data-flow solution
- Framework for constant propagation
- Reading

- Available expressions
- Basic questions about the iterative algorithm
- Lattices and semilattices
- Transfer functions

- Basic questions
- When is the iterative algorithm correct?
- How precise is the solution?
- Under what conditions does the iterative algorithm converge?
- What is the meaning of the solution to the data-flow equations?
- Definition of a data-flow analysis framework
- Data-flow graph with an ENTRY and EXIT node.
- Direction: forwards or backwards.
- Semilattice with values V for IN and OUT and a meet operator ∧.
- A family of transfer functions for the blocks of the data-flow graph.
- A constant value for the boundary condition.

- Monotone frameworks
- A framework is monotone (order preserving) if for all x and y in V and f in F, x ≤ y implies f(x) ≤ f(y).
- Or, equivalently, f(x ∧ y) ≤ f(x) ∧ f(y).
- Fixedpoints
- Let f be a function on a set V. An element x in V is a fixedpoint if f(x) = x.
- Let f be a monotone function. A maximum (least) fixedpoint of f is the greatest (least) element x such that f(x) = x.
- Knaster-Tarski theorem: If f is a monotone function on a complete lattice L, then the set of fixedpoints of f in L is also a complete lattice.
- If f is a monotone function on a complete lattice L, then the maximum fixedpoint of f is the limit of the sequence
- f
_{0}= TOP - f
_{n+1}= f(f_{n}) - Distributive frameworks
- A framework is distributive if for all x and y in V and f in F, f(x ∧ y) = f(x) ∧ f(y).
- Every distributive framework is monotone (but not conversely).
- Reaching definitions, live variables, and available expressions are all distributive frameworks.

- Input: a data-flow analysis framework.
- Output: the sets IN[B] and OUT[B] for each block B in the data-flow graph.
- Method: The algorithm is given in Fig. 9.23, p. 627.
- Important properties of the algorithm:
- If the algorithm converges, the result is a solution to the data-flow equations.
- If the framework is monotone, the solution is the MFP of the data-flow equations.
- If the semilattice of the framework is monotone and of finite height, the algorithm is guaranteed to converge.

- For a forward data-flow framework, the IDEAL[B] solution for a block B is the meet over all
feasible paths P of f
_{P}(v_{ENTRY}). - Any answer greater than IDEAL is incorrect.
- Any answer smaller than or equal to IDEAL is safe.
- MOP[B], the meet over all paths solution, assumes that every path in the data-flow graph can be taken, but some of these paths may not be feasible. Thus, MOP[B] ≤ IDEAL[B] for all blocks B.
- The MFP solution is always smaller than or equal to the MOP solution. Therefore, MFP ≤ MOP ≤ IDEAL.

- Constant propagation, often called "constant folding", replaces expressions that always evaluate to the same constant by that constant.
- Constant-propagation framework
- The constant semilattice for a single variable has
- All constants appropriate for the type of the variable.
- The value NAC for not-a-constant. NAC is the bottom element.
- The value UNDEF for undefined. UNDEF is the top element.
- The following properties hold:
- UNDEF ∧ v = v for all values v.
- NAC ∧ v = NAC for all values v.
- c ∧ c = c for all constants c.
- c ∧ d = NAC for any two distinct constants c and d.
- Transfer functions
- A data-flow value for the framework is a map from each variable in the program to one of the values in the constant semilattice.
- We use m(v) to denote the lattice value of program variable v.
- The set F consists of transfer functions that accept a map of variables to values in the constant semilattice and return return another such map.
- Let f
_{s}be the transfer function for statement s, and let m and m' be data-flow values such that m' = f_{s}(m). - Forms for statement s:
- If s is x = c, then m'(x) = c.
- If s is x = y + z, then m'(x) =
- m(y) + m(z) if m(y) and m(z) are constant values
- NAC if either m(y) or m(z) is NAC
- UNDEF otherwise
- If the right side of statement s is anything else (e.g., function call, pointer expression), then m'(x) = NAC.
- This framework is monotone but not distributive.

- ALSU: sections 9.3, 9.4

aho@cs.columbia.edu