# COMS W4117 Compilers and Translators: Software Verification Tools Lecture 7: Data-Flow Analysis Frameworks September 25, 2007

## Lecture Outline

1. Review
2. Data-flow analysis frameworks
3. Iterative algorithm for general frameworks
4. Meaning of a data-flow solution
5. Framework for constant propagation

## 1. Review

1. Available expressions
2. Basic questions about the iterative algorithm
3. Lattices and semilattices
4. Transfer functions

## 2. Data-Flow Analysis Frameworks

• Basic questions
• When is the iterative algorithm correct?
• How precise is the solution?
• Under what conditions does the iterative algorithm converge?
• What is the meaning of the solution to the data-flow equations?
• Definition of a data-flow analysis framework
• Data-flow graph with an ENTRY and EXIT node.
• Direction: forwards or backwards.
• Semilattice with values V for IN and OUT and a meet operator ∧.
• A family of transfer functions for the blocks of the data-flow graph.
• A constant value for the boundary condition.
• Monotone frameworks
• A framework is monotone (order preserving) if for all x and y in V and f in F, x ≤ y implies f(x) ≤ f(y).
• Or, equivalently, f(x ∧ y) ≤ f(x) ∧ f(y).
• Fixedpoints
• Let f be a function on a set V. An element x in V is a fixedpoint if f(x) = x.
• Let f be a monotone function. A maximum (least) fixedpoint of f is the greatest (least) element x such that f(x) = x.
• Knaster-Tarski theorem: If f is a monotone function on a complete lattice L, then the set of fixedpoints of f in L is also a complete lattice.
• If f is a monotone function on a complete lattice L, then the maximum fixedpoint of f is the limit of the sequence
•    f0 = TOP
fn+1 = f(fn)
• Distributive frameworks
• A framework is distributive if for all x and y in V and f in F, f(x ∧ y) = f(x) ∧ f(y).
• Every distributive framework is monotone (but not conversely).
• Reaching definitions, live variables, and available expressions are all distributive frameworks.

## 3. The Iterative Algorithm for General Frameworks

• Input: a data-flow analysis framework.
• Output: the sets IN[B] and OUT[B] for each block B in the data-flow graph.
• Method: The algorithm is given in Fig. 9.23, p. 627.
• Important properties of the algorithm:
• If the algorithm converges, the result is a solution to the data-flow equations.
• If the framework is monotone, the solution is the MFP of the data-flow equations.
• If the semilattice of the framework is monotone and of finite height, the algorithm is guaranteed to converge.

## 4. Meaning of a Data-Flow Solution

• For a forward data-flow framework, the IDEAL[B] solution for a block B is the meet over all feasible paths P of fP(vENTRY).
• Any answer greater than IDEAL is incorrect.
• Any answer smaller than or equal to IDEAL is safe.
• MOP[B], the meet over all paths solution, assumes that every path in the data-flow graph can be taken, but some of these paths may not be feasible. Thus, MOP[B] ≤ IDEAL[B] for all blocks B.
• The MFP solution is always smaller than or equal to the MOP solution. Therefore, MFP ≤ MOP ≤ IDEAL.

## 5. Framework for Constant Propagation

• Constant propagation, often called "constant folding", replaces expressions that always evaluate to the same constant by that constant.
• Constant-propagation framework
• The constant semilattice for a single variable has
1. All constants appropriate for the type of the variable.
2. The value NAC for not-a-constant. NAC is the bottom element.
3. The value UNDEF for undefined. UNDEF is the top element.
4. The following properties hold:
• UNDEF ∧ v = v for all values v.
• NAC ∧ v = NAC for all values v.
• c ∧ c = c for all constants c.
• c ∧ d = NAC for any two distinct constants c and d.
• Transfer functions
• A data-flow value for the framework is a map from each variable in the program to one of the values in the constant semilattice.
• We use m(v) to denote the lattice value of program variable v.
• The set F consists of transfer functions that accept a map of variables to values in the constant semilattice and return return another such map.
• Let fs be the transfer function for statement s, and let m and m' be data-flow values such that m' = fs(m).
• Forms for statement s:
• If s is x = c, then m'(x) = c.
• If s is x = y + z, then m'(x) =
• m(y) + m(z) if m(y) and m(z) are constant values
• NAC if either m(y) or m(z) is NAC
• UNDEF otherwise
• If the right side of statement s is anything else (e.g., function call, pointer expression), then m'(x) = NAC.
• This framework is monotone but not distributive.