Next: Algorithm Outline Up: Major Concepts Previous: Partial Transfer Functions

Design of PTFs

There is a trade-off between the complexity of the individual PTFs and their applicability. By making a PTF more complicated, we can increase the size of its input domain so that fewer PTFs need to be computed. The complete transfer function, which covers all the possible inputs, is at one end of this trade-off. At the other extreme, each point in the input space can have a separate PTF. One of these PTFs can only be reused when all of its inputs have exactly the same values as in the context where it was created. The initial values specify the input domain for the PTF. Whenever the analysis encounters another call to the procedure with the same input values, it can reuse the final values recorded in the PTF. This technique is commonly known as memoization.

Since it is not the specific input values but their alias patterns that determine the behavior of a procedure, we use symbolic names called extended parameters to abstract the input values. An extended parameter represents the locations reached through an input pointer at the beginning of a procedure. Every object is represented by at most one extended parameter. This is similar to the ``invisible variables'' defined by Emami et al. [6].

For procedure f in Figure 1, we use the extended parameter 1_p to represent the location initially pointed to by p; 1_p represents x0 for the call at S1 and z0 for the call at S2. For the calls at S1 and S2 in Figure 1, since none of the inputs are aliased, we create a new extended parameter for every location accessed. For the call at S3 on the other hand, both p and r point to the same location, so we create only one extended parameter to represent the common location.

When a pointer to a global is passed into a procedure, we treat it as any other extended parameter. We do not necessarily want to know the specific value of the pointer. For example, for the call at S1 in Figure 1, the parameter 1_p represents the global variable x0. If we used x0 directly instead of the parameter, we would not be able to reuse the same PTF for the call at S2. Since extended parameters represent global variables referenced through pointers, we also use extended parameters to represent global variables that are referenced directly. If a global is referenced both directly and through a pointer input to a procedure, using the same extended parameter for both references takes care of the alias between them.

It is important to only create the extended parameters that are relevant to the procedure. Aliases involving parameters that are never referenced should not prevent us from reusing PTFs. Our solution is to create the extended parameters lazily. We only create extended parameters as they are referenced. In contrast, Emami et al. create invisible variables for all input pointers that could potentially be accessed. Not only does that require unnecessary overhead to create parameters that are never referenced, but it also limits the opportunities for reuse.

Extended parameters play three important roles:

  1. Extended parameters make up part of the name space of a procedure. Each procedure has its own distinct name space which consists of the extended parameters, local variables, and heap storage allocated by the procedure and its children. We derive a parameter mapping at each call site to map the caller's name space to the callee's and vice versa.

  2. The initial points-to function for a PTF specifies the input domain. The aliases among the inputs to a PTF are the main part of the input domain specification. With our definition of extended parameters, the initial points-to function succinctly captures that information. For the alias in the call at S3 in the example, the initial points-to function records that p and r point to the same extended parameter 1_p as shown in Figure 4(a). Similarly, the totally disjoint points-to relationships in Figure 3(a) reflect the lack of aliases for the other calls.

  3. The final points-to function at the procedure exit summarizes the pointer assignments. Given the initial points-to function, it is relatively easy to derive the points-to function at the procedure exit. Figures 3(b) and 4(b) show the final points-to functions produced for the corresponding inputs. Since the analysis operates on the parameterized name space, the final points-to function summarizes the procedure parametrically. The parameter mapping for a particular call site allows us to translate the summary back to the caller's name space.

Besides the aliases, the input domain of a PTF is also defined by the values of function pointers passed in and used in calls within the procedure. The function pointer values affect the PTF summary because they determine what code could be executed.

The PTF design presented here is but one choice in the trade-off between the complexity of the PTFs and the sizes of their input domains. We have also explored another scheme that uses a separate extended parameter for each access path. That design increases reuse with considerable cost in complexity, since multiple extended parameters may then refer to the same location. Furthermore, it requires additional analysis to abstract the potentially infinite access paths to a finite set. Our experience with that scheme suggests that such complexity is unnecessary for this analysis.

Next: Algorithm Outline Up: Major Concepts Previous: Partial Transfer Functions

Bob Wilson