Pointer analysis promises significant benefits for optimizing and parallelizing compilers, yet despite much recent progress it has not advanced beyond the research stage. Several problems remain to be solved before it can become a practical tool. First, the analysis must be efficient without sacrificing the accuracy of the results. Second, pointer analysis algorithms must handle real C programs. If an analysis only provides correct results for well-behaved input programs, it will not be widely used. We have developed a pointer analysis algorithm that addresses these issues.
The goal of our analysis is to identify the potential values of the pointers at each statement in a program. We represent that information using points-to functions. We consider heap-allocated data structures as well as global and stack variables, but we do not attempt to analyze the relationships between individual elements of recursive data structures.
Interprocedural analysis is crucial for accurately identifying pointer values. Only very conservative estimates are possible by analyzing each procedure in isolation. One straightforward approach is to combine all the procedures into a single control flow graph, adding edges for calls and returns. An iterative data-flow analysis using such a graph is relatively simple but suffers from the problem of unrealizable paths. That is, values can propagate from one call site, through the callee procedure, and back to a different call site. Some algorithms attempt to avoid unrealizable paths by tagging the pointer information with abstractions of the calling contexts . However, these algorithms still inappropriately combine some information from different contexts.
Emami et al. have proposed a context-sensitive algorithm that completely reanalyzes a procedure for each of its calling contexts . This not only prevents values from propagating along unrealizable paths, but also guarantees that the analysis of a procedure in one calling context is completely independent of all the other contexts. A procedure may behave quite differently in each context due to aliases among its inputs, and a context-sensitive analysis keeps those behaviors separate. Reanalyzing for every calling context is only practical for small programs. For larger programs, the exponential cost quickly becomes prohibitive.
Interval analysis, which has been successfully used to analyze side effects for scalar and array variables in Fortran programs , is an approach that combines context sensitivity and efficiency. This technique summarizes the effects of a procedure by a transfer function. For each call site where the procedure is invoked, it computes the effects of the procedure by applying the transfer function to the specific input parameters at the call site. This provides context sensitivity without reanalyzing at every call site.
Interval analysis relies on being able to concisely summarize the effects of the procedures. Unfortunately, pointer analysis is not amenable to succinct summarization. The effects of a procedure may depend heavily on the aliases that hold when it is called. Thus, the evaluation of a transfer function that summarizes the pointer assignments in a procedure may be no simpler than running an iterative algorithm over the original procedure.
We propose a new technique that is completely context-sensitive yet does not sacrifice efficiency. Our approach is based on the insight that procedures are typically called with the same aliases among their inputs. Thus it is not necessary to completely summarize a procedure for all the potential aliases but only for those that occur in the program. Our idea is to generate incomplete transfer functions that only cover the input conditions that exist in the program. These incomplete transfer functions are made up of simple partial transfer functions (PTFs) that are only applicable to calling contexts that exhibit certain alias relationships. We have developed an efficient technique that isolates the set of relevant aliases upon which a PTF definition is based.
Our analysis embraces all the inelegant features of the C language that are hard to analyze but are commonly used. It safely handles arbitrary type casts, unions, and pointer arithmetic. It also assumes that pointers may be stored in any memory locations, regardless of the types declared for those locations. Since some of the standard library functions may change the values of pointers, we provide the analysis with a summary of the potential pointer assignments in each library function.
We have implemented our algorithm in the SUIF compiler system and have measured the analysis times for a set of benchmark programs. Our empirical results show that our technique is successful in that it often needs to generate only one PTF for each procedure in the program.
This paper is organized as follows. We first introduce the major concepts behind our approach and give an outline of our algorithm in Section 2.1. We then describe our representation of memory locations and pointer values in Section 3. Next in Section 4, we explain the intraprocedural portion of our algorithm. Section 5 then describes how we compute the effects of procedure calls. Finally, we discuss related work in Section 6 and present our experimental results in Section 7.