One of the most distinctive features of our algorithm is our conservative approach to handling all the features of the C language. Most of the previous work has made simplifying assumptions that rule out things such as pointer arithmetic, type casting, union types, out-of-bounds array references, and variable argument lists.
Our analysis is based on a points-to representation similar to the one described by Emami et al. . Other work has used alias pairs. Choi et al. show how alias pairs can be compactly represented using a transitive reduction strategy . In that compact form, the alias pairs are not much different than a points-to representation. There are some differences in precision between using full alias pairs and points-to functions, but neither is clearly superior . We have found that the points-to function is a compact representation that works well for analyzing C programs.
Our scheme for naming heap objects is taken directly from Choi et al. Most other work has used k-limiting, where some arbitrary limit is imposed on the length of pointer chains in recursive data structures . Although k-limiting can sometimes provide more information, our algorithm is not intended to distinguish the elements of recursive data structures. That problem has been addressed by a number of others .
Using symbolic names to represent the values of pointers passed into a procedure was first suggested by Landi and Ryder . They use ``non-visible variables'' to represent storage outside the scope of a procedure. Emami et al. use ``invisible variables'' for the same purpose. Our extended parameters are essentially the same as invisible variables, except that we choose to subsume parameters so that each initial points-to entry contains a single extended parameter.
Our work is most similar to the analysis developed by Emami et al. Their algorithm is based on an ``invocation graph'' which has a separate node for each procedure in each calling context. The size of an invocation graph is exponential in the depth of the call graph, implying that their algorithm is also exponential. Although our algorithm is still exponential in the worst case where every calling context has different aliases, it performs well for real input programs where the aliases are usually the same. We expect the precision of our results to be comparable to theirs, but there may be some differences. Our conservative handling of C occasionally causes us to propagate pointer values to more locations. This is the price we pay to get safe results for all inputs. Subsuming aliased parameters also gives up some precision to improve efficiency. On the other hand, our results may be better because we use the extended parameters to increase the number of strong updates.
Emami et al. mentioned the idea of using a memoization scheme similar to our algorithm. Our algorithm extends their design in several important ways. First, we parameterize references to global variables to increase the opportunities for reusing PTFs in different contexts. Second, we keep track of which pointers are actually referenced by a procedure. Thus, we avoid the overhead of creating and updating irrelevant parameters, and we prevent differing aliases among those irrelevant parameters from limiting the reuse.