Since C programs commonly access memory using type casts and other features to override the high-level types, our algorithm is based on a low-level representation of memory. This allows us to handle the problematic features of C in a straightforward manner. Instead of using types to identify locations that may contain pointers, we assume that any memory location could potentially hold a pointer. We also refer to locations within a block of memory by their positions, not by their field names. We define a new abstraction, the location set, to specify both a block of memory and a set of positions within that block. For the most part, this low-level representation provides results comparable to those based on high-level types. However, our conservative approach may occasionally lead to less accurate results. This is a price we are willing to pay in order to guarantee safe results for all input programs.
We divide memory into blocks of contiguous storage, whose positions relative to one another are undefined. A block of memory may be one of three kinds: a local variable, a locally allocated heap block, or an extended parameter. Global variables are treated as extended parameters. A special local variable represents the return value of a procedure. Note that the heap blocks are only those allocated within a procedure or its callees; heap blocks passed in from a calling context are considered to be extended parameters.
We distinguish heap blocks based on their allocation contexts, grouping together all the blocks allocated in each context. The minimal context information is simply the statement that creates the block. Including the call graph edges along which the new blocks are returned, eliminating duplicate edges in recursive cycles, can provide better precision for some programs . While this scheme is a good starting point, we have found that for some programs it produces far more heap blocks than we would like. We are currently investigating techniques to merge heap blocks that are elements of the same recursive data structure in accordance with our goal to only distinguish complete data structures from one another. For now, we limit the allocation contexts to only include the static allocation sites. That is sufficient to provide good precision for the programs we have analyzed so far.