Since C programs commonly access memory using type casts and other
features to override the high-level types, our algorithm is based on a
low-level representation of memory. This allows us to handle the
problematic features of C in a straightforward manner. Instead of
using types to identify locations that may contain pointers, we assume
that any memory location could potentially hold a pointer. We also
refer to locations within a block of memory by their positions, not by
their field names. We define a new abstraction, the location
set, to specify both a block of memory and a set of positions within
that block. For the most part, this low-level representation provides
results comparable to those based on high-level types. However, our
conservative approach may occasionally lead to less accurate results.
This is a price we are willing to pay in order to guarantee safe
results for all input programs
.
We divide memory into blocks of contiguous storage, whose positions relative to one another are undefined. A block of memory may be one of three kinds: a local variable, a locally allocated heap block, or an extended parameter. Global variables are treated as extended parameters. A special local variable represents the return value of a procedure. Note that the heap blocks are only those allocated within a procedure or its callees; heap blocks passed in from a calling context are considered to be extended parameters.
We distinguish heap blocks based on their allocation contexts, grouping together all the blocks allocated in each context. The minimal context information is simply the statement that creates the block. Including the call graph edges along which the new blocks are returned, eliminating duplicate edges in recursive cycles, can provide better precision for some programs [2]. While this scheme is a good starting point, we have found that for some programs it produces far more heap blocks than we would like. We are currently investigating techniques to merge heap blocks that are elements of the same recursive data structure in accordance with our goal to only distinguish complete data structures from one another. For now, we limit the allocation contexts to only include the static allocation sites. That is sufficient to provide good precision for the programs we have analyzed so far.