Our goal with regard to aggregates is to distinguish between different fields within a structure but not the different elements of an array. That is, given an array of structures with fields x and y, the locations are partitioned into two sets, one containing all field x data and one containing all field y data. Our pointer analysis can be combined with data dependence tests to distinguish between different array elements.
We represent positions within a block of storage using location sets. A location set is a triple where the base is the name for a block of memory, is an offset within that block, and is the stride. That is, it represents the set of locations within block , as shown graphically in Figure 5. The offsets and strides are measured in bytes.
Table 1 shows the location sets that represent various expressions. A field in a structure is identified by its offset from the beginning of the structure. Except for array references, the stride is unused and is set to zero. A reference to an array element has a stride equal to the element size. If the elements of an array are structures, then the field information is captured by the offset. Since C does not provide array bounds checking, a reference to an array nested within a structure can access any field of the structure by using out-of-bounds array indices. Although such out-of-bounds references are rare and non-standard, we believe it is still important to handle them conservatively. Thus we treat an array nested in a structure as if it completely overlaps the entire structure. This implies that the offset will always be less than the stride. We enforce that by always computing the offset modulo the stride whenever the stride is non-zero.
When the position of a location within a block is entirely unknown, we set the location set stride to one. This means that the location set includes all the locations within the block. This may occur due to pointer arithmetic. Although we recognize simple pointer increments, which are commonly used to address array elements, to determine the location set strides, we do not attempt to evaluate more complex pointer arithmetic. Instead, for each memory address input to an arithmetic expression, we add to the result a location set with the same base object but with the stride set to one. This conservatively approximates the results of any arithmetic expression. Because the positions of the blocks relative to one another are undefined, we need not worry about pointer arithmetic moving a pointer to a different block.
In summary, our location set representation has several advantages. Problems related to type casts and unions become irrelevant because we do not use the high-level types. It is also very easy to work with, especially when dealing with pointer arithmetic.