We now present results from our simulation studies. We start by evaluating the effectiveness of our compiler algorithm, including its key aspects: locality analysis, loop splitting, and software pipelining. We then evaluate the sensitivity of our performance results to variations in the architectural parameters used by the compiler. Next, we compare two architectural policies for handling situations in which the memory subsystem is saturated with prefetch requests and cannot accept any more. Finally, we explore the interaction between prefetching and locality optimizations such as cache blocking.