With ever-increasing clock rates and the use of instruction-level parallelism, the speed of microprocessors has increased dramatically and will continue to do so. With numerical processing capabilities that rival those of older-generation supercomputers, these microprocessors are particularly attractive as scientific engines due to their cost-effectiveness. In addition, these processors can be used to build large-scale multiprocessors capable of an aggregate peak rate surpassing that of current vector machines.
Unfortunately, high computation bandwidth is meaningless unless it is matched by a similarly powerful memory subsystem. These microprocessors tend to rely on caches to reduce their effective memory access time. While the effectiveness of caches has been well established for general-purpose code, their effectiveness for scientific applications has not. One manifestation of this uncertainty is that several of the scalar machines designed for scientific computation do not use caches.