References
- 1
-
W. Abu-Sufah, D. J. Kuck, and D. H. Lawrie.
Automatic program transformations for virtual memory computers.
Proc. of the 1979 National Computer Conference, pages 969-974,
June 1979.
- 2
-
J-L. Baer and T-F. Chen.
An effective on-chip preloading scheme to reduce data access penalty.
In Proceedings of Supercomputing '91, 1991.
- 3
-
D. Bailey, J. Barton, T. Lasinski, and H. Simon.
The NAS Parallel Benchmarks.
Technical Report RNR-91-002, NASA Ames Research Center, August 1991.
- 4
-
D. Callahan, K. Kennedy, and A. Porterfield.
Software prefetching.
In Proceedings of the Fourth International Conference on
Architectural Support for Programming Languages and Operating Systems, pages
40-52, April 1991.
- 5
-
W. Y. Chen, S. A. Mahlke, P. P. Chang, and W. W. Hwu.
Data access microarchitectures for superscalar processors with
compiler-assisted data prefetching.
In Proceedings of the 24th Annual International Symposium on Microarchitecture (MICRO-24), 1991.
- 6
-
R. P. Colwell, R. P. Nix, J. J. O'Donnell, D. B. Papworth, and P. K. Rodman.
A VLIW architecture for a trace scheduling compiler.
In Proc. Second Intl. Conf. on Architectural Support for
Programming Languages and Operating Systems, pages 180-192, Oct. 1987.
- 7
-
J. C. Dehnert, P. Y.-T. Hsu, and J. P. Bratt.
Overlapped loop support in the Cydra 5.
In Third International Conference on Architectural Support for
Programming Languages and Operating Systems (ASPLOS III), pages 26-38,
April 1989.
- 8
-
J. Ferrante, V. Sarkar, and W. Thrash.
On estimating and enhancing cache effectiveness.
In Fourth Workshop on Languages and Compilers for Parallel
Computing, Aug. 1991.
- 9
-
K. Gallivan, W. Jalby, U. Meier, and A. Sameh.
The impact of hierarchical memory systems on linear algebra algorithm
design.
Technical Report UIUCSRD 625, University of Illinois, 1987.
- 10
-
D. Gannon and W. Jalby.
The influence of memory hierarchy on algorithm organization:
Programming FFTs on a vector multiprocessor.
In The Characteristics of Parallel Algorithms. MIT Press, 1987.
- 11
-
D. Gannon, W. Jalby, and K. Gallivan.
Strategies for cache and local memory management by global program
transformation.
Journal of Parallel and Distributed Computing, 5:587-616,
1988.
- 12
-
G. H. Golub and C. F. Van Loan.
Matrix Computations.
Johns Hopkins University Press, 1989.
- 13
-
E. Gornish, E. Granston, and A. Veidenbaum.
Compiler-Directed Data Prefetching in Multiprocessors with Memory
Hierarchies.
In International Conference on Supercomputing, 1990.
- 14
-
E. H. Gornish.
Compile time analysis for data prefetching.
Master's thesis, University of Illinois at Urbana-Champaign, December
1989.
- 15
-
A. Gupta, J. Hennessy, K. Gharachorloo, T. Mowry, and W-D. Weber.
Comparative evaluation of latency reducing and tolerating techniques.
In Proceedings of the 18th Annual International Symposium on
Computer Architecture, pages 254-263, May 1991.
- 16
-
A. C. Klaiber and H. M. Levy.
Architecture for software-controlled data prefetching.
In Proceedings of the 18th Annual International Symposium on
Computer Architecture, pages 43-53, May 1991.
- 17
-
D. Kroft.
Lockup-free instruction fetch/prefetch cache organization.
In Proceedings of the 8th Annual International Symposium on
Computer Architecture, pages 81-85, 1981.
- 18
-
M. S. Lam.
Software pipelining: An effective scheduling technique for VLIW
machines.
In Proc. ACM SIGPLAN '88 Conference on Programming Language
Design and Implementation, pages 318-328, June 1988.
- 19
-
M. S. Lam, E. E. Rothberg, and M. E. Wolf.
The cache performance and optimizations of blocked algorithms.
In Proceedings of the Fourth International Conference on
Architectural Support for Programming Languages and Operating Systems, pages
63-74, April 1991.
- 20
-
R. L. Lee.
The Effectiveness of Caches and Data Prefetch Buffers in
Large-Scale Shared Memory Multiprocessors.
PhD thesis, Department of Computer Science, University of Illinois at
Urbana-Champaign, May 1987.
- 21
-
A. C. McKellar and E. G. Coffman.
The organization of matrices and matrix operations in a paged
multiprogramming environment.
CACM, 12(3):153-165, 1969.
- 22
-
T. Mowry and A. Gupta.
Tolerating latency through software-controlled prefetching in
shared-memory multiprocessors.
Journal of Parallel and Distributed Computing, 12(2):87-106,
1991.
- 23
-
A. K. Porterfield.
Software Methods for Improvement of Cache Performance on
Supercomputer Applications.
PhD thesis, Department of Computer Science, Rice University, May
1989.
- 24
-
B. R. Rau and C. D. Glaeser.
Some Scheduling Techniques and an Easily Schedulable Horizontal
Architecture for High Performance Scientific Computing.
In Proceedings of the 14th Annual Workshop on Microprogramming,
pages 183-198, October 1981.
- 25
-
J. P. Singh, W-D. Weber, and A. Gupta.
SPLASH: Stanford parallel applications for shared memory.
Technical Report CSL-TR-91-469, Stanford University, April 1991.
- 26
-
M. D. Smith.
Tracing with pixie.
Technical Report CSL-TR-91-497, Stanford University, November 1991.
- 27
-
SPEC.
The SPEC Benchmark Report.
Waterside Associates, Fremont, CA, January 1990.
- 28
-
S. W. K. Tjiang and J. L. Hennessy.
Sharlit: A tool for building optimizers.
In SIGPLAN Conference on Programming Language Design and
Implementation, 1992.
- 29
-
M. E. Wolf and M. S. Lam.
A data locality optimizing algorithm.
In Proceedings of the SIGPLAN '91 Conference on Programming
Language Design and Implementation, pages 30-44, June 1991.