References

1
A. Agarwal, D. Chaiken, G. D'Souza, K. Johnson, D. Kranz, et al. The MIT Alewife machine: A large-scale distributed memory multiprocessor. In Scalable Shared Memory Multiprocessors. Kluwer Academic Publishers, 1991.

2
A. Agarwal, D. Kranz, and V. Natarajan. Automatic partitioning of parallel loops for cache-coherent multiprocessors. In Proceedings of the 1993 International Conference on Parallel Processing, St. Charles, IL, August 1993.

3
A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, second edition, 1986.

4
J. M. Anderson and M. S. Lam. Global optimizations for parallelism and locality on scalable parallel machines. In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 112--125, Albuquerque, NM, June 1993.

5
B. Appelbe and B. Lakshmanan. Optimizing parallel programs using affinity regions. In Proceedings of the 1993 International Conference on Parallel Processing, pages 246--249, St. Charles, IL, August 1993.

6
U. Banerjee, R. Eigenmann, A. Nicolau, and D. Padua. Automatic program parallelization. Proceedings of the IEEE, 81(2):211--243, February 1993.

7
B. Bixby, K. Kennedy, and U. Kremer. Automatic data layout using 0-1 integer programming. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 111--122, Montreal, Canada, August 1994.

8
W. J. Bolosky and M. L. Scott. False sharing and its effect on shared memory performance. In Proceedings of the USENIX Symposium on Experiences with Distributed and Multiprocessor Systems (SEDMS IV), pages 57--71, San Diego, CA, September 1993.

9
S. Carr, K. S. McKinley, and C.-W. Tseng. Compiler optimizations for improving data locality. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), pages 252--262, San Jose, CA, October 1994.

10
M. Cierniak and W. Li. Unifying data and control transformations for distributed shared memory machines. Technical Report TR-542, Department of Computer Science, University of Rochester, November 1994.

11
S. J. Eggers and T. E. Jeremiassen. Eliminating false sharing. In Proceedings of the 1991 International Conference on Parallel Processing, pages 377--381, St. Charles, IL, August 1991.

12
S. J. Eggers and R. H. Katz. The effect of sharing on the cache and bus performance of parallel programs. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-III), pages 257--270, Boston, MA, April 1989.

13
D. Gannon, W. Jalby, and K. Gallivan. Strategies for cache and local memory management by global program transformation. Journal of Parallel and Distributed Computing, 5(5):587--616, October 1988.

14
R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete Mathematics. Addison-Wesley, Reading, MA, 1989.

15
M. Gupta and P. Banerjee. Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers. IEEE Transactions on Parallel and Distributed Systems, 3(2):179--193, March 1992.

16
J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, San Mateo, CA, 1990.

17
High Performance Fortran Forum. High Performance Fortran language specification. Scientific Programming, 2(1-2):1--170, 1993.

18
S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66--80, August 1992.

19
T. E. Jeremiassen and S. J. Eggers. Reducing false sharing on shared memory multiprocessors through compile time data transformations. Technical Report UW-CSE-94-09-05, Department of Computer Science and Engineering, University of Washington, September 1994.

20
Y. Ju and H. Dietz. Reduction of cache coherence overhead by compiler data layout and loop transformation. In U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing, Fourth International Workshop, pages 344--358, Santa Clara, CA, August 1991. Springer-Verlag.

21
Kendall Square Research, Waltham, MA. KSR1 Principles of Operation, revision 6.0 edition, October 1992.

22
Kuck & Associates, Inc. KAP User's Guide. Champaign, IL 61820, 1988.

23
M. S. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), pages 63--74, Santa Clara, CA, April 1991.

24
D. Lenoski, J. Laudon, T. Joe, D. Nakahira, L. Stevens, A. Gupta, and J. Hennessy. The DASH prototype: Implementation and performance. In Proceedings of the 19th International Symposium on Computer Architecture, pages 92--105, Gold Coast, Australia, May 1992.

25
J. Li and M. Chen. The data alignment phase in compiling programs for distributed-memory machines. Journal of Parallel and Distributed Computing, 13(2):213--221, October 1991.

26
T. J. Sheffler, R. Schreiber, J. R. Gilbert, and S. Chatterjee. Aligning parallel arrays to reduce communication. In Frontiers '95: The 5th Symposium on the Frontiers of Massively Parallel Computation, pages 324--331, McLean, VA, February 1995.

27
J. P. Singh, W.-D. Weber, and A. Gupta. SPLASH: Stanford parallel applications for shared-memory. Computer Architecture News, 20(1):5--44, March 1992.

28
J. P. Singh, T. Joe, A. Gupta, and J. L. Hennessy. An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors. In Proceedings of Supercomputing '93, pages 214--225, Portland, OR, November 1993.

29
O. Temam, E. D. Granston, and W. Jalby. To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts. In Proceedings of Supercomputing '93, pages 410--419, Portland, OR, November 1993.

30
J. Torrellas, M. S. Lam, and J. L. Hennessy. Shared data placement optimizations to reduce multiprocessor cache miss rates. In Proceedings of the 1990 International Conference on Parallel Processing, pages 266--270, St. Charles, IL, August 1990.

31
E. Torrie, C.-W. Tseng, M. Martonosi, and M. W. Hall. Evaluating the impact of advanced memory systems on compiler-parallelized codes. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), June 1995.

32
C.-W. Tseng. Compiler optimizations for eliminating barrier synchronization. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, July 1995.

33
R. P. Wilson, R. S. French, C. S. Wilson, S. P. Amarasinghe, J. M. Anderson, S. W. K. Tjiang, S.-W. Liao, C.-W. Tseng, M. W. Hall, M. S. Lam, and J. L. Hennessy. SUIF: An infrastructure for research on parallelizing and optimizing compilers. ACM SIGPLAN Notices, 29(12):31--37, December 1994.

34
M. E. Wolf and M. S. Lam. A data locality optimizing algorithm. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 30--44, Toronto, Canada, June 1991.

35
M. E. Wolf and M. S. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE Transactions on Parallel and Distributed Systems, 2(4):452--471, October 1991.


Saman Amarasinghe
Fri Apr 7 11:22:17 PDT 1995