Publications

[1] C. Coarfa, Y. Dotsenko, J. Mellor-Crummey, F. Cantonnet, T. El-Ghazawi, A. Mohanti, Y. Yao, and D. Chavarría-Miranda. An evaluation of global address space languages: Co-array Fortran and Unified Parallel C. In Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 36-47. ACM, 2005. [ bib ]
[2] P. Husbands, C. Iancu, and K. Yelick. A performance analysis of the Berkeley UPC compiler. In Proceedings of the 17th annual international conference on Supercomputing, pages 63-73. ACM, 2003. [ bib ]
[3] D. Mallón, G. Taboada, C. Teijeiro, J. Touriño, B. Fraguela, A. Gómez, R. Doallo, and J. Mouriño. Performance evaluation of MPI, UPC and OpenMP on multicore architectures. Recent Advances in Parallel Virtual Machine and Message Passing Interface, pages 174-184, 2009. [ bib ]
[4] J. Dinan, P. Balaji, E. Lusk, P. Sadayappan, R. Thakur, et al. Hybrid parallel programming with MPI and Unified Parallel C. In Proceedings of the 7th ACM international conference on Computing frontiers, pages 177-186. ACM, 2010. [ bib ]
[5] F. Cantonnet, Y. Yao, M. Zahran, and T. El-Ghazawi. Productivity analysis of the UPC language. In Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International, page 254. IEEE, 2004. [ bib ]
[6] K. Yelick, D. Bonachea, W.Y. Chen, P. Colella, K. Datta, J. Duell, S.L. Graham, P. Hargrove, P. Hilfinger, P. Husbands, et al. Productivity and performance using partitioned global address space languages. In International Conference on Symbolic and Algebraic Computation: Proceedings of the 2007 international workshop on Parallel symbolic computation, volume 27, pages 24-32, 2007. [ bib ]
[7] J. Savant and S. Seidel. MuPC: A run time system for Unified Parallel C. PhD thesis, Michigan Technological University, 2002. [ bib ]
[8] T. El-Ghazawi and S. Chauvin. UPC benchmarking issues. In Parallel Processing, International Conference on, 2001., pages 365-372. IEEE, 2001. [ bib ]
[9] T. El-Ghazawi and F. Cantonnet. UPC performance and potential: A NPB experimental study. In Supercomputing, ACM/IEEE 2002 Conference, pages 17-17. IEEE, 2002. [ bib ]
[10] W.Y. Chen, C. Iancu, and K. Yelick. Communication optimizations for fine-grained UPC applications. In Parallel Architectures and Compilation Techniques, 2005. PACT 2005. 14th International Conference on, pages 267-278. IEEE, 2005. [ bib ]
[11] C. Barton, C. Caşcaval, G. Almási, Y. Zheng, M. Farreras, S. Chatterjee, and J.N. Amaral. Shared memory programming for large scale machines. In ACM SIGPLAN Notices, volume 41, pages 108-117. ACM, 2006. [ bib ]
[12] M. Kurhekar, P. Varma, R. Barik, et al. Compilation of Unified Parallel C-language programs, August 7 2007. US Patent 7,254,809. [ bib ]
[13] N.R. Adiga, G. Almási, G.S. Almasi, Y. Aridor, R. Barik, D. Beece, R. Bellofatto, G. Bhanot, R. Bickford, M. Blumrich, et al. An overview of the BlueGene/L supercomputer. In Supercomputing, ACM/IEEE 2002 Conference, pages 60-60. IEEE, 2002. [ bib ]
[14] R. Nishtala, G. Almasi, and C. Cascaval. Performance without pain = productivity: data layout and collective communication in UPC. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, pages 99-110. ACM, 2008. [ bib ]
[15] S. Olivier, J. Huan, J. Liu, J. Prins, J. Dinan, P. Sadayappan, and C.W. Tseng. UTS: An unbalanced tree search benchmark. Languages and Compilers for Parallel Computing, pages 235-250, 2007. [ bib ]
[16] W. Kuchera and C. Wallace. The UPC memory model: Problems and prospects. In Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International, page 16. IEEE, 2004. [ bib ]
[17] T. El-Ghazawi, W.W. Carlson, and J.M. Draper. UPC language specification v1.1.1, 2003. [ bib ]
[18] C.H. Crawford, P. Henning, M. Kistler, and C. Wright. Accelerating computing with the cell broadband engine processor. In Proceedings of the 5th conference on Computing frontiers, pages 3-12. ACM, 2008. [ bib ]
[19] J. Prins, J. Huan, B. Pugh, C. Tseng, and P. Sadayappan. UPC implementation of an unbalanced tree search benchmark. Univ. North Carolina at Chapel Hill, Tech. Rep. TR03-034, 2003. [ bib ]
[20] S. Olivier and J. Prins. Scalable dynamic load balancing using UPC. In Parallel Processing, 2008. ICPP'08. 37th International Conference on, pages 123-131. IEEE, 2008. [ bib ]
[21] C. Bell, W.Y. Chen, D. Bonachea, and K. Yelick. Evaluating support for global address space languages on the Cray X1. In Proceedings of the 18th annual international conference on Supercomputing, pages 184-195. ACM, 2004. [ bib ]
[22] K. Berlin, J. Huan, M. Jacob, G. Kochhar, J. Prins, B. Pugh, P. Sadayappan, J. Spacco, and C.W. Tseng. Evaluating the impact of programming language features on the performance of parallel applications on cluster architectures. Languages and Compilers for Parallel Computing, pages 194-208, 2004. [ bib ]
[23] D. Bonachea and J. Duell. Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations. International Journal of High Performance Computing and Networking, 1(1):91-99, 2004. [ bib ]
[24] A. Johnson. CFD on the Cray X1E using Unified Parallel C. PowerPoint presentation, 5th UPC Workshop, 2005. [ bib ]
[25] Z. Zhang, J. Savant, and S. Seidel. A UPC runtime system based on MPI and POSIX threads. In Parallel, Distributed, and Network-Based Processing, 2006. PDP 2006. 14th Euromicro International Conference on, pages 8-pp. IEEE, 2006. [ bib ]
[26] F. Cantonnet, Y. Yao, S. Annareddy, A.S. Mohamed, and T.A. El-Ghazawi. Performance monitoring and evaluation of a UPC implementation on a NUMA architecture. In Parallel and Distributed Processing Symposium, 2003. Proceedings. International, pages 8-pp. IEEE, 2003. [ bib ]
[27] L. Chen, L. Liu, S. Tang, L. Huang, Z. Jing, S. Xu, D. Zhang, and B. Shou. Unified Parallel C for GPU clusters: Language extensions and compiler implementation. Languages and Compilers for Parallel Computing, pages 151-165, 2011. [ bib ]
[28] A. Johnson. Unified Parallel C within computational fluid dynamics applications on the Cray X1 (E). In Proc. of the Cray User’s Group Conference. Albuquerque, pages 1-9, 2005. [ bib ]
[29] Z. Zhang and S. Seidel. Benchmark measurements of current UPC platforms. In Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International, pages 8-pp. IEEE, 2005. [ bib ]
[30] A. Salah, O. Serres, J. Gaber, R. Outbib, and H. El-Sayed. Simulation of the fuel cell thermal behavior with Unified Parallel C. In Signal Processing and Communications, 2007. ICSPC 2007. IEEE International Conference on, pages 149-152. IEEE, 2007. [ bib ]
[31] Y. Zheng, C. Iancu, P. Hargrove, S.J. Min, and K. Yelick. Extending Unified Parallel C for GPU computing. In SIAM Conf on Parallel Processing for Scientific Computing, 2010. [ bib ]
[32] C. Barton, C. Cascaval, G. Almasi, R. Garg, J. Amaral, and M. Farreras. Multidimensional blocking in UPC. Languages and Compilers for Parallel Computing, pages 47-62, 2008. [ bib ]
[33] C. Iancu, P. Husbands, and P. Hargrove. Hunting the overlap. In Parallel Architectures and Compilation Techniques, 2005. PACT 2005. 14th International Conference on, pages 279-290. IEEE, 2005. [ bib ]
[34] H.H. Su, M. Billingsley, and A.D. George. Parallel Performance Wizard: A performance analysis tool for partitioned global-address-space programming. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, pages 1-8. IEEE, 2008. [ bib ]
[35] H.H. Su, D. Bonachea, A. Leko, H. Sherburne, M. Billingsley, and A. George. GASP! A standardized performance analysis tool interface for global address space programming models. Applied Parallel Computing. State of the Art in Scientific Computing, pages 450-459, 2007. [ bib ]
[36] D. Burke, J. Wawrzynek, K. Asanovic, A. Krasnov, A. Schultz, G. Gibeling, and P.Y. Droz. RAMP Blue: Implementation of a manycore 1008 processor system. Proceedings of the Reconfigurable Systems Summer Institute, 2008. [ bib ]
[37] W.Y. Chen. Building a source-to-source UPC-to-C translator. Number UCB/CSD-4-1369. Computer Science Division, University of California, 2004. [ bib ]
[38] C. Bell, D. Bonachea, Y. Cote, J. Duell, P. Hargrove, P. Husbands, C. Iancu, M. Welcome, and K. Yelick. An evaluation of current high-performance networks. In Parallel and Distributed Processing Symposium, 2003. Proceedings. International, pages 10-pp. IEEE, 2003. [ bib ]
[39] V. Aggarwal, A. George, K. Yalamanchili, C. Yoon, H. Lam, and G. Stitt. Bridging parallel and reconfigurable computing with multilevel PGAS and SHMEM+. In Proceedings of the Third International Workshop on High-Performance Reconfigurable Computing Technology and Applications, pages 47-54. ACM, 2009. [ bib ]
[40] Z. Ryne and S. Seidel. A specification of the extensions to the collective operations of Unified Parallel C. PhD thesis, Michigan Technological University, 2005. [ bib ]
[41] J. Coyle, I. Roy, M. Kraeva, and G.R. Luecke. UPC-CHECK: a scalable tool for detecting run-time errors in Unified Parallel C. Computer Science-Research and Development, pages 1-7, 2012. [ bib ]
[42] G.R. Luecke, J. Coyle, J. Hoekstra, M. Kraeva, Y. Xu, E. Kleiman, and O. Weiss. Evaluating error detection capabilities of UPC run-time systems. In Proceedings of the Third Conference on Partitioned Global Address Space Programing Models, page 7. ACM, 2009. [ bib ]
[43] J. Jose, M. Luo, S. Sur, and D.K. Panda. Unifying UPC and MPI runtimes: experience with MVAPICH. In Fourth Conference on Partitioned Global Address Space Programming Model (PGAS), 2010. [ bib ]
[44] F. Blagojević, P. Hargrove, C. Iancu, and K. Yelick. Hybrid PGAS runtime support for multicore nodes. In Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, page 3. ACM, 2010. [ bib ]
[45] W.Y. Chen, D. Bonachea, C. Iancu, and K. Yelick. Automatic nonblocking communication for partitioned global address space programs. In Proceedings of the 21st annual international conference on Supercomputing, pages 158-167. ACM, 2007. [ bib ]
[46] J. Brown and Z. Wen. Toward an application support layer: Numerical computation in Unified Parallel C. Parallel Processing and Applied Mathematics, pages 912-919, 2006. [ bib ]
[47] S.J. Min, C. Iancu, and K. Yelick. Hierarchical work stealing on manycore clusters. In Fifth Conference on Partitioned Global Address Space Programming Models (PGAS11), 2011. [ bib ]
[48] C. Rasmussen, M. Sottile, J. Nieplocha, R. Numrich, and E. Jones. Co-array Python: A parallel extension to the Python language. In Euro-Par 2004 Parallel Processing, pages 632-637. Springer, 2004. [ bib ]
[49] T. El-Ghazawi, O. Serres, S. Bahra, M. Huang, and E. El-Araby. Parallel programming of high-performance reconfigurable computing systems with Unified Parallel C. Proc. of Reconfigurable Systems Summer Institute (July 7-9, 2008) RSSI, 2008. [ bib ]
[50] J. González-Domínguez, M. Martín, G. Taboada, J. Touriño, R. Doallo, and A. Gómez. A parallel numerical library for UPC. Euro-Par 2009 Parallel Processing, pages 630-641, 2009. [ bib ]
[51] I. Patel and J.R. Gilbert. An empirical study of the performance and productivity of two parallel programming models. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, pages 1-7. IEEE, 2008. [ bib ]
[52] O. Serres, A. Anbar, S. Merchant, and T. El-Ghazawi. Experiences with UPC on TILE-64 processor. In Aerospace Conference, 2011 IEEE, pages 1-9. IEEE, 2011. [ bib ]
[53] E. Bethel. High performance, three-dimensional bilateral filtering. Technical report, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US), 2008. [ bib ]
[54] A. Marowka. Execution model of three parallel languages: OpenMP, UPC and CAF. Scientific Programming, 13(2):127-135, 2005. [ bib ]