This talk will briefly describe the history of massively parallel computing at Sandia National Laboratories, from the early MPP systems to the current Cray Red Storm system, which contains more than 25,000 processing cores. I will describe how our experience has shaped several principles that drive the design, implementation, deployment, and use of large-scale massively parallel systems. Specifically, I will focus on the design of Sandia's lightweight compute node operating system and high-performance network programming interface, both of which play a critical role in achieving maximum performance and scalability on Red Storm. In addition, I will discuss several system software research projects that are exploring the impact of multi-core processors, hardware virtualization, and the sensitivity of applications to operating system interference.
Ron Brightwell is a Principal Member of Technical Staff at Sandia National Laboratories in Albuquerque, New Mexico. Since joining Sandia in 1995, he has designed and developed high-performance networking software for several large-scale massively parallel computing platforms, including the Cray T3D and T3E, the Intel Paragon and TFLOPS, Sandia's Cplant Linux clusters, and the Cray Red Storm. He has also contributed to the research and development of several lightweight compute node operating systems, including Catamount on the Cray XT3/4.
Karen Karavanic