Scalable Event Tracing for High End Systems

We have developed a new performance measurement technique that is a hybrid of profiling and tracing, together with a new, scalable presentation of event-based performance data that facilitates quick understanding of a parallel program's behavior. Our technique reduces the amount of data measured and stored by capitalizing on the repeated behavior of programs and on the similarity in behavior and performance among the parallel processes in an application run. We retain enough information to recreate a complete (although approximate) trace of the parallel run. In addition, because the data has already been classified into behavior categories, we can present reduced, partially analyzed performance data to the user, highlighting the performance behaviors that account for most of the execution time.
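
To make the idea of reducing a trace by exploiting repeated behavior concrete, the following is a minimal Python sketch and not the implementation described here. It assumes that a trace has already been split into segments (for example, one per loop iteration), that two segments belong to the same behavior category when they contain the same event sequence with durations within a relative tolerance, and that only the first segment of each category is stored in full. The names Segment, similar, reduce_trace, reconstruct, and time_by_category, and the tolerance value, are hypothetical and chosen for illustration. The sketch covers a single process; similarity across parallel processes is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One repeated unit of execution (assumed: e.g. one loop iteration)."""
    start: float   # absolute start time of the segment
    events: list   # list of (event_name, offset_from_start, duration)


def similar(a, b, tol):
    """True if both segments have the same event sequence and every pair of
    corresponding durations differs by at most a relative tolerance tol."""
    names_a = [name for name, _, _ in a.events]
    names_b = [name for name, _, _ in b.events]
    if names_a != names_b:
        return False
    return all(
        abs(da - db) <= tol * max(da, db)
        for (_, _, da), (_, _, db) in zip(a.events, b.events)
    )


def reduce_trace(segments, tol=0.1):
    """Keep one full representative per behavior category, plus a compact
    (category index, start time) record for every segment."""
    representatives = []
    occurrences = []
    for seg in segments:
        for i, rep in enumerate(representatives):
            if similar(rep, seg, tol):
                occurrences.append((i, seg.start))
                break
        else:
            # No existing category matches: this segment starts a new one.
            representatives.append(seg)
            occurrences.append((len(representatives) - 1, seg.start))
    return representatives, occurrences


def reconstruct(representatives, occurrences):
    """Rebuild a complete but approximate trace: each occurrence replays the
    timings of its representative, shifted to the recorded start time."""
    trace = []
    for idx, start in occurrences:
        for name, offset, duration in representatives[idx].events:
            trace.append((name, start + offset, duration))
    return trace


def time_by_category(representatives, occurrences):
    """Approximate total time per behavior category, which can be used to
    highlight the categories that dominate execution time."""
    totals = [0.0] * len(representatives)
    for idx, _ in occurrences:
        totals[idx] += sum(d for _, _, d in representatives[idx].events)
    return totals
```

In this sketch the stored data grows with the number of distinct behavior categories rather than with the number of events, while the per-segment start times keep the reconstructed trace complete. The reconstruction is approximate because every segment in a category inherits the durations of its representative.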