Performance Diagnosis of Holistic HPC Workflows

HPC Workflow Performance encompasses the monitoring and analysis of performance problems that span across traditionally separated aspects of an HPC effort. We must represent and characterize the effort from a number of different perspectives or views.

Our specific focus is on Application-level diagnosis of interference. For example, an application may run slowly because an application run by another user is increasing the load on a shared file system. With current tools and methods this scenario is almost impossible to detect. Developers may devote large amounts of time to analyzing their own code, when in fact the performance bottleneck is elsewhere.

Scalable Tools Workshop Presentation August 2018

Scalable Tools Workshop presentation from summer 2016

Portions of this work were conducted at the Ultrascale Systems Research Center (USRC) supported by Los Alamos National Laboratory under Contract No. DE-AC52-06NA25396 with the U.S. Department of Energy. This work supported in part by the New Mexico Consortium.