Performance Analysis and Scalability of Parallel Applications
Prof Emilio Luque. Computer Architecture and Operating System Department (CAOS)
HPC for Efficient Applications & Simulation (HPC4EAS). University Autonoma of Barcelona (UAB), Spain
Abstract
Due to the complex interaction between message-passing applications and the HPC system, many applications suffer performance inefficiencies when they scale to a large number of processes. This problem is particularly serious when an application is executed many times over a long period.
To make efficient use of parallel system resources, we propose the P3S (Prediction of Parallel Program Scalability) methodology, which builds on prediction models such as PAS2P (Parallel Application Signature for Performance Prediction) (**) and allows us to analyze and predict the strong scalability behavior of message-passing applications on a given system.
The methodology aims to predict application performance and scalability using a bounded analysis time and a reduced set of resources.
The methodology is made up of three stages:
- Characterization: the relevant phases of the parallel application are characterized from the execution of a set of small-scale application signatures.
- Modeling: a scalable logical trace of the application is constructed from the logical scalability models generated for each representative phase. This trace is used to predict the logical behavior of the application as the number of processes increases.
- Prediction: the application performance is predicted for a specific number of processes.
The output of the P3S methodology is the predicted speedup curve of the application.
Based on this information, and with the aim of using the system resources efficiently, users can select the most appropriate resources to execute the application on the target system. We executed the applications on 16 to 256 processors and predicted the computation time for up to 4,096 processors. For the tested applications, the error in the predicted speedup was below 9%.
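As a rough illustration of extrapolating a strong-scaling speedup curve from small-scale runs, the Python sketch below fits a simple per-phase model to timings at 16 to 256 processes and evaluates it at larger process counts. It is not the P3S implementation: the model form T(p) = a + b/p, the phase names, the phase weights, and all timing values are hypothetical and chosen only to make the example self-contained.

```python
# Illustrative sketch only: the model T(p) = a + b/p, the phase names,
# the weights, and all timings are assumptions, not P3S models or data.

def fit_phase(procs, times):
    """Least-squares fit of T(p) = a + b/p for one phase (linear in x = 1/p)."""
    xs = [1.0 / p for p in procs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, times))
    b = sxt / sxx
    a = mean_t - b * mean_x
    return a, b

def predict_time(phases, p):
    """Predicted total time: each phase time multiplied by its repetition weight."""
    return sum(weight * (a + b / p) for (a, b), weight in phases)

def predicted_speedup(phases, base_procs, target_procs):
    """Speedup relative to the run with base_procs processes."""
    t_base = predict_time(phases, base_procs)
    return {p: t_base / predict_time(phases, p) for p in target_procs}

# Small-scale process counts matching the range mentioned in the text.
small_scale = [16, 32, 64, 128, 256]

# Synthetic phase timings (seconds), generated from known coefficients.
# Each entry: phase name -> (repetition weight, measured times).
phase_data = {
    "phase_A": (100, [0.02 + 8.0 / p for p in small_scale]),
    "phase_B": (40, [0.05 + 3.2 / p for p in small_scale]),
}

phases = [(fit_phase(small_scale, times), weight)
          for weight, times in phase_data.values()]

curve = predicted_speedup(phases, 16, [512, 1024, 2048, 4096])
for p, s in curve.items():
    print(f"{p:5d} processes: predicted speedup vs 16 processes = {s:.2f}")
```

The constant term a in each phase model stands in for work that does not shrink as processes are added, which is why the predicted curve flattens at large process counts. A real study would replace the synthetic timings with measured phase times and choose the model form per phase.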
(**) A. Wong, D. Rexachs, E. Luque: “Parallel Application Signature for Performance Analysis and Prediction”. IEEE Trans. Parallel Distrib. Syst. 26(7): 2009-2019 (2015)