COUNTDOWN - A run-time library for application-agnostic energy saving in MPI communication primitives / Cesarini, D.; Bartolini, A.; Bonfà, Pietro; Cavazzoni, C.; Benini, L. - In: IEEE TRANSACTIONS ON COMPUTERS. - ISSN 0018-9340. - /:(2018), pp. 1-6. (Paper presented at the 2nd Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems, ANDARE 2018 - A Workshop part of PACT 2018 Conference, held in cyp in 2018) [10.1145/3295816.3295818].
COUNTDOWN - A run-time library for application-agnostic energy saving in MPI communication primitives
Bonfà, Pietro
2018-01-01
Abstract
Energy and power consumption are prominent issues in today’s supercomputers and are foreseen as a limiting factor for future installations. In scientific computing, a significant amount of power is spent in the communication- and synchronization-related idle times among the distributed processes participating in the same application. However, because of the short time scale at which communication happens, using low-power states to reduce power in the computing resources during idle times may introduce significant overheads. In this paper we present COUNTDOWN, a methodology and a tool for identifying communication and synchronization primitives and automatically reducing the frequency of the computing elements during them in order to save energy. COUNTDOWN filters out phases in which lowering the frequency would degrade the application's time to solution, and it does so transparently to the user, without modifying the application code or requiring recompilation. We tested our methodology on a production Tier-0 system, using a production application, Quantum ESPRESSO (QE), with production datasets that scale up to 3.5K cores. Experimental results show that our methodology saves 22.36% of energy consumption with a performance penalty of 2.88% in a real production MPI-based application.
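The filtering idea in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the COUNTDOWN implementation: the abstract does not say how short phases are detected, so the timeout-based rule, the `TIMEOUT_S` value, and the names `Phase`, `plan_frequency`, and `average_power_ratio` are all assumptions made for illustration. The second function only applies E = P × T to the two figures reported in the abstract, assuming both are measured against the same baseline run.

```python
from dataclasses import dataclass

# Hypothetical threshold: the abstract does not state how short phases are filtered.
TIMEOUT_S = 500e-6


@dataclass
class Phase:
    kind: str          # "compute" or "mpi"
    duration_s: float  # wall-clock length of the phase in seconds


def plan_frequency(phases, timeout_s=TIMEOUT_S):
    """Return, for each phase, the seconds it would spend at reduced frequency.

    Assumed rule: the frequency is lowered only once an MPI phase has already
    lasted longer than timeout_s, so short phases never trigger a change.
    """
    low_time = []
    for phase in phases:
        if phase.kind == "mpi" and phase.duration_s > timeout_s:
            low_time.append(phase.duration_s - timeout_s)
        else:
            low_time.append(0.0)
    return low_time


def average_power_ratio(energy_saving, time_penalty):
    """Average power with the technique divided by baseline power (E = P * T)."""
    return (1.0 - energy_saving) / (1.0 + time_penalty)


if __name__ == "__main__":
    trace = [Phase("compute", 1e-3), Phase("mpi", 100e-6), Phase("mpi", 2e-3)]
    print(plan_frequency(trace))
    print(average_power_ratio(0.2236, 0.0288))
```

Under these assumptions, only the 2 ms MPI phase in the example trace spends time at reduced frequency, while the 100 µs phase is left untouched. The reported 22.36% energy saving and 2.88% time penalty together imply an average power of roughly three quarters of the baseline.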