A general design for a scalable MPI-GPU multi-resolution 2D numerical solver

IRIS

This paper presents a multi-GPU implementation of a finite-volume solver on a multi-resolution grid. The implementation offloads the computation entirely to the GPUs, and communication between GPUs is implemented with the Message Passing Interface (MPI) API. Several domain decomposition techniques were considered, and the one based on Hilbert Space Filling Curves (HSFC) showed optimal scalability. Several optimizations are introduced: one-to-one MPI communications among ranks are completely masked by GPU computations on internal cells, and a novel dynamic load balancing algorithm minimizes the waiting times at global MPI synchronization barriers. This algorithm adapts the computational load of the ranks in response to dynamic changes in the execution time of blocks and in network performance; its ability to converge to a balanced computation has been shown empirically by numerical experiments. Tests use up to 64 GPUs and 83M cells and achieve an efficiency of 90% in weak scalability and 85% in strong scalability. The framework is general, and the results of the paper can be ported to a wide range of explicit 2D partial differential equation solvers.
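To illustrate the idea behind a Hilbert-curve domain decomposition, here is a minimal Python sketch. It is not the authors' implementation: it assumes a uniform square grid of blocks whose side `n` is a power of two (the paper targets a multi-resolution grid), and the function names `hilbert_index` and `partition_blocks` are hypothetical. Blocks are sorted by their position along the curve and split into contiguous chunks, so each rank receives a spatially compact set of blocks.

```python
def hilbert_index(n, x, y):
    """Return the position of cell (x, y) along a Hilbert curve
    covering an n x n grid, where n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the next level is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_blocks(n, num_ranks):
    """Assign the n*n blocks to num_ranks ranks as contiguous chunks
    of the Hilbert ordering (illustrative static partition only)."""
    blocks = sorted(
        ((x, y) for x in range(n) for y in range(n)),
        key=lambda b: hilbert_index(n, b[0], b[1]),
    )
    base, extra = divmod(len(blocks), num_ranks)
    parts = []
    start = 0
    for rank in range(num_ranks):
        size = base + (1 if rank < extra else 0)
        parts.append(blocks[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    for rank, part in enumerate(partition_blocks(4, 4)):
        print(rank, part)
```

Because consecutive blocks along the curve are always grid neighbours, each chunk tends to have a short boundary with other ranks, which keeps the amount of halo data exchanged over MPI small. A dynamic load balancer of the kind described above could, under this assumption, rebalance simply by moving the chunk boundaries along the curve.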

A general design for a scalable MPI-GPU multi-resolution 2D numerical solver / Turchetto, Massimiliano; Dal Palu, Alessandro; Vacondio, Renato. - In: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS. - ISSN 1045-9219. - 31:5(2020), pp. 8941298.1036-8941298.1047. [10.1109/TPDS.2019.2961909]


Turchetto, Massimiliano; Dal Palu, Alessandro; Vacondio, Renato

2020-01-01



Year of publication: 2020
Appears in categories: 1.1 Journal article

Files in this item:

File	Size	Format
A_General_Design_for_a_Scalable_MPI-GPU_Multi-Resolution_2D_Numerical_Solver.pdf (publisher's PDF version; authorized users only; license: not public, private/restricted access)	3.12 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/2869671
