A variety of applications, including automated factories, autonomous vehicles, and Cyber-Physical Systems (CPS), are experiencing significant growth. These applications pose diverse challenges: real-time management and visualization of a factory's current state through a 3D digital twin, trajectory calculation within autonomous vehicles, rendering of Human-Machine Interfaces (HMI), and traffic management in smart cities equipped with cameras, IoT devices, and their associated features. To address them, a broad array of heterogeneous devices with various hardware accelerators is being deployed. In such applications, power consumption and task execution latency are key concerns, so it becomes necessary to investigate approaches that reduce power consumption while still fully exploiting the computational power the devices provide. Modern devices include processors that accelerate highly parallel, data-hungry workloads; a widely known example is the Graphics Processing Unit (GPU), a hardware peripheral traditionally used for graphics rendering that is now also used as a general-purpose compute accelerator. This thesis surveys state-of-the-art techniques for optimizing power consumption and task execution latency, and then examines two types of interference-induced latency that tasks can experience. The first arises when tasks are concurrently scheduled on the same acceleration unit, i.e., on a partitioned GPU. The second affects tasks running on embedded boards, specifically GPU-embedded systems, under a high computational load on the CPU side. Methods are proposed to understand these latencies and to derive predictive models for both types of interference.
Finally, the thesis presents a comparative study of two GPU memory management methodologies: explicit copies versus unified virtual memory.

Memory interference and performance prediction in GPU-accelerated heterogeneous systems / Masola, A. - (2024).

Memory interference and performance prediction in GPU-accelerated heterogeneous systems

MASOLA, ALESSIO
2024-01-01

2024
Mathematics
Memory Interference
GPUs
GPU-Accelerated Heterogeneous Systems
Embedded Platforms
Interference
Performance
Machine Learning
SVR
RF
Neural Network
Principal Component Analysis
Spearman Ranking
Overview of optimization methods
GPGPU-Sim
Embedded
Discrete
CPU-GPU Interference
CUDA
UVM (Unified Virtual Memory)
Memory Management in CUDA
Memory Bandwidth
Simulations
Cache
Survey
Partitionable GPU
Memory Aware Performance Estimation
Xavier
Orin
RTX 2070
RTX 2080
Capodieci, Nicola
Files in this item:
File Size Format
Tesi_PHD_alessio_PDFA_REVIEWED_FINAL.pdf

open access

License: Creative Commons
Size 3.03 MB
Format Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/1889/5650