PARMA: Parallelization-Aware Run-time Management for Energy-Efficient Many-Core Systems

Mohammed A. N. Al-hayanni, Ashur Rafiev, Fei Xia, Rishad Shafik, Alexander Romanovsky, Alex Yakovlev
Electrical and Electronic Engineering, School of Engineering - Newcastle University, UK

Abstract

Performance and energy efficiency considerations have shifted computing paradigms from single-core to many-core architectures. At the same time, traditional speedup models such as Amdahl’s Law face challenges in run-time reasoning about system performance and energy efficiency, because these models typically assume limited variations of the parallel fraction. Moreover, the parallel fraction, which varies dynamically in workloads, is generally unknown at run-time without application-level instrumentation.

This paper describes novel performance/energy trade-off models based on realistic architectural considerations. The models express the parallel fraction and speedup as functions of performance counter values available in modern processors, removing the need for application-level instrumentation. These models are then used to develop a Parallelization-Aware Run-time Management (PARMA) approach.

PARMA aims to control core allocations and operating voltage/frequency points for energy efficiency, according to the varying workload parallel fractions. The efficacy of our models and the PARMA approach is extensively validated using a number of PARSEC benchmark applications, involving two performance/energy trade-off metrics: energy-delay-product (EDP), typically used in high-performance applications, and energy per instruction (EPI), suitable for energy-aware applications. Improvements of up to 48 percent in EDP and 68 percent in EPI have been observed using the PARMA approach compared with parallelization-agnostic methods.
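For orientation, the quantities named in the abstract can be written down using their textbook definitions. The sketch below is not the PARMA model itself, which derives the parallel fraction from performance counters; it only illustrates classic Amdahl's Law speedup for a fixed parallel fraction and the two trade-off metrics. All function and parameter names are hypothetical and chosen for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Classic Amdahl's Law: speedup on `cores` cores for a fixed parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """EDP: energy multiplied by execution time (lower is better)."""
    return energy_joules * delay_seconds


def energy_per_instruction(energy_joules: float, instructions: int) -> float:
    """EPI: energy divided by the number of executed instructions (lower is better)."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return energy_joules / instructions
```

With these definitions, a workload whose parallel fraction is 0.5 gains little from extra cores (the speedup is bounded by 2), which is why the appropriate core allocation depends on the parallel fraction observed at run-time.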

Index Terms—run-time management; many-core; speedup; power modelling; energy-delay-product; energy per instruction.

The complete results of the experiments supporting this work can be found in [Results.xlsx]. Some of the data have been plotted, and the resulting figures can be found in [additional-results.docx].

This work has been submitted for publication in IEEE Transactions on Computers.

Last modified 16/12/2019 by IGC