Abstract
Achieving resilience in extreme-scale environments while minimizing energy consumption is a daunting challenge. At extreme scale, the classic checkpoint-restart and replication-based recovery techniques become inadequate. In this paper, we propose a novel application-aware elastic resilience model, dShadowing, for extreme-scale environments, as an efficient and scalable alternative to checkpointing, pure replication, and re-execution. The basic tenet of this model is the dShadow, a derivative of its associated main process whose functional and non-functional attributes are derived to achieve high tolerance to failure at minimum energy cost while closely adhering to QoS requirements. Contrary to current schemes, dShadowing assumes heterogeneous environments, where cores fail independently but not identically. The experimental results show that the dShadowing model can achieve, on average, over 20% reduction in energy consumption and expected completion time in comparison to a baseline shadowing model that assumes cores fail uniformly. The results also demonstrate the flexibility of the dShadowing model and its ability to tolerate failure at scale adaptively and efficiently.
| Original language | English |
|---|---|
| Title of host publication | 2021 IEEE International Performance, Computing, and Communications Conference, IPCCC 2021 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781665443319 |
| Publication status | Published - 2021 |
| Externally published | Yes |
| Event | 2021 IEEE International Performance, Computing, and Communications Conference, IPCCC 2021 - Austin, United States. Duration: Oct 29 2021 → Oct 31 2021 |
Publication series
| Name | Conference Proceedings of the IEEE International Performance, Computing, and Communications Conference |
|---|---|
| Volume | 2021-October |
| ISSN (Print) | 1097-2641 |
Conference
| Conference | 2021 IEEE International Performance, Computing, and Communications Conference, IPCCC 2021 |
|---|---|
| Country/Territory | United States |
| City | Austin |
| Period | 10/29/21 → 10/31/21 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7 Affordable and Clean Energy
Keywords
- application-aware
- dShadowing
- extreme-scale
- heterogeneous environment
- resilience
ASJC Scopus subject areas
- General Engineering
Fingerprint
Dive into the research topics of 'Differential Shadowing: A Resilience Framework for Extreme-scale, Heterogeneous Environments with Non-Uniform Node Failure Distribution'. Together they form a unique fingerprint.