TY - GEN
T1 - A systematic fault-tolerant computational model for both crash failures and silent data corruption
AU - Cui, Xiaolong
AU - Hussain, Zaeem
AU - Znati, Taieb
AU - Melhem, Rami
N1 - Funding Information:
This research is supported by the Department of Energy under contract DE-SC0014376.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/6/29
Y1 - 2018/6/29
N2 - As the boundaries between Cloud and HPC continue to blur, it is clear that there is an urgent demand for a systematic computational model that adapts to the computing platform and accommodates the underlying workloads. As computing systems continue to scale out to satisfy the increasingly large demands on computing capacity, power awareness and fault tolerance have become major concerns. This paper proposes a novel computational model that applies to both compute- and data-intensive workloads, and deals with diverse types of faults. Evaluation results demonstrate that the proposed model is able to achieve significant energy savings compared to existing fault tolerance techniques, while maintaining the same level of fault tolerance.
AB - As the boundaries between Cloud and HPC continue to blur, it is clear that there is an urgent demand for a systematic computational model that adapts to the computing platform and accommodates the underlying workloads. As computing systems continue to scale out to satisfy the increasingly large demands on computing capacity, power awareness and fault tolerance have become major concerns. This paper proposes a novel computational model that applies to both compute- and data-intensive workloads, and deals with diverse types of faults. Evaluation results demonstrate that the proposed model is able to achieve significant energy savings compared to existing fault tolerance techniques, while maintaining the same level of fault tolerance.
KW - Extreme-scale
KW - Fault tolerance
KW - Power awareness
KW - Shadow Computing
KW - Silent data corruption
UR - http://www.scopus.com/inward/record.url?scp=85050228551&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050228551&partnerID=8YFLogxK
U2 - 10.1109/ICIN.2018.8401596
DO - 10.1109/ICIN.2018.8401596
M3 - Conference contribution
AN - SCOPUS:85050228551
T3 - 21st Conference on Innovation in Clouds, Internet and Networks, ICIN 2018
SP - 1
EP - 8
BT - 21st Conference on Innovation in Clouds, Internet and Networks, ICIN 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 21st International Conference on Innovation in Clouds, Internet and Networks, ICIN 2018
Y2 - 19 February 2018 through 22 February 2018
ER -