Abstract
Current approaches to fault tolerance rely on either time or hardware redundancy to mask faults. Time redundancy implies re-executing the failed computation after the failure has been detected. Although checkpointing can reduce the cost of re-execution, these solutions still impose a significant delay. In many mission-critical systems, hardware redundancy has traditionally been deployed in the form of process replication to provide fault tolerance, avoiding delay and maintaining tight deadlines. Both approaches have drawbacks: re-execution requires additional time, and replication requires additional resources, especially energy. This forces the systems engineer to choose between time and hardware redundancy; cloud computing environments have largely chosen replication because response time is often critical. In this paper we propose a new computational model called shadow computing, which provides goal-based adaptive resilience through the use of dynamic execution. Using this general model, we develop shadow replication, which enables a parameterized tradeoff between time and hardware redundancy to provide fault tolerance. We then build an analytical model to predict the expected energy savings and provide an analysis using that model.
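The tradeoff between time and hardware redundancy can be illustrated with a toy calculation. The sketch below is not the paper's analytical model; it assumes a hypothetical setup in which a main process runs at speed 1, a second "shadow" process runs at a reduced speed `sigma` until the main fails and at full speed afterwards, and power grows with the cube of speed. The names `completion_time`, `energy`, and `sigma` are illustrative only. Under these assumptions, `sigma = 0` behaves like re-execution and `sigma = 1` behaves like full replication.

```python
def completion_time(work, sigma, failure_time=None):
    """Time to finish `work` units of computation.

    Assumption: the main process runs at speed 1; the shadow runs at
    speed `sigma` (0 <= sigma <= 1) until the main fails at
    `failure_time`, then continues alone at speed 1.
    """
    if failure_time is None or failure_time >= work:
        return work  # the main finishes on its own
    remaining = work - sigma * failure_time  # work the shadow still has to do
    return failure_time + remaining


def energy(work, sigma, failure_time=None):
    """Energy consumed, assuming power = speed ** 3 (an assumed model)."""
    if failure_time is None or failure_time >= work:
        # Both processes run for the full duration; the shadow is then dropped.
        return work * (1.0 + sigma ** 3)
    remaining = work - sigma * failure_time
    # Before the failure both run; afterwards only the shadow, at speed 1.
    return failure_time * (1.0 + sigma ** 3) + remaining * 1.0


if __name__ == "__main__":
    for sigma in (0.0, 0.5, 1.0):
        print(sigma,
              completion_time(100, sigma, failure_time=40),
              energy(100, sigma))
```

In this toy setting, intermediate values of `sigma` trade a longer completion time after a failure for lower energy use when no failure occurs, which is the kind of parameterized choice the abstract describes.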
Original language | English |
---|---|
Pages | 73-77 |
Number of pages | 5 |
Publication status | Published - 2014 |
Externally published | Yes |
Event | 2014 International Conference on Computing, Networking and Communications, ICNC 2014 - Honolulu, HI, United States; Duration: Feb 3 2014 → Feb 6 2014 |
Conference
Conference | 2014 International Conference on Computing, Networking and Communications, ICNC 2014 |
---|---|
Country/Territory | United States |
City | Honolulu, HI |
Period | 2/3/14 → 2/6/14 |
Keywords
- fault tolerance
- resiliency
- scheduling
- shadow computing
ASJC Scopus subject areas
- Computer Networks and Communications