TY - JOUR
T1 - Closed form solution for response time of fault tolerant network of processors
AU - Bataineh, Sameer
AU - Al-Karaki, Jamal
N1 - Funding Information:
The research in this paper is supported by UAEU research council grant no. 19-7-11/01. The authors would also like to thank the reviewers of the paper for their valuable comments which helped to improve the manuscript.
PY - 2004/6
Y1 - 2004/6
N2 - Employing queuing theory, closed form solutions are obtained for the response time of a fault tolerant network of processors based on the primary site approach. Fault tolerance is achieved in the primary site approach by having the services replicated by the primary at many nodes. All requests are sent to the primary, which periodically checkpoints its status on the backup nodes. If the primary fails, one of the backups takes over as primary. Two mechanisms are considered for repairing faulty nodes in the system: delayed repair and immediate repair. In addition to being in closed form, the analytical results presented in this paper have several other advantages over those presented in previous work. First, for the immediate repair case, there is no need to solve a set of recursive equations. Second, the results reveal many of the characteristics of the system. We studied the effect of the checkpointing rate on the system response time and found a closed form solution for the optimum checkpointing rate, which minimizes the system response time.
AB - Employing queuing theory, closed form solutions are obtained for the response time of a fault tolerant network of processors based on the primary site approach. Fault tolerance is achieved in the primary site approach by having the services replicated by the primary at many nodes. All requests are sent to the primary, which periodically checkpoints its status on the backup nodes. If the primary fails, one of the backups takes over as primary. Two mechanisms are considered for repairing faulty nodes in the system: delayed repair and immediate repair. In addition to being in closed form, the analytical results presented in this paper have several other advantages over those presented in previous work. First, for the immediate repair case, there is no need to solve a set of recursive equations. Second, the results reveal many of the characteristics of the system. We studied the effect of the checkpointing rate on the system response time and found a closed form solution for the optimum checkpointing rate, which minimizes the system response time.
KW - Checkpointing
KW - Fault tolerance
KW - Faulty multiprocessor system
KW - Primary site approach
KW - Queuing theory
KW - Repairable system
KW - Response time
UR - http://www.scopus.com/inward/record.url?scp=4444237492&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=4444237492&partnerID=8YFLogxK
U2 - 10.1016/S0045-7906(04)00017-5
DO - 10.1016/S0045-7906(04)00017-5
M3 - Article
AN - SCOPUS:4444237492
SN - 0045-7906
VL - 30
SP - 291
EP - 308
JO - Computers and Electrical Engineering
JF - Computers and Electrical Engineering
IS - 4
ER -