Closed form solution for response time of fault tolerant network of processors

Sameer Bataineh, Jamal Al-Karaki

Research output: Contribution to journal › Article › peer-review

Abstract

Using queuing theory, closed-form solutions are obtained for the response time of a fault-tolerant network of processors based on the primary site approach. Fault tolerance is achieved in the primary site approach by having the services replicated by the primary at many nodes. All requests are sent to the primary, which periodically checkpoints its status on the backup nodes. If the primary fails, one of the backups takes over as primary. Two mechanisms for repairing faulty nodes in the system are considered: delayed repair and immediate repair. Besides being in closed form, the analytical results presented in this paper have several other advantages over those presented in previous work. First, for the immediate repair case, there is no need to solve a set of recursive equations. Second, the results reveal many of the characteristics of the system. We studied the effect of the checkpointing rate on the system response time and found a closed-form solution for the optimum checkpointing rate, which minimizes the system response time.

Original language: English
Pages (from-to): 291-308
Number of pages: 18
Journal: Computers and Electrical Engineering
Volume: 30
Issue number: 4
Publication status: Published - Jun 2004

Keywords

  • Checkpointing
  • Fault tolerance
  • Faulty multiprocessor system
  • Primary site approach
  • Queuing theory
  • Repairable system
  • Response time

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science (all)
  • Electrical and Electronic Engineering
