Wednesday, November 5, 2008

Software fault tolerance


Software fault tolerance is a new and emerging field, and its state of the art is primitive. Software faults are quite unlike hardware faults. Software never wears out, so its faults can be regarded instead as faults in design.
To provide reliability in the face of software faults, we must use redundancy. However, simply replicating the same software N times will not work; all N copies will fail for the same inputs. Instead, we need N independently developed versions of the software. Single-version software is already more expensive than the hardware in most large systems, so demanding N versions, even for small N, can be very expensive.

There are two approaches to handling multiple versions. N-version programming involves running all N versions in parallel and voting on their outputs. In contrast, the recovery-block approach involves running only one version at any one time. The output of this version is put through an acceptance test, which checks whether it falls within an acceptable range. If it does, the output is passed as correct; if it does not, another version is run and its output is tested in turn.
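
The two approaches can be compared in a short Python sketch. This is only an illustration under stated assumptions: the three integer square root "versions", the function names `n_version_vote` and `recovery_block`, and the acceptance test `accept_isqrt` are all hypothetical and chosen for the example. The versions are run one after another here rather than truly in parallel, and one of them is deliberately wrong so that the fault-masking is visible.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Version = Callable[[int], int]


def isqrt_a(n: int) -> int:
    """Version A (hypothetical): use the standard library."""
    return math.isqrt(n)


def isqrt_b(n: int) -> int:
    """Version B (hypothetical): independent linear search."""
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


def isqrt_buggy(n: int) -> int:
    """Version C (hypothetical): contains a deliberate off-by-one design fault."""
    return int(n ** 0.5) + 1


def n_version_vote(versions: Sequence[Version], n: int) -> int:
    """N-version programming: run every version and take a strict majority vote."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(n))
        except Exception:
            # A version that crashes simply loses its vote.
            continue
    if not outputs:
        raise RuntimeError("all versions failed")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(versions):
        raise RuntimeError("no majority among versions")
    return value


def accept_isqrt(n: int, r: int) -> bool:
    """Acceptance test (assumed): r is the integer square root of n."""
    return r >= 0 and r * r <= n < (r + 1) * (r + 1)


def recovery_block(versions: Sequence[Version],
                   acceptance_test: Callable[[int, int], bool],
                   n: int) -> int:
    """Recovery block: run one version at a time until one passes the test."""
    for version in versions:
        try:
            result = version(n)
        except Exception:
            continue  # treat a crash like a failed acceptance test
        if acceptance_test(n, result):
            return result
    raise RuntimeError("no version passed the acceptance test")


if __name__ == "__main__":
    # The faulty version is outvoted by the two correct ones.
    print(n_version_vote([isqrt_a, isqrt_b, isqrt_buggy], 17))  # prints 4
    # The faulty primary fails the acceptance test; the alternate is used.
    print(recovery_block([isqrt_buggy, isqrt_a], accept_isqrt, 17))  # prints 4
```

Note the trade-off the sketch makes visible: voting needs no knowledge of what a correct answer looks like but always pays for all N executions, while the recovery block usually runs only one version but depends entirely on the quality of its acceptance test.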
