A Software Implemented Fault-Tolerance (SIFT) Layer for Reliable Computing on Massively Parallel Computers
Phone: (409) 694-2152
There is a common misconception that massively parallel computers are naturally reliable, fault-tolerant, and reconfigurable because of their abundance of resources. In fact, the only attribute gained by multiple processors is a higher total failure rate. Designers, however, are largely ignoring this issue to concentrate on building systems with peak performance beyond the teraflop boundary. The result is several generations of unreliable massively parallel computers that are unsuitable for the strict demands of the commercial marketplace. Researchers are developing the Software Implemented Fault-Tolerance (SIFT) layer to enhance the reliability and availability of applications running on massively parallel computers. The SIFT layer handles the detection, reconfiguration, routing, and recovery techniques needed to protect against hardware faults, transparently to user applications. The market potential for this product is very promising: with massively parallel computers currently costing in the tens of millions, even a small improvement in reliability will mean considerable savings to the consumer.
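The claim that more processors mean a higher total failure rate can be illustrated with a rough sketch. It assumes, purely for illustration, that processors fail independently at a constant rate and that the machine fails when any one processor fails (no fault tolerance). The function names and the numbers used (1,024 processors, one failure per 100,000 hours each) are hypothetical and are not taken from the abstract.

```python
import math


def system_failure_rate(n_processors: int, rate_per_hour: float) -> float:
    """Total failure rate of a machine that fails when any processor fails.

    Assumption: independent processors with the same constant failure rate,
    so the individual rates simply add.
    """
    return n_processors * rate_per_hour


def mtbf_hours(n_processors: int, rate_per_hour: float) -> float:
    """Mean time between failures, the reciprocal of the total failure rate."""
    return 1.0 / system_failure_rate(n_processors, rate_per_hour)


def survival_probability(n_processors: int, rate_per_hour: float, hours: float) -> float:
    """Probability that no processor fails during a run of the given length."""
    return math.exp(-system_failure_rate(n_processors, rate_per_hour) * hours)


if __name__ == "__main__":
    rate = 1e-5  # hypothetical: one failure per 100,000 hours per processor
    for n in (1, 1024):
        print(
            f"{n:5d} processors: MTBF = {mtbf_hours(n, rate):10.1f} h, "
            f"P(24 h run survives) = {survival_probability(n, rate, 24):.3f}"
        )
```

Under these assumptions, a single processor would run for years between failures on average, while the 1,024-processor machine would be expected to fail within a few days. This gap is the kind a layer such as SIFT is intended to close through detection, reconfiguration, and recovery.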
* Information listed above is at the time of submission. *