Extremely high operational availability of information systems has historically been difficult to achieve because the actual application environments and operating conditions, and their effects on system degradation and failure, are poorly understood. Furthermore, most failures in information systems are intermittent, which makes many predictive methods unacceptable. This complexity, combined with the 40–85% no-fault-found (NFF) failure rate seen in system failure analysis, suggests that current reliability practices need improvement. In particular, traditional approaches to failure mitigation have failed because they rely on averaged, accumulated historical field data (e.g., MIL-HDBK-217, Telcordia SR-332 (formerly Bellcore), and the European CNET/RDF (FIDES)) rather than on in situ data from a particular system. In fact, studies have reported that these methods are inaccurate and misleading (i.e., they provide inconsistent results for any given system subject to given conditions). This is a major reason why the US Army has abandoned these approaches. In addition, the IEEE notes that information system failures are in some sense inevitable, because the current methods of assessing information systems have fundamental flaws.