Dependable Systems and Networks, pp. 175-184, 2007.

Enhanced Reliability Modeling of RAID Storage Systems

Jon G. Elerath
Network Appliance, Inc.

Michael Pecht
University of Maryland
College Park, MD 20742


A flexible model for estimating the reliability of RAID storage systems is presented. This model corrects errors associated with the common assumption that system times to failure follow a homogeneous Poisson process. Separate generalized failure distributions are used to model catastrophic failures and usage-dependent data corruptions for each hard drive. Catastrophic failure restoration is represented by a three-parameter Weibull distribution, so the model can include a minimum time to restore as a function of data transfer rate and hard drive storage capacity. Data can be scrubbed as a background operation to eliminate corrupted data that, in the event of a simultaneous catastrophic failure, would result in a double disk failure. Field-based time-to-failure data and mathematical justification for a new model are presented. Model results have been verified and predict between 2 and 1,500 times as many double disk failures as estimated using the current mean time to data loss method.
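
The role of the three-parameter Weibull in restoration can be illustrated with a minimal sketch. It assumes the location parameter equals drive capacity divided by sustained transfer rate, which is one plausible reading of "minimum time to restore" and not necessarily the authors' formulation. The function names and all numeric values (500 GB, 50 MB/s, scale 12 h, shape 2) are hypothetical and are not taken from the paper.

```python
import math
import random


def min_restore_hours(capacity_gb, transfer_mb_per_s):
    """Assumed lower bound on restore time: the time needed to write
    the whole drive at the sustained transfer rate (1 GB = 1000 MB)."""
    seconds = capacity_gb * 1000.0 / transfer_mb_per_s
    return seconds / 3600.0


def sample_restore_hours(rng, location, scale, shape):
    """Draw one restore time from a three-parameter Weibull.

    random.weibullvariate(alpha, beta) samples the two-parameter form
    (alpha = scale, beta = shape); adding the location shifts it so
    that no sample falls below the minimum restore time."""
    return location + rng.weibullvariate(scale, shape)


def weibull3_cdf(t, location, scale, shape):
    """Probability that restoration has completed by time t (hours)."""
    if t <= location:
        return 0.0
    return 1.0 - math.exp(-(((t - location) / scale) ** shape))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical drive: 500 GB at 50 MB/s sustained.
    gamma = min_restore_hours(500, 50)
    draws = [sample_restore_hours(rng, gamma, 12.0, 2.0) for _ in range(5)]
    print("minimum restore time (h):", round(gamma, 2))
    print("sampled restore times (h):", [round(d, 2) for d in draws])
```

With a nonzero location parameter the probability of restoring in less than the minimum time is exactly zero, which a two-parameter Weibull or an exponential restore time cannot express.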

Complete article is available to CALCE Consortium Members.

