Zhenyu Wu 2, Wenfang Lin 3, Binghao Fu 2, Juchuan Guo 2, Yang Ji 3, and Michael Pecht 1
1 CALCE, Center for Advanced Life Cycle Engineering, Department of Mechanical Engineering, University of Maryland, College Park, Maryland 20740, USA
2 Engineering Research Center of Information Network, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China
3 Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract:
Data-driven fault diagnostics of industrial systems suffer from class-imbalance problems, a common
challenge for machine learning algorithms because the features of minority-class samples are difficult to learn. Synthetic oversampling
methods are commonly used to tackle these problems by generating minority-class samples to balance the majority and minority classes.
Two major issues influence the performance of oversampling methods: how to choose the most appropriate existing minority seed samples,
and how to synthesize new samples from those seed samples effectively. However, many existing oversampling methods are not accurate or effective enough
when generating new samples from high-dimensional faulty samples with different imbalance ratios, since they do not take these two factors into consideration at the same time.
This article develops a novel adaptive oversampling technique for industrial fault diagnostics: the expectation maximization (EM)-based local-weighted minority oversampling technique.
The method uses a local-weighted minority oversampling strategy to identify hard-to-learn, informative minority fault samples and an EM-based imputation algorithm to
generate fault samples based on the distribution of the minority samples. To validate the performance of the developed method, experiments were conducted on two real-world datasets.
The results show that the developed method achieves better performance, in terms of F-measure, Matthews correlation coefficient (MCC), and Mean (the average of F-measure and MCC) values,
on multiclass imbalanced fault diagnostics at different imbalance ratios than state-of-the-art baseline sampling techniques.
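
The three evaluation metrics named above can be illustrated with a minimal Python sketch. The abstract does not state how F-measure is averaged across classes, so macro-averaging is an assumption here, as is the standard multiclass generalization of MCC; the function names (`macro_f_measure`, `multiclass_mcc`, `mean_score`) are hypothetical and not taken from the article.

```python
from collections import Counter
from math import sqrt


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from label counts."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    cov = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    var_true = n * n - sum(c * c for c in true_counts.values())
    var_pred = n * n - sum(c * c for c in pred_counts.values())
    denom = sqrt(var_true * var_pred)
    # Return 0.0 when the coefficient is undefined (e.g. one predicted class).
    return cov / denom if denom else 0.0


def mean_score(y_true, y_pred):
    """Mean value: the average of F-measure and MCC."""
    return (macro_f_measure(y_true, y_pred) + multiclass_mcc(y_true, y_pred)) / 2


if __name__ == "__main__":
    # Hypothetical labels: class 0 is the majority, classes 1 and 2 are minority faults.
    y_true = [0, 0, 0, 1, 2]
    always_majority = [0, 0, 0, 0, 0]
    print(mean_score(y_true, y_true))
    print(mean_score(y_true, always_majority))
```

On the hypothetical labels in the example, a classifier that always predicts the majority class is correct on three of five samples yet receives a low Mean value, which is why such metrics suit imbalanced fault data better than plain accuracy.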