Quanqing Yu 1,2,3, Rui Xiong 1, Ruixin Yang 1, and Michael Pecht 3
1 National Engineering Laboratory for Electric Vehicles, School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China
2 School of Automotive Engineering, Harbin Institute of Technology, Weihai 264209, China
3 Center for Advanced Life Cycle Engineering, University of Maryland at College Park, College Park, MD 20742, USA
Abstract:
Data-driven fault diagnostics for industrial systems suffers from class imbalance, a common
challenge for machine learning algorithms because the features of minority-class samples are difficult to learn. Synthetic oversampling
methods are commonly used to address this problem by generating minority-class samples to balance the majority and minority classes.
Two major issues influence the performance of oversampling methods: how to choose the most appropriate existing minority samples as seeds,
and how to synthesize new samples from those seeds effectively. However, many existing oversampling methods do not generate new samples accurately or effectively
when dealing with high-dimensional fault samples at different imbalance ratios, because they do not consider these two issues simultaneously.
This article develops a novel adaptive oversampling technique for industrial fault diagnostics: the expectation maximization (EM)-based local-weighted minority oversampling technique.
The method uses a local-weighted minority oversampling strategy to identify hard-to-learn, informative minority fault samples, and an EM-based imputation algorithm to
generate fault samples based on the distribution of the minority samples. To validate the performance of the developed method, experiments were conducted on two real-world datasets.
The results show that the developed method achieves better performance than state-of-the-art baseline sampling techniques on multiclass imbalanced fault diagnostics
at different imbalance ratios, in terms of F-measure, Matthews correlation coefficient (MCC), and Mean (the average of F-measure and MCC) values.
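The three evaluation metrics named above can be illustrated with a minimal sketch. The abstract does not state how the F-measure is averaged across classes, so macro-averaging is an assumption here, as is the use of the standard multiclass generalization of the MCC. The function names (macro_f_measure, multiclass_mcc, mean_score) are hypothetical and are not taken from the article.

```python
from collections import Counter
from math import sqrt


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted samples contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from label counts."""
    s = len(y_true)                                      # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_k = Counter(y_true)                                # true count per class
    p_k = Counter(y_pred)                                # predicted count per class
    labels = set(t_k) | set(p_k)
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in labels)
    cov_tt = s * s - sum(v * v for v in t_k.values())
    cov_pp = s * s - sum(v * v for v in p_k.values())
    denom = sqrt(cov_tt * cov_pp)
    # By convention, return 0 when only one class is present or predicted.
    return cov_tp / denom if denom else 0.0


def mean_score(y_true, y_pred):
    """Mean metric as described in the abstract: average of F-measure and MCC."""
    return (macro_f_measure(y_true, y_pred) + multiclass_mcc(y_true, y_pred)) / 2
```

For example, a classifier that assigns every sample to the majority class in a 3:1 imbalanced set, mean_score([0, 0, 0, 1], [0, 0, 0, 0]), scores 0 on the minority-class F1 and 0 on the MCC. This is why these metrics are more informative than plain accuracy under class imbalance.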