This paper presents a comparative study of the effects of combining resampling methods (undersampling and oversampling) with the homogeneous ensemble methods Bagging and AdaBoost on imbalanced data sets. We present a hybrid ensemble approach that applies multiple resampling, integrating both undersampling and oversampling to retain the benefits of each while reducing the drawbacks each one causes. The proposed approach improved performance, even for the methods most sensitive to class-imbalanced data sets.
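To make the idea of integrating undersampling and oversampling inside a bagging ensemble concrete, the following is a minimal sketch and not the exact procedure evaluated in this paper. It rests on several assumptions that the text does not specify: each class is resampled to the midpoint between the smallest and largest class sizes, both resampling steps are purely random, the base learner is a simple decision stump for two classes, and predictions are combined by majority vote. All function names (`hybrid_resample`, `fit_stump`, `fit_hybrid_bagging`, `predict`) are hypothetical.

```python
import random
from collections import Counter


def hybrid_resample(X, y, rng, target=None):
    """Resample every class to a common size: undersample larger classes,
    oversample smaller ones."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    sizes = [len(rows) for rows in by_class.values()]
    # Assumption: meet in the middle between the smallest and largest class.
    if target is None:
        target = (min(sizes) + max(sizes)) // 2
    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) >= target:
            # Random undersampling without replacement.
            chosen = rng.sample(rows, target)
        else:
            # Random oversampling: keep all originals, add random duplicates.
            chosen = rows + rng.choices(rows, k=target - len(rows))
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


def fit_stump(X, y):
    """Fit a one-feature threshold classifier by exhaustive search (binary labels)."""
    labels = sorted(set(y))
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for lo, hi in ((labels[0], labels[-1]), (labels[-1], labels[0])):
                errors = sum((lo if row[j] <= t else hi) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, lo, hi)
    _, j, t, lo, hi = best
    return lambda row: lo if row[j] <= t else hi


def fit_hybrid_bagging(X, y, n_estimators=11, seed=0):
    """Train each ensemble member on its own hybrid-resampled copy of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        X_bal, y_bal = hybrid_resample(X, y, rng)
        models.append(fit_stump(X_bal, y_bal))
    return models


def predict(models, row):
    """Combine the members' predictions by majority vote."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy imbalanced data: 40 majority examples, 5 minority examples.
    X = [[i / 10] for i in range(40)] + [[6.0 + i] for i in range(5)]
    y = [0] * 40 + [1] * 5
    ensemble = fit_hybrid_bagging(X, y)
    print(predict(ensemble, [0.5]), predict(ensemble, [8.0]))
```

Because every member sees a different balanced sample, the majority class loses fewer examples overall than under plain undersampling, and the minority class is duplicated less than under plain oversampling. An AdaBoost variant would apply the same resampling step before each boosting round instead of before each bootstrap member.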