Dataset Splitting Techniques Comparison For Face Classification on CCTV Images
Ade Nurhopipah(1*), Uswatun Hasanah(2)
(1) Department of Informatics, Universitas Amikom Purwokerto
(2) Department of Information Technology, Universitas Amikom Purwokerto
(*) Corresponding Author
Abstract
The performance of a machine learning classification model is influenced by many factors, one of which is the dataset splitting method. To avoid overfitting, it is important to apply a suitable dataset splitting strategy. This study compares four dataset splitting techniques: Random Sub-sampling Validation (RSV), k-Fold Cross Validation (k-FCV), Bootstrap Validation (BV), and Morais Lima Martin Validation (MLMV). The comparison is carried out for face classification on CCTV images using a Convolutional Neural Network (CNN) and a Support Vector Machine (SVM), on two image datasets. The techniques are evaluated by model accuracy on the training, validation, and test sets, and by the bias and variance of the model. The experiments show that k-FCV performs more stably and provides high accuracy on the training set as well as good generalization on the validation and test sets. In contrast, MLMV performs worse than the other three techniques: it yields lower accuracy, shows higher bias and variance, and produces overfitted models, especially when evaluated on the validation set.
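The index-level difference between three of the compared techniques can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the default fraction, the number of repeats, and the seeds are all assumptions chosen for illustration, and MLMV is left out because it relies on a Kennard-Stone style selection that the abstract does not describe.

```python
import random


def random_subsampling(n, test_fraction=0.2, repeats=5, seed=0):
    """RSV: repeat an independent random hold-out split several times."""
    rng = random.Random(seed)
    n_val = max(1, round(n * test_fraction))
    splits = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        # First n_val shuffled indices are held out, the rest are used for training.
        splits.append((sorted(idx[n_val:]), sorted(idx[:n_val])))
    return splits


def k_fold(n, k=5, seed=0):
    """k-FCV: partition the data into k folds; each fold is held out exactly once."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, sorted(folds[i])))
    return splits


def bootstrap(n, repeats=5, seed=0):
    """BV: train on n draws with replacement, validate on the out-of-bag samples."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        train = [rng.randrange(n) for _ in range(n)]
        drawn = set(train)
        out_of_bag = [i for i in range(n) if i not in drawn]
        splits.append((sorted(train), out_of_bag))
    return splits


if __name__ == "__main__":
    for name, splitter in [("RSV", random_subsampling), ("k-FCV", k_fold), ("BV", bootstrap)]:
        train, val = splitter(20)[0]
        print(name, "train size:", len(train), "validation size:", len(val))
```

Each returned pair holds the training and validation indices for one round. In a setup like the one the abstract describes, a separate test set would presumably be set aside before any of these splitters is applied.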
DOI: https://doi.org/10.22146/ijccs.58092
Copyright (c) 2020 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.