Detection Of Spam Comments On Instagram Using Complementary Naïve Bayes
Nur Azizul Haqimi(1*), Nur Rokhman(2), Sigit Priyanta(3)
(1) Master Program of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(3) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author
Abstract
Instagram (IG) is a web-based and mobile social media application on which users share photos and videos. Posts usually carry a caption describing the photo or video, and they can attract spam comments, i.e. comments that are unrelated to the caption and the photo. A difficulty in identifying spam is that non-spam comments far outnumber spam comments, which produces an imbalanced dataset, and an imbalanced dataset can degrade the performance of a classification method. This study therefore examines how well the Complementary Naïve Bayes (CNB) method handles an imbalanced dataset when detecting spam comments on Instagram. Terms were weighted with TF-IDF, and a Support Vector Machine (SVM) was used as the comparison classifier. In tests with 2,500 training comments and 100 test comments on an imbalanced dataset (25% spam, 75% non-spam), CNB achieved 92% accuracy, 86% precision and a 93% F-measure, whereas SVM achieved 87% accuracy, 79% precision and an 88% F-measure. We conclude that CNB is better suited than SVM to detecting spam comments when the dataset is imbalanced.
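The abstract names the ingredients (TF-IDF weighting, CNB, and accuracy/precision/F-measure) without giving formulas. The following is a minimal pure-Python sketch, not the authors' implementation, assuming the basic complement estimate described by Rennie et al. (2003): for each class c, term probabilities are estimated from all training comments outside c with additive smoothing (α = 1 here), and a comment is assigned to the class whose complement fits it worst. The tokenizer, the raw tf × log(N/df) weighting, the labels "spam"/"ham", the helper names and the six toy comments are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    # idf(t) = log(N / df(t)), estimated on the training comments only.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(len(docs) / count) for term, count in df.items()}


def tfidf(doc, idf):
    # Raw term frequency times idf; terms never seen in training are dropped.
    tf = Counter(tokenize(doc))
    return {t: n * idf[t] for t, n in tf.items() if t in idf}


class ComplementNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing applied to every term

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        vocab = {t for v in vectors for t in v}
        self.log_theta = {}
        for c in self.classes:
            # Sum TF-IDF weights over every document that is NOT in class c.
            comp = defaultdict(float)
            for v, y in zip(vectors, labels):
                if y != c:
                    for t, w in v.items():
                        comp[t] += w
            denom = sum(comp.values()) + self.alpha * len(vocab)
            self.log_theta[c] = {
                t: math.log((comp[t] + self.alpha) / denom) for t in vocab
            }
        return self

    def predict(self, vector):
        # Pick the class whose complement fits the comment worst:
        # argmax_c  -sum_t w_t * log(theta_{not c, t}); class priors are ignored.
        def score(c):
            log_theta = self.log_theta[c]
            return -sum(w * log_theta[t] for t, w in vector.items() if t in log_theta)

        return max(self.classes, key=score)


def evaluate(y_true, y_pred, positive="spam"):
    # Accuracy, precision, recall and F-measure with "spam" as the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Hypothetical toy comments (2 spam, 4 non-spam), imbalanced like the study's data.
train_docs = [
    "follow me for free followers",
    "cheap followers promo click link",
    "beautiful photo of the beach",
    "love this photo so much",
    "the beach looks amazing",
    "great shot love the colors",
]
train_labels = ["spam", "spam", "ham", "ham", "ham", "ham"]

idf = fit_idf(train_docs)
model = ComplementNB().fit([tfidf(d, idf) for d in train_docs], train_labels)

test_docs = ["free followers click link", "amazing photo of the beach"]
predictions = [model.predict(tfidf(d, idf)) for d in test_docs]
print(predictions)
print(evaluate(["spam", "ham"], predictions))
```

With only two classes, each class's complement is simply the other class, so in this sketch the decision amounts to comparing each class's own smoothed term distribution with class priors left out. For real experiments, scikit-learn's `ComplementNB` estimator in `sklearn.naive_bayes` (which also offers optional weight normalization via its `norm` parameter) is likely a more robust starting point than this toy code.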
DOI: https://doi.org/10.22146/ijccs.47046
Copyright (c) 2019 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.