Study of Undersampling Method: Instance Hardness Threshold with Various Estimators for Hate Speech Classification
Naufal Azmi Verdikha(1*), Teguh Bharata Adji(2), Adhistya Erna Permanasari(3)
(1) Universitas Gadjah Mada
(2) Universitas Gadjah Mada
(3) Universitas Gadjah Mada
(*) Corresponding Author
DOI: https://doi.org/10.22146/ijitee.42152
Copyright (c) 2018 IJITEE (International Journal of Information Technology and Electrical Engineering)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
ISSN : 2550-0554 (online)