Transfer Learning of Pre-trained Transformers for Covid-19 Hoax Detection in Indonesian Language

https://doi.org/10.22146/ijccs.66205

Lya Hulliyyatus Suadaa(1*), Ibnu Santoso(2), Amanda Tabitha Bulan Panjaitan(3)

(1) Politeknik Statistika STIS
(2) Politeknik Statistika STIS
(3) Politeknik Statistika STIS
(*) Corresponding Author

Abstract


The internet has become the most popular source of news. However, it is difficult to assess whether an online news article is factual or a hoax. Hoaxes related to Covid-19 have had harmful effects on people's lives, so an accurate hoax detection system is important for filtering the abundant information on the internet. In this research, a Covid-19 hoax detection system is proposed based on transfer learning of pre-trained transformer models. The original pre-trained BERT, the multilingual pre-trained mBERT, and the monolingual pre-trained IndoBERT were fine-tuned to solve the classification task in the hoax detection system. In the experiments, the fine-tuned IndoBERT models, pre-trained on a monolingual Indonesian corpus, outperformed the fine-tuned uncased versions of the original BERT and the multilingual BERT. However, the fine-tuned cased mBERT model, pre-trained on a larger corpus, achieved the best performance overall.
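The fine-tuning setup described above can be sketched with the Hugging Face Transformers library (cited in the paper's references). This is a minimal sketch under stated assumptions, not the authors' implementation: the checkpoint names are public Hub identifiers that match the model families named in the abstract, and the label encoding, `load_classifier`, and `predict` are hypothetical names introduced for illustration.

```python
# Assumption: these Hub identifiers correspond to the model families named in
# the abstract; they are not necessarily the checkpoints the authors used.
CANDIDATE_CHECKPOINTS = {
    "BERT (uncased)": "bert-base-uncased",
    "mBERT (uncased)": "bert-base-multilingual-uncased",
    "mBERT (cased)": "bert-base-multilingual-cased",
    "IndoBERT (IndoNLU)": "indobenchmark/indobert-base-p1",
    "IndoBERT (IndoLEM)": "indolem/indobert-base-uncased",
}

# Hypothetical label encoding for the binary hoax/fact task.
LABELS = {0: "fact", 1: "hoax"}


def load_classifier(checkpoint):
    """Load a pre-trained encoder with a new 2-way classification head."""
    # Imported lazily so this module can be read without the library installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=len(LABELS)
    )
    return tokenizer, model


def predict(texts, tokenizer, model, max_length=128):
    """Return one predicted label string per input text."""
    import torch

    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    model.eval()
    with torch.no_grad():
        logits = model(**batch).logits
    return [LABELS[i] for i in logits.argmax(dim=-1).tolist()]
```

The classification head is newly initialized when a checkpoint is loaded this way, so predictions only become meaningful after the model has been fine-tuned on labeled hoax and fact articles, for example with a standard PyTorch training loop.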

Keywords


hoax detection, transfer learning, pre-trained transformer, Indonesian language text processing











Copyright (c) 2021 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



Copyright of :
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
is a scientific journal publishing research results in Computing
and Cybernetics Systems.
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
email:ijccs.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijccs


