Topic Modeling on Online News Portal Using Latent Dirichlet Allocation (LDA)

https://doi.org/10.22146/ijccs.74383

Mohammad Rezza Fahlevvi(1*), Azhari SN(2)

(1) Master Program of Computer Science, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author

Abstract


The large amount of news displayed on online news portals often does not indicate the topics being discussed, yet the news can be read and analyzed to find the main issues and trends. A quick and efficient way to find trending topics in the news is therefore needed. One method that can be used to solve this problem is topic modeling.

Topic modeling is needed so that users can easily and quickly understand how current topics develop. One topic modeling algorithm is Latent Dirichlet Allocation (LDA). This research proceeds through the following stages: data collection, preprocessing, n-gram formation, dictionary representation, weighting, topic model validation, topic model formation, and the topic modeling results.
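The preprocessing, n-gram formation, dictionary representation and weighting stages listed above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny English stopword list, the `min_count` threshold for merging bigrams, and the plain `count * log(N / df)` TF-IDF variant are choices made here, and the helper names (`preprocess`, `add_bigrams`, `build_dictionary`, `bag_of_words`, `tfidf`) and example headlines are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only; a real pipeline
# for Indonesian news would use a full Indonesian stopword list.
STOPWORDS = {"the", "a", "of", "in", "on", "and", "to", "is", "for"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def add_bigrams(docs, min_count=2):
    """Merge adjacent token pairs seen at least min_count times into 'w1_w2'."""
    pair_counts = Counter(
        (doc[i], doc[i + 1]) for doc in docs for i in range(len(doc) - 1)
    )
    frequent = {pair for pair, c in pair_counts.items() if c >= min_count}
    merged_docs = []
    for doc in docs:
        out, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and (doc[i], doc[i + 1]) in frequent:
                out.append(doc[i] + "_" + doc[i + 1])  # bigram token
                i += 2
            else:
                out.append(doc[i])
                i += 1
        merged_docs.append(out)
    return merged_docs


def build_dictionary(docs):
    """Map each distinct token to an integer id, in first-seen order."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(doc, vocab):
    """Sparse (token_id, count) pairs sorted by token id."""
    return sorted(Counter(vocab[t] for t in doc).items())


def tfidf(corpus_bow, n_docs):
    """Weight each count by log(N / df); a term found in every document gets 0."""
    df = Counter(tid for bow in corpus_bow for tid, _ in bow)
    return [
        [(tid, cnt * math.log(n_docs / df[tid])) for tid, cnt in bow]
        for bow in corpus_bow
    ]


# Hypothetical headlines standing in for collected news articles.
news = [
    "Election results in the capital city",
    "The capital city prepares for the election",
    "Flood warning issued for the capital city",
]
docs = add_bigrams([preprocess(n) for n in news])
vocab = build_dictionary(docs)
corpus = [bag_of_words(d, vocab) for d in docs]
weighted = tfidf(corpus, len(docs))
```

Here "capital city" occurs in all three headlines, so it is merged into the bigram `capital_city` and, appearing in every document, receives a TF-IDF weight of zero. The bag-of-words `corpus` (or its weighted form) is the kind of input an LDA implementation consumes.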

Based on the results of the topic evaluation, the best topic modeling value as measured by coherence was related to the number of passes. The number of topics produced 20 keys and five cases with a coherence value of 0.53, which can be considered relatively stable based on the standard coherence value.
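The abstract reports topic quality as a coherence value but does not name the coherence measure used. As a hedged illustration only, the sketch below computes one common measure: the average normalized pointwise mutual information (NPMI) between every pair of a topic's top words, with probabilities estimated from document co-occurrence. The function name `npmi_coherence` and the toy corpus are hypothetical, and scores from this measure are not directly comparable with the 0.53 reported above, which may come from a different coherence measure.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, docs):
    """Average normalized PMI (NPMI) over all pairs of a topic's top words.

    p(w) is the fraction of documents containing w, and p(wi, wj) is the
    fraction of documents containing both words.
    """
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def prob(*words):
        return sum(all(w in d for w in words) for d in doc_sets) / n

    scores = []
    for wi, wj in combinations(topic_words, 2):
        p_i, p_j, p_ij = prob(wi), prob(wj), prob(wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)  # words never co-occur: NPMI lower bound
        elif p_ij == 1.0:
            scores.append(1.0)   # both words appear in every document
        else:
            pmi = math.log(p_ij / (p_i * p_j))
            scores.append(pmi / -math.log(p_ij))  # normalize into [-1, 1]
    return sum(scores) / len(scores)


# Hypothetical toy corpus of already-preprocessed news documents.
docs = [
    ["election", "vote", "capital_city"],
    ["election", "vote", "candidate"],
    ["flood", "rain", "capital_city"],
    ["flood", "rain", "evacuation"],
]
coherent = npmi_coherence(["election", "vote", "candidate"], docs)
mixed = npmi_coherence(["election", "rain", "capital_city"], docs)
print(round(coherent, 3), round(mixed, 3))  # → 0.667 -0.333
```

A higher score means a topic's top words tend to appear in the same documents. Choosing the number of topics or passes by coherence amounts to training one model per setting and keeping the one whose topics score highest.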

Keywords


News Portal; Topic Modelling; Latent Dirichlet Allocation; Coherence Value











Copyright (c) 2022 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



Copyright of:
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online),
a scientific journal publishing research results in Computing
and Cybernetics Systems.
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
Email: ijccs.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijccs


