Topic Modeling on Online News Portal Using Latent Dirichlet Allocation (LDA)
Mohammad Rezza Fahlevvi(1*), Azhari SN(2)
(1) Master Program of Computer Science, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author
Abstract
Online news portals display a large amount of news, often without indicating the topic being discussed. Reading and analyzing the news reveals the main issues and trends, but a quick and efficient way to find trending topics is needed. Topic modeling is one method that can solve this problem.
Topic modeling lets users easily and quickly follow how current topics develop. One topic modeling algorithm is Latent Dirichlet Allocation (LDA). The research stages are data collection, preprocessing, n-gram formation, dictionary representation, weighting, topic model validation, topic model formation, and topic modeling results.
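The stages from preprocessing through weighting can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the stopword list, the sample headlines, the bigram threshold `min_count`, and the TF-IDF formula `tf * log(N / df)` are all assumptions, since the abstract does not specify them.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "of", "in", "on", "to", "and", "for"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def add_bigrams(docs, min_count=2):
    """Append adjacent-pair tokens seen at least min_count times in the corpus."""
    counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    frequent = {pair for pair, c in counts.items() if c >= min_count}
    return [
        doc + ["_".join(p) for p in zip(doc, doc[1:]) if p in frequent]
        for doc in docs
    ]

def build_dictionary(docs):
    """Map each distinct token to an integer id (sorted for determinism)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}

def tfidf(docs, dictionary):
    """Return one {token_id: weight} dict per document, weight = tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append(
            {dictionary[t]: tf[t] * math.log(n_docs / df[t]) for t in tf}
        )
    return weighted

# Hypothetical sample headlines standing in for the collected news data.
raw_docs = [
    "The central bank raised the interest rate",
    "Interest rate cut expected in the election year",
    "Election results shake the stock market",
]
docs = add_bigrams([preprocess(d) for d in raw_docs])
dictionary = build_dictionary(docs)
weights = tfidf(docs, dictionary)
```

The weighted documents would then be passed to an LDA implementation to form the topic model; that step is omitted here.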
Based on the topic evaluation, the best coherence value of the topic model was related to the number of passes. The number of topics produced 20 keys, five cases with a 0.53 coherence value, which can be considered relatively stable based on the standard coherence value.
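To illustrate what a coherence score measures, the sketch below computes the UMass coherence of one topic from document co-occurrence counts. The abstract does not name the coherence measure used, so UMass is an assumption chosen for its simplicity; its values are on a different scale and are not comparable with the 0.53 reported above. The function name and the sample data are hypothetical.

```python
import math
from itertools import combinations

def umass_coherence(top_words, docs):
    """UMass coherence of one topic.

    Sums log((D(earlier, later) + 1) / D(earlier)) over all ranked word pairs,
    where D counts the documents containing all the given words.
    Assumes every word in top_words occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    # combinations() keeps list order, so the first item is the higher-ranked word.
    for earlier, later in combinations(top_words, 2):
        score += math.log((doc_freq(earlier, later) + 1) / doc_freq(earlier))
    return score

# Hypothetical tokenized documents and topic word lists.
sample_docs = [
    ["bank", "rate", "interest"],
    ["rate", "interest", "election"],
    ["election", "vote", "result"],
    ["vote", "result", "bank"],
]
coherent = umass_coherence(["rate", "interest"], sample_docs)
mixed = umass_coherence(["rate", "vote"], sample_docs)
```

Words that tend to appear in the same documents score higher than words that never co-occur, which is why the measure can be used to compare models trained with different numbers of topics or passes.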
DOI: https://doi.org/10.22146/ijccs.74383
Copyright (c) 2022 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.