Topic modeling of fertilizer-related patent documents in Indonesia based on Latent Dirichlet Allocation

  • Aris Yaman, Department of Statistics and Data Science, IPB University / LIPI http://orcid.org/0000-0002-0305-9054
  • Bagus Sartono, Department of Statistics and Data Science, IPB University
  • Agus M. Soleh, Department of Statistics and Data Science, IPB University
Keywords: LDA, Topic Modelling, Patent, Topic Coherence

Abstract

Introduction. Fertilizer is one of the most important production factors in agriculture, so strengthening fertilizer-related technological capacity is crucial. Analyzing patent documents is one way to study technological development, in this case the development of fertilizer technology.

Data Collection Methods. The data used in this research are patent metadata, specifically the titles and abstracts of Indonesian patent documents. Patent metadata retrieved with the keyword "fertilizer" were processed for the period 1945-2017.

Data Analysis. Topics were modeled with Latent Dirichlet Allocation (LDA), which can produce reasonably interpretable topics from text data.
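To make the modeling step concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is a teaching illustration under stated assumptions, not the authors' pipeline: the study's software, preprocessing (for example, Indonesian stemming), number of topics, and hyperparameters are not given here, so `lda_gibbs`, `top_words`, the toy token lists, and the values of `alpha`, `beta`, and `iterations` are all hypothetical. Each document is assumed to be an already tokenized patent title or abstract.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling (illustrative sketch).

    Returns (vocab, phi, theta): phi[k][v] estimates P(word v | topic k) and
    theta[d][k] estimates P(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    topics = range(K)

    n_dk = [[0] * K for _ in docs]      # tokens in document d assigned to topic k
    n_kw = [[0] * V for _ in topics]    # occurrences of word v assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Start from a random topic for each token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)] for k in topics]
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta

def top_words(vocab, phi, n=3):
    """Return the n highest-probability words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]

# Hypothetical, already tokenized patent titles (not data from the study).
docs = [
    ["organic", "fertilizer", "compost", "manure"],
    ["liquid", "organic", "fertilizer", "compost"],
    ["urea", "nitrogen", "fertilizer", "granule"],
    ["slow", "release", "urea", "coating"],
]
vocab, phi, theta = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, phi)):
    print(f"topic {k}: {words}")
```

Collapsed Gibbs sampling integrates out the topic-word and document-topic distributions and resamples one token's topic at a time from count tables, which keeps the sketch short; real corpora would normally be handled with an optimized topic-modeling library.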

Results and Discussion. The results show that the degree of topic coherence obtained from patent titles is better than that obtained from patent abstracts. The LDA approach separates fertilizer patent technology topics adequately, so that individual topics do not admit multiple interpretations.
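Comparing titles with abstracts requires a coherence score for each fitted model. The sketch below assumes the UMass coherence measure; the specific coherence measure used in the study is not stated in this abstract, and UMass is only one of several common measures. The helpers `umass_coherence` and `mean_coherence` and the toy corpus are hypothetical. Under this setup, one would fit LDA separately on the title corpus and on the abstract corpus and compare the mean coherence of their topics, with higher (less negative) values indicating more coherent topics.

```python
import math

def umass_coherence(top_words, docs):
    """UMass coherence of one topic (illustrative sketch).

    top_words: the topic's top words, ordered by descending probability;
               each is assumed to occur in at least one document of docs.
    docs: the reference corpus as a list of token lists.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            # Score each word against every higher-ranked word;
            # the +1 avoids log(0) when the pair never co-occurs.
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1)
                              / doc_freq(top_words[j]))
    return score

def mean_coherence(topics, docs):
    """Average UMass coherence over a model's topics (higher is more coherent)."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)

# Hypothetical toy corpus.
docs = [["organic", "compost", "manure"], ["organic", "compost"], ["urea", "nitrogen"]]
print(umass_coherence(["organic", "compost"], docs))  # log(3/2), about 0.405
print(umass_coherence(["organic", "urea"], docs))     # log(1/2), about -0.693
```

The word pair that co-occurs in documents scores higher than the pair that never does, which is the property a title-versus-abstract comparison relies on.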

Conclusion. The findings identify nine essential topics in the development of fertilizer technology. They also indicate a lack of technological collaboration between IPC (International Patent Classification) sections.
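The abstract does not say how collaboration between IPC sections was measured. One plausible operationalization, offered here only as an assumption, is to count patents whose IPC symbols fall in more than one IPC section, where the section is the first letter (A to H) of an IPC symbol. The helper names and the sample IPC codes below are hypothetical and illustrative, not data from the study.

```python
from collections import Counter
from itertools import combinations

def ipc_section(code):
    """IPC section = first letter of the symbol, e.g. 'C05F 11/00' -> 'C'."""
    return code.strip()[0].upper()

def cross_section_pairs(patents):
    """Count how often each pair of distinct IPC sections appears on the same patent."""
    pairs = Counter()
    for codes in patents:
        sections = sorted({ipc_section(c) for c in codes})
        for a, b in combinations(sections, 2):
            pairs[(a, b)] += 1
    return pairs

def share_multi_section(patents):
    """Fraction of patents classified in more than one IPC section."""
    multi = sum(1 for codes in patents if len({ipc_section(c) for c in codes}) > 1)
    return multi / len(patents)

# Hypothetical patents, each given as a list of IPC symbols.
patents = [
    ["C05F 11/00"],
    ["C05G 3/00", "A01N 25/00"],
    ["C05C 9/00"],
    ["C05F 17/00", "B01J 2/00"],
]
print(cross_section_pairs(patents))
print(share_multi_section(patents))  # 0.5
```

A low share of multi-section patents, or a sparse pair table, would be consistent with limited collaboration across sections under this assumed definition.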

Author Biography

Aris Yaman, Department of Statistics and Data Science, IPB University / LIPI

Graduate student, Department of Statistics and Data Science, IPB University

Researcher at the Research Center for Informatics, LIPI

Published: 2021-11-19
How to Cite
Yaman, A., Sartono, B., & Soleh, A. M. (2021). Pemodelan topik pada dokumen paten terkait pupuk di Indonesia berbasis Latent Dirichlet Allocation. Berkala Ilmu Perpustakaan Dan Informasi, 17(2), 168-180. https://doi.org/10.22146/bip.v17i2.2147