Topic modeling of fertilizer-related patent documents in Indonesia based on Latent Dirichlet Allocation

  • Aris Yaman, Statistics and Data Science, IPB University/LIPI
  • Bagus Sartono, Dept. of Statistics and Data Science, IPB University
  • Agus M. Soleh, Dept. of Statistics and Data Science, IPB University
Keywords: LDA, Topic Modelling, Patent, Topic Coherence


Introduction. Fertilizer is one of the most important production factors in agriculture, so strengthening fertilizer-related technological capacity is crucial. Analyzing patent documents is one way to track technological developments, particularly in fertilizers.

Data Collection Methods. The data used in this research are patent metadata, specifically the titles and abstracts of patent documents in Indonesia. Metadata retrieved with the keyword "fertilizer" for the 1945-2017 period were processed.

Data Analysis. Topics were modeled with Latent Dirichlet Allocation (LDA), which can provide a reasonable interpretation of the topics contained in text data.
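To illustrate how LDA assigns words to topics, the following is a minimal sketch in plain Python using collapsed Gibbs sampling. It is not the authors' implementation: the sampler, the hyperparameters (`alpha`, `beta`, `iterations`), the function names `fit_lda` and `top_words`, and the toy tokenized titles are all assumptions chosen for illustration.

```python
import random

def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    Returns (topic_word, doc_topic, vocab), where topic_word[k][v] counts how
    often vocabulary word v is assigned to topic k and doc_topic[d][k] counts
    how many tokens of document d are assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample a topic in proportion to the conditional posterior.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    return topic_word, doc_topic, vocab

def top_words(topic_word, vocab, n=3):
    """Return the n most frequently assigned words for each topic."""
    result = []
    for counts in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda v: -counts[v])
        result.append([vocab[v] for v in ranked[:n]])
    return result

# Hypothetical tokenized patent titles (not taken from the study's data).
titles = [
    ["organic", "fertilizer", "compost", "waste"],
    ["compost", "organic", "waste", "process"],
    ["urea", "granule", "coating", "fertilizer"],
    ["urea", "coating", "granule", "release"],
    ["organic", "compost", "fertilizer", "process"],
    ["granule", "urea", "release", "coating"],
]

topic_word, doc_topic, vocab = fit_lda(titles, num_topics=2)
topics = top_words(topic_word, vocab, n=3)
for k, words in enumerate(topics):
    print(f"topic {k}: {', '.join(words)}")
```

The top words of each topic are what an analyst would read to label the topic. On a corpus this small the split depends on the random seed, so the printed topics should be treated as illustrative only.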

Results and Discussion. The results show that the patent title performs better than the patent abstract. The LDA approach separates the topics of fertilizer patent technology adequately, so that each topic has a single, unambiguous interpretation.
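The keywords name topic coherence, a score that is often used to compare topic models such as those built on titles and on abstracts. The sketch below computes one common variant, the UMass coherence, averaged over word pairs. The choice of UMass, the function name `umass_coherence`, and the toy documents are assumptions for illustration; the abstract does not state which coherence measure was used.

```python
import math

def umass_coherence(top_words, docs):
    """Average UMass coherence of a topic's top words against a corpus.

    For each pair of top words (w_m, w_l) with l < m, the score is
    log((D(w_m, w_l) + 1) / D(w_l)), where D counts the documents that
    contain all the given words. Higher (closer to zero or positive) values
    mean the words tend to occur together.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words")
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    pairs = 0
    for m in range(1, len(top_words)):
        for l in range(m):
            joint = doc_freq(top_words[m], top_words[l])
            single = doc_freq(top_words[l])  # must be > 0: top words come from docs
            score += math.log((joint + 1) / single)
            pairs += 1
    return score / pairs

# Hypothetical tokenized documents (not taken from the study's data).
docs = [
    ["organic", "fertilizer", "compost"],
    ["organic", "compost", "waste"],
    ["urea", "granule", "coating"],
    ["urea", "fertilizer", "granule"],
    ["mixer", "machine", "granule"],
]

coherent = umass_coherence(["organic", "compost", "fertilizer"], docs)
mixed = umass_coherence(["organic", "urea", "machine"], docs)
print(f"coherent topic: {coherent:.3f}")
print(f"mixed topic:    {mixed:.3f}")
```

Scoring the top words of a title-based model and of an abstract-based model against their respective corpora, and averaging over topics, would give one number per model to compare.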

Conclusion. Based on the findings, there are nine essential topics in the development of fertilizer technology. The findings also point to a lack of technological collaboration across IPC (International Patent Classification) sections.

Author Biography

Aris Yaman, Statistics and Data Science, IPB University/LIPI

Graduate student, Department of Statistics and Data Science, IPB University

Researcher at the Research Center for Informatics, LIPI



How to Cite
Yaman, A., Sartono, B., & Soleh, A. M. (2021). Pemodelan topik pada dokumen paten terkait pupuk di Indonesia berbasis Latent Dirichlet Allocation. Berkala Ilmu Perpustakaan dan Informasi, 17(2), 168-180.