Information Retrieval for Early Detection of Disease Using Semantic Similarity
Aszani Aszani(1), Hayyu Ilham Wicaksono(2*), Uffi Nadzima(3), Lukman Heryawan(4)
(1) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(3) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(4) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author
Abstract
The volume of medical records continues to grow, and these records can be used to support doctors in diagnosing disease. This study proposes an information retrieval method that provides diagnostic recommendations based on symptoms found in medical record datasets, applying the TF-IDF and cosine similarity methods. The main challenge was that the symptoms in the medical record dataset were dirty data, obtained from patients who were not familiar with biological terms. The symptoms in the medical record data were therefore matched against the symptom terms used in the system, and the matched results were used for data augmentation, increasing the amount of data by up to about three times. In the TF-IDF the highest accuracy with is only , while after augmentation of the test data, the accuracy becomes . The highest accuracy results with the same value using the cosine similarity method is and with the augmented test data accuracy increasing to . The study concludes that a system given sufficient and relevant symptom input provides more accurate disease predictions. Prediction results using the TF-IDF method with are more accurate than predictions using the cosine similarity method.
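The retrieval idea summarized in the abstract can be illustrated with a minimal sketch. It assumes that each disease is represented by one symptom text, that symptoms are weighted with a smoothed TF-IDF, and that candidates are ranked by cosine similarity to the query. The toy records, the whitespace tokenizer, the smoothing formula, and all function names are hypothetical choices made for illustration. They are not taken from the study's dataset or implementation, and the abstract does not specify how its TF-IDF and cosine similarity methods differ in detail.

```python
import math
from collections import Counter

# Hypothetical toy records: disease label -> symptom text (not the study's data).
RECORDS = {
    "influenza": "fever cough sore throat muscle ache",
    "migraine": "headache nausea light sensitivity",
    "gastritis": "stomach pain nausea bloating",
}


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple tokenizer)."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms that never occur in the corpus are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_diseases(query, records=RECORDS):
    """Return (disease, score) pairs sorted from most to least similar."""
    docs = {label: tokenize(text) for label, text in records.items()}
    idf = build_idf(list(docs.values()))
    q = tfidf_vector(tokenize(query), idf)
    scores = {label: cosine(q, tfidf_vector(toks, idf)) for label, toks in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for disease, score in rank_diseases("fever and cough"):
        print(f"{disease}: {score:.3f}")
```

In this sketch, a query whose words do not match the system's symptom vocabulary contributes nothing to the score, which is the mismatch between patient wording and symptom terms that the abstract describes as the main challenge.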
DOI: https://doi.org/10.22146/ijccs.80077
Copyright (c) 2023 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.