Twitter Sentiment Analysis for Indonesian-Language Text Using Maximum Entropy and Support Vector Machine
Noviah Dwi Putranti(1*), Edi Winarko(2)
(1) 
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author
Abstract
In this study, sentiment analysis is the process of classifying textual documents into two classes: positive sentiment and negative sentiment. Opinion data were collected from the social network Twitter using queries in Indonesian. The study aims to determine public sentiment toward a particular object as expressed on Twitter in Indonesian, and thereby to support market research on public opinion.
The collected data were preprocessed and POS-tagged, and a classification model was then produced through training. Sentiment-bearing words were collected with a dictionary-based approach, which yielded 18,069 words in this study. The Maximum Entropy algorithm was used for the POS tagger, and a Support Vector Machine was used to build the classification model from the training data. The features were unigrams with TF-IDF weighting. The classifier reached an accuracy of 86.81% under 7-fold cross-validation with the sigmoid kernel. Manual class labeling with the POS tagger yielded an accuracy of 81.67%.
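The abstract names unigram features with TF-IDF weighting but does not give the formula used. The following is a minimal sketch of one common variant (relative term frequency times the natural log of inverse document frequency), written in plain Python. The function name `unigram_tfidf`, the whitespace tokenizer, and the three toy tweets are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def unigram_tfidf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    The paper does not state which TF-IDF formula it uses.
    """
    # Whitespace tokenization into lowercase unigrams (an assumption).
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy tweets, for illustration only.
tweets = [
    "pelayanan bagus sekali",
    "pelayanan buruk sekali",
    "produk bagus",
]
weights = unigram_tfidf(tweets)
```

In this toy corpus a word that occurs in only one tweet, such as "buruk", receives a higher weight than a word shared by two tweets, such as "sekali", which is the behavior that makes TF-IDF useful as input to a classifier.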
Keywords: sentiment analysis, classification, maximum entropy POS tagger, support vector machine, Twitter.
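The reported accuracy comes from 7-fold cross-validation. As a rough sketch of that evaluation protocol, the code below splits the data into seven folds, holds each fold out once, and averages the per-fold accuracy. All names here (`k_fold_indices`, `cross_validate`, `majority_baseline`) are hypothetical, the folds are contiguous and unshuffled for simplicity, and the placeholder classifier is a majority-class baseline rather than the Support Vector Machine used in the study.

```python
from collections import Counter


def k_fold_indices(n_samples, k=7):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Assumes n_samples >= k so that no fold is empty.
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(samples, labels, fit_predict, k=7):
    """Return mean accuracy over k folds.

    fit_predict(train_x, train_y, test_x) must return one label per test item.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(samples), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        predictions = fit_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def majority_baseline(train_x, train_y, test_x):
    """Placeholder classifier (NOT an SVM): predict the most common training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return [most_common] * len(test_x)


# Hypothetical toy data, for illustration only.
samples = [f"tweet {i}" for i in range(14)]
labels = ["positive"] * 10 + ["negative"] * 4
mean_accuracy = cross_validate(samples, labels, majority_baseline, k=7)
```

Any classifier that matches the `fit_predict` signature can be substituted for the baseline, so the same loop would serve to compare kernel types.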
DOI: https://doi.org/10.22146/ijccs.3499
Copyright (c) 2014 IJCCS - Indonesian Journal of Computing and Cybernetics Systems
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.