Development of Bot Detection Applications on Twitter Social Media Using Machine Learning with a Random Forest Classifier Algorithm

https://doi.org/10.22146/ijitee.56154

Aqilah Aini Zahra(1), Widyawan Widyawan(2), Silmi Fauziati(3*)

(1) Universitas Gadjah Mada
(2) Universitas Gadjah Mada
(3) Universitas Gadjah Mada
(*) Corresponding Author

Abstract


A Twitter bot is a Twitter account programmed to perform social activities automatically, such as sending tweets through a scheduling program. Some bots disseminate useful information, such as earthquake and weather updates. However, many bots have a negative influence: they broadcast false news, send spam, or follow accounts to inflate their popularity. Such bots can shift public sentiment about an issue, reduce user confidence, or even alter the social order. Therefore, an application is needed to distinguish bot accounts from non-bot accounts. To address this problem, this paper develops a bot detection system using machine learning for multiclass classification. The four classes are human, informative, spammer, and fake follower. The model was trained with supervised learning on labeled training data. First, a dataset of 2,333 accounts was pre-processed to obtain a set of 28 features for classification. This feature set, expressed as numeric values, came from user profile analysis, temporal analysis, and tweet analysis. Next, the data was partitioned and normalized by scaling, and a random forest classifier was trained on it. The features were then reselected down to a set of 17, which gave the highest accuracy achieved by the model. In the evaluation stage, the bot detection model achieved an accuracy of 96.79%, a precision of 97%, a recall of 96%, and an F1 score of 96%. The detection model was therefore considered highly accurate. The completed model was then integrated into a website and deployed to the cloud. As a result, this machine learning-based web application can be accessed and used by the public to detect Twitter bots.
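The four evaluation figures reported above can be illustrated with a minimal sketch. It assumes macro averaging over the four classes (the abstract does not state which averaging was used), and the class identifiers, the helper names `confusion_matrix` and `macro_scores`, and the eight toy labels are all hypothetical, chosen only for illustration. It does not reproduce the paper's data or results.

```python
from typing import Dict, List, Sequence

# Hypothetical identifiers for the four classes named in the abstract.
CLASSES = ["human", "informative", "spammer", "fake_follower"]


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def macro_scores(matrix: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 (an assumed averaging)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for eight imaginary accounts (not taken from the paper's dataset).
y_true = ["human", "human", "informative", "informative",
          "spammer", "spammer", "fake_follower", "fake_follower"]
y_pred = ["human", "human", "informative", "spammer",
          "spammer", "spammer", "fake_follower", "human"]

scores = macro_scores(confusion_matrix(y_true, y_pred, CLASSES))
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Macro averaging weights each class equally, so a rare class such as fake follower counts as much as the human class. A weighted average would instead scale each class by its number of accounts.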

Keywords


Bot Detection; Multiclass Classification; Machine Learning; Supervised Learning; Twitter





Copyright (c) 2020 IJITEE (International Journal of Information Technology and Electrical Engineering)

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

ISSN  : 2550-0554 (online)
