JUCS - Journal of Universal Computer Science 28(4): 345-377, doi: 10.3897/jucs.69377
TwitterBulletin: An Intelligent and Real-Time Automated News Categorization Tool for Twitter
Sedef Demirci, Seref Sagiroglu
‡ Gazi University, Ankara, Turkey
Open Access
Abstract

Social media platforms have become popular news sources thanks to their immense popularity and the speed at which they disseminate information. In the age of digital journalism, news organizations and journalists need these platforms to track and discover news. However, the abundance of meaningless data and the lack of organization on these platforms make it difficult for journalists to reach valuable news. In this paper, we create what is, to the best of our knowledge, the first public dataset containing a large number of real-world Turkish news tweets belonging to different news categories. We propose an artificial intelligence-based two-step approach that helps journalists access, via a single platform, the news shared by various sources on social media, grouped under relevant categories such as politics (elections, riots, etc.) and health (pandemic, COVID-19, etc.), thereby reducing the possibility of overlooking needed information. In the first step, we propose a novel machine learning-based model for collecting and categorizing news posts on social media. We implement several traditional machine learning and deep learning algorithms and evaluate their classification performance in terms of accuracy, precision, recall, and F1 score. In the second step, we develop a software tool, named TwitterBulletin, which automatically retrieves Turkish news tweets and groups them under news categories in real time using the CNN classifier, which achieved the best performance in the first step. The results show that the overall accuracy of TwitterBulletin is reasonably high and satisfactory despite the challenge of classifying short tweets.
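
To make the four evaluation measures named above concrete, the following is a minimal sketch of how accuracy, precision, recall, and F1 score can be computed for a multi-class news categorization task. It is an illustration under stated assumptions, not the evaluation code used for TwitterBulletin: the function name `macro_metrics`, the category labels, and the choice of macro-averaging (an unweighted mean over categories) are all assumptions, since the abstract does not state how per-category scores are aggregated.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; macro-averaging is an assumption.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this category
        else:
            fp[pred] += 1       # predicted category was wrong
            fn[truth] += 1      # true category was missed

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Illustrative labels only; these are not drawn from the paper's dataset.
y_true = ["politics", "politics", "health", "health", "sports", "sports"]
y_pred = ["politics", "health", "health", "health", "sports", "politics"]
scores = macro_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in scores.items()})
```

Macro-averaging treats every category equally regardless of how many tweets it contains. If the categories are imbalanced, a weighted or micro average may be more appropriate, and the reported numbers would differ accordingly.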

Keywords
News classification, artificial intelligence, deep learning, social media, Twitter, news topic modelling, news dataset