JUCS - Journal of Universal Computer Science 29(6): 569-594, doi: 10.3897/jucs.96652

Politically-oriented information inference from text

Samuel Caetano da Silva^‡, Ivandre Paraboni^‡

‡ University of Sao Paulo, Sao Paulo, Brazil

Corresponding author: Ivandre Paraboni (ivandre@usp.br)

This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY-ND 4.0). This license allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.

Citation: da Silva SC, Paraboni I (2023) Politically-oriented information inference from text. JUCS - Journal of Universal Computer Science 29(6): 569-594. https://doi.org/10.3897/jucs.96652

Abstract

The inference of politically-oriented information from text data is a popular research topic in Natural Language Processing (NLP) at both the text and author levels. In recent years, studies of this kind have been implemented with the aid of text representations ranging from simple count-based models (e.g., bag-of-words) to sequence-based models built from transformers (e.g., BERT). Despite considerable success, however, it remains an open question whether results can be improved further by combining these models with additional text representations. To shed light on this issue, the present work describes a series of experiments comparing a number of strategies for political bias and ideology inference from text data using sequence-based BERT models and syntax- and semantics-driven features, and examines which of these representations (or their combinations) improve overall model accuracy. Results suggest that one particular strategy, namely the combination of BERT language models with syntactic dependencies, significantly outperforms well-known count- and sequence-based text classifiers alike. In particular, the combined model has been found to improve accuracy across all tasks under consideration, outperforming the top-performing system in the SemEval hyperpartisan news detection task by up to 6%, and outperforming the use of BERT alone by up to 21%, making a potentially strong case for the use of heterogeneous text representations in the present tasks.

Keywords

Natural language processing, Text classification, Politically-oriented inference, Sentiment analysis, Author profiling