Corresponding author: Ivandre Paraboni (ivandre@usp.br) © Vitor dos Santos, Ivandre Paraboni. This is an open access article distributed under the terms of the Creative Commons Attribution-NoDerivatives License (CC BY-ND 4.0). This license allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use. Citation:
dos Santos V, Paraboni I (2022) Myers-Briggs personality classification from social media text using pre-trained language models. JUCS - Journal of Universal Computer Science 28(4): 378-395. https://doi.org/10.3897/jucs.70941
In Natural Language Processing, pre-trained language models have been shown to obtain state-of-the-art results in many downstream tasks, such as sentiment analysis and author identification. In this work, we address the use of these methods for personality classification from text. Focusing on the Myers-Briggs (MBTI) personality model, we describe a series of experiments in which the well-known Bidirectional Encoder Representations from Transformers (BERT) model is fine-tuned to perform MBTI classification. Our main findings suggest that this approach significantly outperforms well-known text classification models based on both bag-of-words and static word embeddings across multiple evaluation scenarios, and generally outperforms previous work in the field.
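As a rough illustration of what "MBTI classification" can look like as a learning problem, the sketch below encodes a four-letter MBTI type as four binary labels, one per dichotomy, so that each dichotomy could be handled by its own binary classifier. This is an assumption made for illustration only: the text above does not state how the task is formulated, and the names `MBTI_AXES` and `mbti_to_binary_labels` are hypothetical.

```python
from typing import Dict

# Hypothetical encoding (an assumption, not taken from the article):
# each MBTI dichotomy is treated as a separate binary target.
MBTI_AXES = (("E", "I"), ("S", "N"), ("T", "F"), ("J", "P"))


def mbti_to_binary_labels(mbti_type: str) -> Dict[str, int]:
    """Map a four-letter MBTI type to one 0/1 label per dichotomy.

    The label is 1 when the letter is the first pole of the axis
    (E, S, T, J) and 0 when it is the second pole (I, N, F, P).
    """
    letters = mbti_type.strip().upper()
    if len(letters) != 4:
        raise ValueError(f"expected a four-letter MBTI type, got {mbti_type!r}")

    labels: Dict[str, int] = {}
    for letter, (first, second) in zip(letters, MBTI_AXES):
        if letter not in (first, second):
            raise ValueError(f"unexpected letter {letter!r} in {mbti_type!r}")
        labels[first + second] = 1 if letter == first else 0
    return labels
```

Under this encoding, a text classifier (fine-tuned or otherwise) would be trained once per key of the returned dictionary. Treating the full type as a single 16-class label is an equally plausible alternative.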