Abstract

This paper explores Natural Language Processing (NLP) for automatic comprehension and discourse analysis, with a focus on argument mining. Previous work has concentrated on English; this study addresses the lack of adequate corpora and methodologies for Brazilian Portuguese. Using a corpus of essays from Brazil's National High School Exam (ENEM), the researchers investigate how discourse markers affect the identification of argumentative structures, combining feature engineering with machine learning. The proposed methodology offers key advantages over transformer-based approaches: enhanced interpretability of feature selection, computational efficiency, and improved adaptability across different linguistic domains. By 'opening the black box' of machine learning models, the feature-engineering approach makes the decision process behind discourse marker identification transparent and gives insight into the linguistic patterns underlying argumentative structures in Portuguese, in contrast to opaque neural network models. The findings show that a concise set of only five features derived from argument mining markedly improved model accuracy, surpassing the performance of an initial, extensive set of over 600 features. These features specifically enhanced the evaluation of Competence 5, which assesses students' ability to develop intervention proposals grounded in scientific concepts. The relatively small corpus size is acknowledged as a limitation, and the researchers suggest that future work should expand the dataset for further evaluation. This work lays the groundwork for advancing NLP in Portuguese by providing features and feature-engineering methodologies for automated linguistic analysis tasks such as essay scoring, opinion mining, and text summarization.
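To make the idea of interpretable discourse-marker features concrete, the following is a minimal sketch and not the authors' implementation. The marker lexicon, its category names, and the function `marker_features` are hypothetical and chosen only for illustration; the actual feature set used in the study is not specified in this abstract.

```python
import re

# Hypothetical lexicon: a few common Portuguese discourse markers grouped by
# rhetorical function. A real lexicon would be larger and would need to handle
# ambiguous items (for example, "logo" can also mean "soon").
MARKERS = {
    "contrast": ["no entanto", "porém", "entretanto", "contudo"],
    "cause": ["porque", "pois", "visto que"],
    "conclusion": ["portanto", "logo", "assim"],
    "addition": ["além disso", "ademais"],
}


def marker_features(text):
    """Return per-category marker counts and rates per 100 tokens."""
    lowered = text.lower()
    # Guard against division by zero for empty input.
    n_tokens = max(len(re.findall(r"\w+", lowered)), 1)
    features = {}
    for category, markers in MARKERS.items():
        count = 0
        for marker in markers:
            # Lookarounds ensure we match whole words or phrases only.
            pattern = r"(?<!\w)" + re.escape(marker) + r"(?!\w)"
            count += len(re.findall(pattern, lowered))
        features[category + "_count"] = count
        features[category + "_per_100_tokens"] = 100.0 * count / n_tokens
    return features


essay = (
    "A violência cresce. No entanto, há soluções. "
    "Portanto, o governo deve agir, pois a sociedade precisa."
)
for name, value in sorted(marker_features(essay).items()):
    print(name, value)
```

Each feature here has a direct linguistic reading (for instance, how often an essay signals a conclusion), which is what allows a downstream classifier's feature weights or importances to be inspected, unlike the internal representations of a transformer.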