JUCS - Journal of Universal Computer Science 22(5): 691-708, doi: 10.3217/jucs-022-05-0691
Sentiment Classification of Spanish Reviews: An Approach based on Feature Selection and Machine Learning Methods
Mario Andres Paredes-Valverde‡, Jorge Limon-Romero§, Diego Tlapa§, Yolanda Baez-Lopez§
‡ Universidad de Murcia, Murcia, Spain
§ Universidad Autónoma de Baja California, Ensenada, Mexico
Open Access
Abstract
Sentiment analysis aims to extract users' opinions from review documents. There are currently two main approaches to sentiment analysis: semantic orientation and machine learning. Approaches based on machine learning (ML) methods operate on a set of features extracted from users' opinions. However, the high dimensionality of the resulting feature vector reduces their effectiveness. To address this problem, we propose a sentiment classification method based on feature selection mechanisms and ML methods. The method uses a hybrid feature extraction technique based on part-of-speech (POS) patterns and dependency parsing. The extracted features are then semantically enriched through common-sense knowledge bases. Next, a feature selection method is applied to eliminate noisy and irrelevant features. Finally, a set of classifiers is trained to classify unseen data. To assess the effectiveness of our approach, we conducted an evaluation in the movie and technological product domains. Our proposal was also compared with well-known methods and algorithms from the sentiment classification field. It obtained encouraging results in terms of F-measure, ranging from 0.786 to 0.898 across these domains.
Keywords
sentiment analysis, opinion mining, natural language processing, machine learning, feature selection methods
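The abstract describes a pipeline of feature extraction, feature selection, classifier training and F-measure evaluation, but does not state which selection criterion or classifiers are used. The Python code below is a minimal sketch of that pipeline under explicit assumptions, not the authors' implementation. Reviews are assumed to be already reduced to sets of feature strings, standing in for the POS-pattern and dependency features. Semantic enrichment is not modeled. Chi-square scoring is a swapped-in selection criterion, and a single multinomial Naive Bayes classifier with add-one smoothing stands in for the "set of classifiers". The function names and the toy Spanish reviews are hypothetical.

```python
import math
from collections import Counter


# Hypothetical helper: chi-square association between a feature and the
# positive class (labels: 1 = positive, 0 = negative); docs are sets of features.
def chi_square(docs, labels, term):
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)      # present, positive
    b = sum(1 for d, y in zip(docs, labels) if term in d and y == 0)      # present, negative
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == 1)  # absent, positive
    e = n - a - b - c                                                     # absent, negative
    denom = (a + c) * (b + e) * (a + b) * (c + e)
    return 0.0 if denom == 0 else n * (a * e - c * b) ** 2 / denom


# Keep the k highest-scoring features; ties are broken alphabetically.
def select_features(docs, labels, k):
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t), t))
    return set(ranked[:k])


# Multinomial Naive Bayes with Laplace (add-one) smoothing over selected features.
def train_nb(docs, labels, features):
    priors = Counter(labels)
    counts = {y: Counter() for y in priors}
    for d, y in zip(docs, labels):
        counts[y].update(t for t in d if t in features)
    model = {}
    for y in priors:
        total = sum(counts[y].values())
        log_lik = {t: math.log((counts[y][t] + 1) / (total + len(features)))
                   for t in features}
        model[y] = (math.log(priors[y] / len(labels)), log_lik)
    return model


def predict(model, doc):
    # Features outside the selected set are ignored.
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(log_lik[t] for t in doc if t in log_lik)
    return max(model, key=score)


# F-measure for the positive class: harmonic mean of precision and recall.
def f_measure(gold, pred, positive=1):
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy Spanish reviews (hypothetical data), already reduced to feature sets.
train_docs = [{"excelente", "película"}, {"buena", "trama"}, {"excelente", "actuación"},
              {"mala", "película"}, {"aburrida", "trama"}, {"mala", "actuación"}]
train_labels = [1, 1, 1, 0, 0, 0]
test_docs = [{"excelente", "trama"}, {"mala", "película"}, {"buena", "actuación"}, {"aburrida"}]
test_labels = [1, 0, 1, 0]

selected = select_features(train_docs, train_labels, k=4)
model = train_nb(train_docs, train_labels, selected)
predictions = [predict(model, d) for d in test_docs]
print(sorted(selected))  # ['aburrida', 'buena', 'excelente', 'mala']
print(predictions, f_measure(test_labels, predictions))  # [1, 0, 1, 0] 1.0
```

In this toy run, the class-neutral features "película", "trama" and "actuación" each appear equally often in positive and negative reviews. They therefore score zero and are discarded, which illustrates on a small scale how feature selection removes noisy dimensions before training.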