JUCS - Journal of Universal Computer Science 20(2): 213-239, doi: 10.3217/jucs-020-02-0213
Combining Psycho-linguistic, Content-based and Chat-based Features to Detect Predation in Chatrooms
Javier Parapar‡, David E. Losada§, Álvaro Barreiro|
‡ University of A Coruña, Coruña, Spain
§ Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| University of A Coruña, A Coruña, Spain
Open Access
Abstract
The Digital Age has brought great benefits for the human race but also some drawbacks. Nowadays, people from opposite corners of the world can communicate online via instant messaging services. Unfortunately, this has introduced new kinds of crime. Sexual predators have adapted their predatory strategies to these platforms and, usually, the target victims are kids. The authorities cannot manually track all threats because massive amounts of online conversations take place on a daily basis. Automatic methods for alerting about these crimes need to be designed. This is the main motivation of this paper, where we present a Machine Learning approach to identify suspicious subjects in chat-rooms. We propose novel types of features for representing the chatters and we evaluate different classifiers against the largest benchmark available. This empirical validation shows that our approach is promising for the identification of predatory behaviour. Furthermore, we carefully analyse the characteristics of the learnt classifiers. This preliminary analysis is a first step towards profiling the behaviour of sexual predators when chatting on the Internet.
Keywords
sexual predation, cybercrime, text mining, machine learning, support vector machines, psycho-linguistic analysis