JUCS - Journal of Universal Computer Science 13(10): 1471-1483, doi: 10.3217/jucs-013-10-1471
Machine Learning-Based Keywords Extraction for Scientific Literature
Chunguo Wu, Maurizio Marchese§, Jingqing Jiang|, Alexander Ivanyukovich§, Yanchun Liang|
‡ Jilin University and Beijing Jiaotong University, China
§ University of Trento, Italy
| Jilin University, China
Open Access
Abstract
With the growing interest in the Semantic Web, keywords/metadata extraction is coming to play an increasingly important role. Keywords extraction from documents is a complex task in natural language processing. Ideally, this task involves sophisticated semantic analysis. However, the complexity of the problem makes current semantic analysis techniques insufficient. Machine learning methods can support the initial phases of keywords extraction and can thus improve the input to further semantic analysis phases. In this paper we propose a machine learning-based keywords extraction method for a given document domain, namely scientific literature. More specifically, the least squares support vector machine is used as the machine learning method. The proposed method takes advantage of machine learning techniques and moves the complexity of the task to the process of learning from appropriate samples obtained within a domain. Preliminary experiments show that the proposed method is capable of extracting keywords from the domain of scientific literature with promising results.
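To make the least squares support vector machine idea concrete, the following is a minimal sketch of a generic LS-SVM binary classifier that labels candidate terms as keyword (+1) or non-keyword (-1). It is not the authors' implementation. The two features (a normalized term frequency and a relative first-occurrence position), the toy samples, the RBF kernel, and the parameters `gamma` and `sigma` are all assumptions chosen for illustration only. Unlike a standard SVM, training reduces to solving one linear system.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    # Solve the LS-SVM linear system
    #   [ 0   y^T           ] [ b     ]   [ 0 ]
    #   [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    # with Omega_ij = y_i * y_j * K(x_i, x_j).
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]  # bias b, multipliers alpha

def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma=1.0):
    # Decision function: sign(sum_i alpha_i * y_i * K(x, x_i) + b).
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

# Hypothetical toy samples: [normalized term frequency, relative first position].
# These values are made up for illustration and are not data from the paper.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y = np.array([1.0, 1.0, -1.0, -1.0])  # +1 = keyword, -1 = non-keyword

b, alpha = lssvm_train(X, y)
X_new = np.array([[0.85, 0.15], [0.15, 0.85]])
predictions = lssvm_predict(X, y, b, alpha, X_new)
print(predictions)
```

In a realistic setting the features would come from labeled documents in the target domain, which is where the abstract locates the complexity of the task.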
Keywords
keywords extraction, metadata extraction, support vector machine, machine learning