Automatic Discovery and Aggregation of Compound Names for the Use in Knowledge Representations

Christian Biemann; Uwe Quasthoff; Karsten Böhm; Christian Wolff

doi:10.3217/jucs-009-06-0530

JUCS - Journal of Universal Computer Science 9(6): 530-541, doi: 10.3217/jucs-009-06-0530

Christian Biemann^‡, Uwe Quasthoff^§, Karsten Böhm^|, Christian Wolff^¶

‡ University of Leipzig, Germany
§ Leipzig University, Leipzig, Germany
| TextTech Ltd., Leipzig, Germany
¶ Chemnitz University of Technology, Germany

Corresponding author: Christian Biemann (biem@informatik.uni-leipzig.de)

This article is freely available under the J.UCS Open Content License.

Citation: Biemann C, Quasthoff U, Böhm K, Wolff C (2003) Automatic Discovery and Aggregation of Compound Names for the Use in Knowledge Representations. JUCS - Journal of Universal Computer Science 9(6): 530-541. https://doi.org/10.3217/jucs-009-06-0530

Abstract

Automatic acquisition of information structures such as Topic Maps or semantic networks from large document collections is an important issue in knowledge management. An inherent problem for automatic approaches is the treatment of multiword terms as single semantic entities. Taking company names as an example, we present a method for learning multiword terms from large text corpora that exploits their internal structure. By iterating a search step and a verification step, the method learns the single words that typically form company names. These name elements are then used to recognize compound names so that they can be passed on to further processing. We report an evaluation of experiments on company name extraction and discuss some applications.

Keywords

corpora, semantic relations, topic maps, text mining, knowledge management, named entity extraction