JUCS - Journal of Universal Computer Science 8(12): 1047-1064, doi: 10.3217/jucs-008-12-1047
On the Semiautomatic Generation of WordNet Type Synsets and Clusters
Florentina Hristea
‡ University of Bucharest, Bucharest, Romania
Open Access
Abstract
WordNet (WN) is a lexical knowledge base, first developed for English and then adopted for several Western European languages, which was created as a machine-readable dictionary based on psycholinguistic principles. Our paper is an attempt to discuss the semiautomatic generation of WNs for languages other than English, a topic of great interest since the existence of such WNs will create the appropriate infrastructure for advanced Information Technology systems. Extending the algorithmic approach proposed in [Nikolov and Petrova, 01], we introduce a semiautomatic method based on heuristics for generating noun and adjective synsets and clusters. This choice of involved parts of speech is determined by the fact that nouns and adjectives have completely different organizations in WN: the hierarchy and the N-dimensional hyper-space, respectively. Our approach to WN generation relies on so-called "class methods", namely it uses as knowledge sources individual entries coming from bilingual dictionaries and WN synsets, but at the same time demonstrates the need to combine such methods with structural ones.
Keywords
WordNet, e-set, synset, synset id, cluster