JUCS - Journal of Universal Computer Science 15(4): 805-825, doi: 10.3217/jucs-015-04-0805
Fingerprinting Lexical Contexts over the Web
Vincenzo Di Lecce, Marco Calabrese, Domenico Soldo
Polytechnic of Bari, Taranto, Italy
Open Access
Abstract
This paper presents a novel technique for identifying lexical contexts in web resources. The basic idea is to treat web site anchortexts as lexicalized descriptions of an individual ontology, organized as a graph of concept words. To search for distinctive semantic patterns, we introduce the concept of web minutia, transposed from the forensic domain. The proposed technique consists of searching the analyzed web sites for web minutiae by means of a golden ontology. Web minutiae act as fingerprints for context-specific web resources; in this sense they are a powerful computational tool for identifying and categorizing the Web. The WordNet database has been used as the golden ontology in our experiments on English web documents. WordNet allows for indexing and retrieving word senses and inter-word taxonomical relations such as hyponymy and hypernymy, and it has proven to be an efficient mediator between web ontologies and context-dependent taxonomies. Our experiments have been carried out on a preliminary data set of several tens of thousands of links taken from the web sites of thirteen UK universities. Preliminary results seem to confirm the ability of web minutiae to identify lexical contexts across the Web.
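The first step the abstract describes (reading anchortexts as concept words and linking them through a golden ontology's hypernymy relations) can be illustrated with a minimal sketch. It is not the authors' implementation: `TOY_HYPERNYMS` is a hypothetical, hand-written stand-in for WordNet, and `AnchorTextParser`, `hypernym_chain` and `concept_graph` are illustrative names. The sketch only builds the concept-word graph; it does not attempt to define or detect web minutiae, whose exact definition is given in the full paper.

```python
import re
from html.parser import HTMLParser

# Hypothetical toy "golden ontology": word -> direct hypernym.
# A real system would query WordNet instead; these entries are illustrative only.
TOY_HYPERNYMS = {
    "physics": "science",
    "chemistry": "science",
    "science": "discipline",
    "history": "humanistic_discipline",
    "humanistic_discipline": "discipline",
}


class AnchorTextParser(HTMLParser):
    """Collects the visible text of every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.buf = []
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True
            self.buf = []

    def handle_data(self, data):
        if self.in_anchor:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            self.in_anchor = False
            text = " ".join(self.buf).strip()
            if text:
                self.anchors.append(text)


def hypernym_chain(word):
    """Return the list of ancestors of `word` in the toy ontology (nearest first)."""
    chain = []
    while word in TOY_HYPERNYMS:
        word = TOY_HYPERNYMS[word]
        chain.append(word)
    return chain


def concept_graph(html):
    """Extract concept words from anchortexts and link them by hypernymy edges."""
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    known = set(TOY_HYPERNYMS) | set(TOY_HYPERNYMS.values())
    concepts, edges = set(), set()
    for anchor in parser.anchors:
        for token in re.findall(r"[a-z]+", anchor.lower()):
            if token in known:
                concepts.add(token)
                child = token
                # Add one (child, parent) edge per step up the taxonomy.
                for parent in hypernym_chain(token):
                    edges.add((child, parent))
                    child = parent
    return concepts, edges


if __name__ == "__main__":
    page = ('<a href="/p">Physics</a> <a href="/c">Chemistry Department</a> '
            '<a href="/x">Contact us</a>')
    print(concept_graph(page))
```

Anchortext words that the ontology does not know (here "Department", "Contact", "us") are simply dropped, so only "physics" and "chemistry" enter the graph, joined through their shared ancestors "science" and "discipline".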
Keywords
minutia, golden ontology, Semantic Web, Web Mining, knowledge discovery, WordNet