JUCS - Journal of Universal Computer Science 11(8): 1383-1396, doi: 10.3217/jucs-011-08-1383
Semantic Preprocessing of Web Request Streams for Web Usage Mining
Jason J. Jung
Yeungnam University, Gyeongsan, Republic of Korea
Open Access
Abstract
Efficient data preparation is needed to discover the underlying knowledge in complicated Web usage data. In this paper, we focus on two main tasks: semantic outlier detection in online Web request streams, and segmentation (or sessionization) of those streams. We exploit semantic technologies to infer the relationships among Web requests. Web ontologies such as taxonomies and directories can label each Web request with all of its corresponding hierarchical topic paths. Our algorithm consists of two steps. The first step is a nested repetition of top-down partitioning that establishes a set of candidate session boundaries, and the second step is a bottom-up merging evaluation process that reconstructs the segmented sequences. In addition, we propose a hybrid approach that combines this method with existing heuristics. Using a synthesized dataset and a real-world dataset of IRCache access log files, we conducted experiments and showed that the semantic preprocessing method improves the performance of rule discovery algorithms. This means that we can conceptually track the behavior of users who tend to change their intentions and interests easily, or who search for various kinds of information on the Web simultaneously.
Keywords
Web usage mining, semantic analysis, browsing patterns
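As a rough illustration of the idea summarized in the abstract, the following minimal sketch labels each request with a hierarchical topic path and proposes a candidate session boundary wherever two consecutive requests share little of their path. It is not the paper's algorithm: the toy taxonomy `TOPIC_PATHS`, the shared-prefix measure `path_similarity`, the threshold value, and all function names are assumptions made for this example. It also assigns a single topic path per request, whereas the abstract describes labeling with all corresponding paths, and it omits the bottom-up merging step.

```python
from typing import Dict, List, Tuple

# Hypothetical toy taxonomy mapping a URL to one topic path (illustrative only).
TOPIC_PATHS: Dict[str, Tuple[str, ...]] = {
    "a.example/python": ("Computers", "Programming", "Python"),
    "b.example/java": ("Computers", "Programming", "Java"),
    "c.example/soccer": ("Sports", "Soccer"),
    "d.example/tennis": ("Sports", "Tennis"),
}


def path_similarity(p: Tuple[str, ...], q: Tuple[str, ...]) -> float:
    """Length of the shared prefix divided by the longer path length (0.0 to 1.0)."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return common / max(len(p), len(q))


def candidate_boundaries(requests: List[str], threshold: float = 0.3) -> List[int]:
    """Return indices i where request i is semantically far from request i - 1."""
    paths = [TOPIC_PATHS[r] for r in requests]
    return [
        i
        for i in range(1, len(paths))
        if path_similarity(paths[i - 1], paths[i]) < threshold
    ]


def segment(requests: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Split the request stream at every candidate boundary."""
    if not requests:
        return []
    cuts = [0] + candidate_boundaries(requests, threshold) + [len(requests)]
    return [requests[start:end] for start, end in zip(cuts, cuts[1:])]


stream = ["a.example/python", "b.example/java", "c.example/soccer", "d.example/tennis"]
sessions = segment(stream)
```

With this toy stream, the only candidate boundary falls between the programming requests and the sports requests, because those two paths share no common prefix. A time-based heuristic could be combined with such a semantic test in the spirit of the hybrid approach mentioned in the abstract, but how the paper does so is not specified here.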