JUCS - Journal of Universal Computer Science 19(4): 563-580, doi: 10.3217/jucs-019-04-0563
A Semi-Supervised Ensemble Learning Method for Finding Discriminative Motifs and its Application
Thi Nhan Le, Tu Bao Ho, Saori Kawasaki, Tatsuo Kanda§, Katsuhiko Takabayashi§, Shuang Wu§, Osamu Yokosuka§
‡ Japan Advanced Institute of Science and Technology, Nomi, Japan
§ Chiba University, Chiba, Japan
Open Access
Abstract
Finding discriminative motifs has recently received much attention in biomedicine, as such motifs allow us to characterize and distinguish two different classes of sequences. In biomedical applications, labeled sequences are commonly very limited in number, while a large number of unlabeled sequences is usually available. Current methods of discriminative motif finding are powerful and effective on large labeled datasets, but they do not function well on small labeled datasets. In this paper, we present a semi-supervised ensemble method for finding discriminative motifs. It is based on the SLUPC algorithm, a separate-and-conquer search method that discovers motifs of the type 'discriminative one occurrence per sequence'. The proposed method, named E-SLUPC (Ensemble SLUPC), uses SLUPC to search for discriminative motifs in an extended labeled dataset that contains labeled data together with unlabeled data carrying predicted labels. Strongly discriminative and frequent motifs characterizing the two outcome classes of hepatitis C virus treatment (sustained viral response and non-sustained viral response) were detected and analyzed. Furthermore, the experimental evaluation shows that our method functions considerably well in the common medical research setting where labeled data are difficult to obtain.
Keywords
discriminative motif, separate-and-conquer search, self-training technique, ensemble learning, hepatitis C virus, NS5A region