JUCS - Journal of Universal Computer Science 24(6): 682-710, doi: 10.3217/jucs-024-06-0682
Cancer Classification by Gene Subset Selection from Microarray Dataset
Asit Kumar Das‡, Soumen Kumar Pati§, Hsien-Hung Huang|, Chi-Ken Chen|
‡ Indian Institute of Engineering Science and Technology, Shibpur, India
§ St. Thomas' College of Engineering and Technology, Kolkata, India
| Jen-Ai Hospital, Taichung, Taiwan
Open Access
Abstract
A microarray dataset contains a huge number of genes, many of which are irrelevant to cancer classification, and these reduce classification accuracy. The dataset should therefore be pre-processed to filter out such irrelevant and redundant genes. This paper proposes a Pareto-optimality-based multi-objective genetic algorithm in which non-linear cellular automata are employed to generate the initial population, overcoming the drawbacks of random initialization in a high-dimensional space. The fitness functions are defined using the attribute dependency and boundary region exploration of rough set theory, together with the log-likelihood ratio, to select the informative genes. Chromosomes are recombined by multi-point crossover, while proximity mutation, a slight modification of flip-bit mutation, is applied to produce fitter offspring. Finally, a gene subset with strong biological significance for cancer treatment is obtained from the Pareto-dominant solutions. Performance is evaluated on publicly available microarray cancer datasets and compared with state-of-the-art methods to demonstrate the effectiveness of the proposed method.
Keywords
multi-objective genetic algorithm, gene selection, cellular automata, rough set theory, log-likelihood ratio, proximity mutation
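
The genetic operators and the Pareto selection step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a chromosome is a binary list (1 = gene selected), that every objective is maximized, and it shows plain flip-bit mutation only, since the abstract does not specify the proximity modification. The function names and the example score values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]  # assumed encoding: 1 = gene selected, 0 = gene dropped


def multi_point_crossover(a: Chromosome, b: Chromosome,
                          points: Sequence[int]) -> Tuple[Chromosome, Chromosome]:
    """Swap every second segment of two parents, cutting at the given points."""
    child1, child2 = a[:], b[:]
    swap, prev = False, 0
    for cut in sorted(points) + [len(a)]:
        if swap:
            child1[prev:cut], child2[prev:cut] = b[prev:cut], a[prev:cut]
        swap = not swap
        prev = cut
    return child1, child2


def flip_bit_mutation(c: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability `rate` (plain flip-bit,
    not the paper's proximity variant)."""
    return [1 - bit if rng.random() < rate else bit for bit in c]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u dominates v if it is no worse on every objective and better on one."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))


def pareto_front(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of score vectors that no other vector dominates."""
    return [i for i, u in enumerate(scores)
            if not any(dominates(v, u) for j, v in enumerate(scores) if j != i)]


if __name__ == "__main__":
    c1, c2 = multi_point_crossover([0] * 6, [1] * 6, points=[2, 4])
    print(c1, c2)
    print(flip_bit_mutation(c1, rate=0.1, rng=random.Random(0)))
    # Hypothetical two-objective fitness values for four candidate gene subsets.
    print(pareto_front([(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9)]))
```

In a full multi-objective GA the non-dominated set would be recomputed each generation from the fitness vectors of the whole population; the sketch shows only the building blocks.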