JUCS - Journal of Universal Computer Science 24(6): 682-710, doi: 10.3217/jucs-024-06-0682

Cancer Classification by Gene Subset Selection from Microarray Dataset

Asit Kumar Das^‡, Soumen Kumar Pati^§, Hsien-Hung Huang^|, Chi-Ken Chen^|

‡ Indian Institute of Engineering Science and Technology, Shibpur, India
§ St. Thomas' College of Engineering and Technology, Kolkata, India
| Jen-Ai Hospital, Taichung, Taiwan

Corresponding author: Asit Das (akdas@cs.iiests.ac.in)

This article is freely available under the J.UCS Open Content License.

Citation: Das AK, Pati SK, Huang H-H, Chen C-K (2018) Cancer Classification by Gene Subset Selection from Microarray Dataset. JUCS - Journal of Universal Computer Science 24(6): 682-710. https://doi.org/10.3217/jucs-024-06-0682

Abstract

A microarray dataset contains a huge number of genes, many of which are irrelevant to cancer classification and therefore reduce classification accuracy. The dataset should thus be pre-processed to filter out such irrelevant and redundant genes. This paper proposes a Pareto-optimality-based multi-objective genetic algorithm in which non-linear cellular automata generate the initial population, overcoming the drawbacks of random initialization in a high-dimensional space. The fitness functions, which select the informative genes, are defined using the attribute dependency and boundary region exploration of rough set theory together with the log-likelihood ratio. Chromosomes are recombined by multi-point crossover, while proximity mutation, a slight modification of flip-bit mutation, is applied to produce fitter offspring. Finally, the gene subset with strong biological significance in cancer treatment is obtained from the Pareto-dominant solutions. Performance is evaluated on publicly available microarray cancer datasets and compared with state-of-the-art methods to demonstrate the effectiveness of the proposed method.
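As a rough illustration of the generic operators named above, the following sketch shows Pareto dominance filtering, multi-point crossover, and plain flip-bit mutation on bit-string chromosomes (one bit per gene, 1 meaning the gene is selected). It is only a minimal sketch under stated assumptions: all function names are hypothetical, the objectives are assumed to be maximized, and neither the rough-set and log-likelihood fitness functions, the cellular automata initialization, nor the proximity mutation itself is reproduced here.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (assumes maximization)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores):
    """Indices of the objective vectors that no other vector dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def multipoint_crossover(p1, p2, points):
    """Exchange every second segment between two equal-length bit lists."""
    c1, c2 = list(p1), list(p2)
    swap, prev = False, 0
    for cut in sorted(points) + [len(p1)]:
        if swap:
            c1[prev:cut], c2[prev:cut] = c2[prev:cut], c1[prev:cut]
        swap = not swap
        prev = cut
    return c1, c2


def flip_bit_mutation(chrom, rate, rng):
    """Flip each bit independently with probability `rate` (plain flip-bit,
    not the modified proximity mutation proposed in the paper)."""
    return [1 - g if rng.random() < rate else g for g in chrom]


if __name__ == "__main__":
    rng = random.Random(0)
    a, b = multipoint_crossover([0] * 6, [1] * 6, points=[2, 4])
    print(a, b)
    print(flip_bit_mutation(a, rate=0.2, rng=rng))
    # Placeholder two-objective scores for four candidate gene subsets.
    print(pareto_front([(1, 3), (2, 2), (1, 1), (3, 1)]))
```

In a full multi-objective genetic algorithm these pieces would sit inside a generational loop, with the non-dominated set carried forward as the pool of candidate gene subsets.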

Keywords

multi-objective genetic algorithm, gene selection, cellular automata, rough set theory, log-likelihood ratio, proximity mutation