JUCS - Journal of Universal Computer Science 26(6): 734-746, doi: 10.3897/jucs.2020.039

The Modified Principal Component Analysis Feature Extraction Method for the Task of Diagnosing Chronic Lymphocytic Leukemia Type B-CLL

Mariusz Topolski‡

‡ Wrocław University of Science and Technology, Wrocław, Poland

Corresponding author: Mariusz Topolski (mariusz.topolski@pwr.edu.pl)

This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY-ND 4.0). This license allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.

Citation: Topolski M (2020) The Modified Principal Component Analysis Feature Extraction Method for the Task of Diagnosing Chronic Lymphocytic Leukemia Type B-CLL. JUCS - Journal of Universal Computer Science 26(6): 734-746. https://doi.org/10.3897/jucs.2020.039

Abstract

The vast majority of medical problems are characterised by a relatively high-dimensional feature space, which is problematic for many classic pattern recognition algorithms because of the well-known curse of dimensionality. This creates a need for dimensionality reduction methods, which are divided into feature selection and feature extraction strategies. The most commonly used tool in the second group is Principal Component Analysis (PCA), which, unlike selection methods, does not choose a subset of the original features but instead transforms them mathematically into a lower-dimensional form. However, a natural downside of this algorithm is that it does not take class context into account in supervised learning tasks. This work proposes a feature extraction algorithm based on the PCA approach that tries not only to reduce the feature space but also to separate the class distributions in the available learning set. The problem addressed in this work was to create a feature extraction method describing the prognosis for chronic lymphocytic leukemia type B-CLL that is at least as good as, or even better than, other extraction methods. The aim of the research was accomplished for binary and three-class cases, with five machine learning algorithms applied to verify the quality of the extraction. The obtained results were compared using the paired-samples Wilcoxon test.

Keywords

Principal Components Analysis, data classification, recognition of patterns, lymphocytic leukemia type B-CLL