Corresponding author: Jędrzej Kozal (jedrzej.kozal@pwr.edu.pl)
Corresponding author: Filip Guzy (filip.guzy@pwr.edu.pl)
Corresponding author: Michał Woźniak (michal.wozniak@pwr.edu.pl)
© Jędrzej Kozal, Filip Guzy, Michał Woźniak. This is an open access article distributed under the terms of the Creative Commons Attribution-NoDerivatives License (CC BY-ND 4.0). This license allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.
Citation:
Kozal J, Guzy F, Woźniak M (2022) Employing chunk size adaptation to overcome concept drift. JUCS - Journal of Universal Computer Science 28(3): 249-268. https://doi.org/10.3897/jucs.80735
Modern analytical systems must process streaming data and respond correctly to changes in the data distribution. Such changes are called concept drift, and they may degrade the quality of the models in use. Because concept drift can appear at any time, the algorithms used must be able to adapt the model continuously to the changing data distribution. This work focuses on non-stationary data stream classification with classifier ensembles. To keep the ensemble up to date, new base classifiers are trained on incoming data blocks and added to the ensemble, while outdated models are removed. One of the problems with this type of model is reacting quickly to changes in the data distribution. We propose a new framework, Chunk Adaptive Restoration, that can be applied to any block-based data stream classification algorithm. The proposed algorithm adjusts the data chunk size when concept drift is detected in order to minimize the impact of the change on the predictive performance of the model. Experimental research, backed by statistical tests, shows that Chunk Adaptive Restoration significantly reduces the model’s restoration time.
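The idea of adjusting the chunk size around a detected drift can be illustrated with a minimal Python sketch. The abstract does not say how the size is adjusted, so the schedule below is an assumption made for illustration only: the chunk size drops to a small value when drift is detected and then grows multiplicatively back to its base value. The names `ChunkSizeScheduler`, `adaptive_chunks`, `base_size`, `drift_size`, and `growth`, as well as the `drift_detected` callback, are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class ChunkSizeScheduler:
    """Hypothetical chunk-size schedule: shrink on drift, then grow back."""
    base_size: int = 1000   # chunk size while the stream is stable (assumed value)
    drift_size: int = 100   # chunk size right after a detected drift (assumed value)
    growth: float = 2.0     # growth factor per chunk during restoration (assumed value)

    def __post_init__(self):
        # Start in the stable regime.
        self.current = self.base_size

    def next_size(self, drift_detected: bool) -> int:
        """Return the size to use for the next chunk."""
        if drift_detected:
            # Small chunks let new base classifiers be trained sooner.
            self.current = self.drift_size
        else:
            # Grow back toward the base size, never exceeding it.
            self.current = min(self.base_size, int(self.current * self.growth))
        return self.current


def adaptive_chunks(stream, scheduler, drift_detected):
    """Yield consecutive chunks of `stream` (a list) with an adaptive size.

    `drift_detected(chunk)` is a user-supplied callback standing in for
    any drift detector; it returns True when drift is signalled.
    """
    start = 0
    size = scheduler.current
    while start < len(stream):
        chunk = stream[start:start + size]
        yield chunk
        start += len(chunk)
        size = scheduler.next_size(drift_detected(chunk))


if __name__ == "__main__":
    stream = list(range(3000))
    scheduler = ChunkSizeScheduler()
    # Toy detector: signal drift on the chunk that contains sample 1000.
    lengths = [len(c) for c in adaptive_chunks(stream, scheduler, lambda c: 1000 in c)]
    print(lengths)
```

With the toy detector above, the chunk lengths are 1000, 1000, 100, 200, 400, 300: the size drops after the drift is signalled and then doubles until the stream ends. In a block-based ensemble, each yielded chunk would be used to train a new base classifier, so smaller chunks right after a drift mean the ensemble is refreshed more often while its accuracy is recovering.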