JUCS - Journal of Universal Computer Science 30(6): 814-846, doi: 10.3897/jucs.105363
IMD-MP: Imputation of Missing Data in IoT Based on Matrix Profile and Spatio-temporal Correlations
G.V. Vidya Lakshmi, S. Gopikrishnan
‡ VIT-AP University, Amaravathi, India
Open Access
Abstract
Data in the Internet of Things (IoT) domain may be missing due to connectivity errors, environmental extremes, sensor malfunctions, and human errors. Despite the many approaches for imputing missing values, achieving high imputation precision at acceptable computational complexity for larger missing sub-sequences in univariate series remains an open problem. This work introduces IMD-MP (Imputation of Missing Data using Matrix Profile), a new technique that improves imputation accuracy for big data analysis in IoT applications by exploiting spatio-temporal correlations through a novel distance metric, the Matrix Profile Distance (MPD). To impute the missing data of a failed sensor node, our method preserves spatial correlation by grouping the sensors present in the network using a grouping algorithm (GA). After grouping, the sensor nodes most similar to the failed node are identified using the Node Similarity Algorithm (NSF). From the data of these similar sensors, a certain number of sub-sequences that are most similar to the one preceding the failed node's missing values are gathered. The heights of these sub-sequences are then optimized to ensure temporal correlation in the imputed data. To find the optimal imputation sequence, the method uses MPD and similarity scores. Numerical findings using sensor data from real-time environmental monitoring and Intel data sets demonstrate the algorithm's effectiveness compared to other benchmarks.
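The core idea described above (match the sub-sequence preceding the gap against a similar sensor's data, then adjust the height of the borrowed values) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define MPD, GA, or NSF, so the sketch substitutes the z-normalized Euclidean distance (the distance that underlies a standard matrix profile) for MPD, assumes a single, already selected neighbor sensor, and uses a simple mean shift as the height adjustment. The function names `znorm`, `distance_profile`, and `impute_gap` and the window length `m` are hypothetical.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array; a constant window maps to all zeros."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def distance_profile(query, series):
    """Z-normalized Euclidean distance from `query` to every window of `series`.

    Assumption: this stands in for the paper's MPD, which the abstract
    does not define.
    """
    m = len(query)
    q = znorm(query)
    windows = np.lib.stride_tricks.sliding_window_view(
        np.asarray(series, dtype=float), m
    )
    return np.array([np.linalg.norm(q - znorm(w)) for w in windows])

def impute_gap(failed, gap_start, gap_len, neighbor, m):
    """Fill failed[gap_start:gap_start + gap_len] from one neighbor series.

    Assumptions: `neighbor` has no missing values, and at least `m`
    observed values precede the gap in `failed`.
    """
    failed = np.asarray(failed, dtype=float)
    neighbor = np.asarray(neighbor, dtype=float)
    # Sub-sequence immediately preceding the missing values.
    query = failed[gap_start - m:gap_start]
    # Only windows followed by at least gap_len values are candidates.
    usable = neighbor[:len(neighbor) - gap_len]
    best = int(np.argmin(distance_profile(query, usable)))
    candidate = neighbor[best + m:best + m + gap_len]
    # Height adjustment (hypothetical): shift by the difference in local means.
    offset = query.mean() - neighbor[best:best + m].mean()
    filled = failed.copy()
    filled[gap_start:gap_start + gap_len] = candidate + offset
    return filled, best
```

Calling `impute_gap(failed, gap_start, gap_len, neighbor, m)` returns the series with the gap filled and the index of the best-matching window in the neighbor's data. A fuller version along the lines of the abstract would collect several top-ranked sub-sequences from several similar sensors and combine the distance with a node similarity score before choosing the final imputation sequence.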
Keywords
Internet of Things, Data imputation, Univariate data, Spatial correlation, Temporal correlation, Data quality