Data shift monitoring in machine learning models

Dmitry Namiot, Eugene Ilyushin

Abstract


A fundamental feature of machine learning systems is that models are trained on a selected training data set. The generalizations obtained during training therefore reflect the characteristics of only a subset of the general population. If the characteristics of the data change while the system is in operation, the model's generalizations may no longer hold. Such changes should be considered the rule rather than the exception. A change in data characteristics of this kind is called data shift. It follows that any machine learning system intended for industrial use must monitor for possible data shift. The presence of a shift reduces confidence in the system's results and may even make the system unsuitable for further operation. Accounting for (overcoming) data shift is a separate task; simple retraining, for example, can be a serious problem for critical applications. In any case, the first task is to detect that a shift has occurred. Data shift is divided into several types, the most serious of which is a change in the relationship between the dependent and independent variables. Detecting data shift in data streams is of particular interest, since it is directly relevant to critical applications.
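As a rough illustration of the first task, detecting that a shift has occurred, the sketch below compares a reference window of one numeric feature with a current window using the two-sample Kolmogorov-Smirnov statistic. This is an assumption chosen for illustration, not a method taken from the paper; the function names `ks_statistic` and `shift_detected`, the windowing, and the significance level are all hypothetical. The check looks only at the distribution of an input feature, so it would not by itself reveal a change in the relationship between dependent and independent variables.

```python
import bisect
import math


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    n, m = len(ref), len(cur)
    d = 0.0
    # Both empirical CDFs are step functions that change only at observed values,
    # so the largest gap is attained at one of those values.
    for x in ref + cur:
        f_ref = bisect.bisect_right(ref, x) / n
        f_cur = bisect.bisect_right(cur, x) / m
        d = max(d, abs(f_ref - f_cur))
    return d


def shift_detected(reference, current, alpha=0.05):
    """Flag a shift when the KS statistic exceeds the large-sample critical value.

    The threshold c(alpha) * sqrt((n + m) / (n * m)) is the standard asymptotic
    approximation; alpha = 0.05 is an illustrative default, not a recommendation.
    """
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold


# Hypothetical usage: a training-time window and a later window of one feature.
reference_window = list(range(100))
current_window = [x + 50 for x in reference_window]  # same shape, shifted location
print(shift_detected(reference_window, reference_window))
print(shift_detected(reference_window, current_window))
```

In a streaming setting, the same comparison could be repeated on a sliding current window, at the cost of more frequent false alarms unless the significance level is adjusted for repeated testing.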

Full Text:

PDF (Russian)





ISSN: 2307-8162