Data preparation strategies in Kubeflow for cloud-native AI systems

dc.citation.epage: 72
dc.citation.issue: 2
dc.citation.journalTitle: Measuring Equipment and Metrology
dc.citation.spage: 66
dc.contributor.affiliation: Lviv Polytechnic National University
dc.contributor.affiliation: Lviv Polytechnic National University
dc.contributor.author: Bershchankyi, Yevhen
dc.contributor.author: Klym, Halyna
dc.coverage.placename: Lviv
dc.date.accessioned: 2025-11-25T13:13:57Z
dc.date.created: 2025-06-20
dc.date.issued: 2025-06-20
dc.description.abstract: This article presents the main findings from an in-depth study of data preparation strategies using Kubeflow in cloud-native AI systems deployed on Azure Kubernetes Service. The results demonstrate that integrating Kubeflow Pipelines with Azure-native tools enables scalable and automated processing of large datasets, significantly improving training efficiency and model accuracy. The use of TensorFlow Data Validation proved effective in detecting schema anomalies and data drift, enhancing data reliability across iterative ML workflows. A case study confirms that the implemented pipeline reduced data processing time by 35% and increased pipeline reproducibility through integrated metadata tracking and data versioning. These outcomes highlight Kubeflow’s practical value in supporting efficient, traceable, and production-ready AI pipelines in enterprise-grade cloud environments.
dc.format.extent: 66-72
dc.format.pages: 7
dc.identifier.citation: Bershchankyi Y. Data preparation strategies in Kubeflow for cloud-native AI systems / Yevhen Bershchankyi, Halyna Klym // Measuring Equipment and Metrology. — Lviv : Lviv Politechnic Publishing House, 2025. — Vol 86. — No 2. — P. 66–72.
dc.identifier.citation2015: Bershchankyi Y., Klym H. Data preparation strategies in Kubeflow for cloud-native AI systems // Measuring Equipment and Metrology, Lviv. 2025. Vol 86. No 2. P. 66–72.
dc.identifier.citationenAPA: Bershchankyi, Y., & Klym, H. (2025). Data preparation strategies in Kubeflow for cloud-native AI systems. Measuring Equipment and Metrology, 86(2), 66-72. Lviv Politechnic Publishing House.
dc.identifier.citationenCHICAGO: Bershchankyi Y., Klym H. (2025) Data preparation strategies in Kubeflow for cloud-native AI systems. Measuring Equipment and Metrology (Lviv), vol. 86, no 2, pp. 66-72.
dc.identifier.doi: https://doi.org/10.23939/istcmtm2025.02.066
dc.identifier.uri: https://ena.lpnu.ua/handle/ntb/121866
dc.language.iso: en
dc.publisher: Lviv Politechnic Publishing House
dc.relation.ispartof: Measuring Equipment and Metrology, 2 (86), 2025
dc.relation.referencesen: [1] Bershchanskyi, Y. and Klym, H. (2023, October). Information System for Administration of Medical Institution. In 2023 13th International Conference on Dependable Systems, Services and Technologies (DESSERT) (pp. 1–4). IEEE. https://doi.org/10.1109/DESSERT61349.2023.10416537
dc.relation.referencesen: [2] Mehendale, P. (2023). Model Reliability and Performance through MLOps: Tools and Methodologies. J Artif Intell Mach Learn & Data Sci, 1(4), pp. 980–984. https://doi.org/10.51219/JAIMLD/pushkar
dc.relation.referencesen: [3] Abbas, T. and Eldred, A. (2025). AI-Powered Stream Processing: Bridging Real-Time Data Pipelines with Advanced Machine Learning Techniques. ResearchGate Journal of AI & Cloud Analytics. https://doi.org/10.13140/RG.2.2.26674.52167
dc.relation.referencesen: [4] Yuan, D. Y. and Wildish, T. (2020, June). Bioinformatics application with kubeflow for batch processing in clouds. In International conference on high performance computing (pp. 355–367). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-59851-8_24
dc.relation.referencesen: [5] Subramaniam, A. and Subramaniam, A. (2023, October). Automated Resource Scaling in Kubeflow through Time Series Forecasting. In 2023 IEEE 5th International Conference on Cybernetics, Cognition and Machine Learning Applications (ICCCMLA) (pp. 173–179). IEEE. https://doi.org/10.1109/ICCCMLA58983.2023.10346870
dc.relation.referencesen: [6] Josyula, P., Ulaganathan, S. and Arava, S. K. (2025, February). A Survey of Federated Learning Orchestration Using Kubeflow: Challenges, Advances, and Future Directions. In 2025 First International Conference on Advances in Computer Science, Electrical, Electronics, and Communication Technologies (CE2CT) (pp. 566–572). IEEE. https://doi.org/10.1109/CE2CT64011.2025.10939611
dc.relation.referencesen: [7] Bershchanskyi, Y., Klym, H. and Shevchuk, Y. (2024). Containerized artificial intelligent system design in cloud and cyber-physical systems. Advances in Cyber-Physical Systems (ACPS), vol. 9, No. 2, pp. 151–157. https://doi.org/10.23939/acps2024.02.151
dc.relation.referencesen: [8] Yadavalli, T. (2021). Optimizing Machine Learning Workflows with Google Cloud Dataflow and TensorFlow Extended (TFX). J Artif Intell Mach Learn & Data Sci, 1(1), pp. 2436–2441. https://doi.org/10.51219/JAIMLD/tulasiram-yadavalli/524
dc.relation.referencesen: [9] Kienzler, R. and Kyas, H. (2020, January). Tensorflow 2.0 and Kubeflow for Scalable and Reproducable Enterprise AI. In CS & IT Conference Proceedings (Vol. 10, No. 1). [Online]. Available: https://csitcp.com/paper/10/101csit03.pdf
dc.relation.referencesen: [10] Caveness, E., G. C., P. S., Peng, Z., Polyzotis, N., Roy, S. and Zinkevich, M. (2020, June). Tensorflow data validation: Data analysis and validation in continuous ml pipelines. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (pp. 2793–2796). https://doi.org/10.1145/3318464.3384707
dc.relation.referencesen: [11] Devarasetty, N. (2024). Optimizing Data Engineering for AI: Improving Data Quality and Preparation for Machine Learning Application. The Computertech, pp. 1–28. https://doi.org/10.18535/raj.v7i03.397
dc.relation.referencesen: [12] Teodoras, D. A., Stalidi, C., Popovici, E. C. and Suciu, G. (2024). Implementing a Java Microservice for Credit Fraud Detection Using Machine Learning. In 2024 23rd RoEduNet Conference: Networking in Education and Research (RoEduNet) (pp. 1–5). IEEE. https://doi.org/10.1109/RoEduNet64292.2024.10722691
dc.relation.referencesen: [13] Bershchanskyi, Y. and Klym, H. (2024, October). Development Approaches of Cloud-Based System for Object Recognition on Images. In 2024 IEEE 17th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET) (pp. 205–208). IEEE. https://doi.org/10.1109/TCSET64720.2024.10755838
dc.relation.uri: https://doi.org/10.1109/
dc.relation.uri: https://doi.org/10.51219/JAIMLD/pushkar
dc.relation.uri: https://doi.org/10.13140/RG.2.2.26674.52167
dc.relation.uri: https://doi.org/10.1007/978-3-030-59851-8_24
dc.relation.uri: https://doi.org/10.1109/ICCCMLA58983.2023.10346870
dc.relation.uri: https://doi.org/10.1109/CE2CT64011.2025.10939611
dc.relation.uri: https://doi.org/10.23939/acps2024.02.151
dc.relation.uri: https://doi.org/10.51219/JAIMLD/tulasiram-yadavalli/524
dc.relation.uri: https://csitcp.com/paper/10/101csit03.pdf
dc.relation.uri: https://doi.org/10.1145/3318464.3384707
dc.relation.uri: https://doi.org/10.18535/raj.v7i03.397
dc.relation.uri: https://doi.org/10.1109/RoEduNet64292.2024.10722691
dc.relation.uri: https://doi.org/10.1109/TCSET64720.2024.10755838
dc.rights.holder: © Lviv Polytechnic National University, 2025
dc.subject: Kubeflow
dc.subject: cloud-native AI
dc.subject: Kubeflow data pre-processing
dc.subject: AI pipelines
dc.subject: ML infrastructure
dc.subject: Kubernetes orchestration
dc.title: Data preparation strategies in Kubeflow for cloud-native AI systems
dc.type: Article

Files

Original bundle

Name: 2025v86n2_Bershchankyi_Y-Data_preparation_strategies_66-72.pdf
Size: 191.93 KB
Format: Adobe Portable Document Format
