Research of data mining methods for classification of imbalanced data sets

Doroshenko, A. V.; Savchuk, D. Yu.

dc.citation.epage	57
dc.citation.issue	1
dc.citation.journalTitle	Ukrainian Journal of Information Technology
dc.citation.spage	48
dc.citation.volume	6
dc.contributor.affiliation	Lviv Polytechnic National University
dc.contributor.author	Doroshenko, A. V.
dc.contributor.author	Savchuk, D. Yu.
dc.coverage.placename	Lviv
dc.date.accessioned	2025-05-21T08:02:13Z
dc.date.created	2024-02-28
dc.date.issued	2024-02-28
dc.description.abstract	The rapid development of information technology, now used in all spheres of human life and activity, has led to the accumulation of extremely large amounts of data. Applying machine learning methods to this data yields new, practically useful knowledge that can be used for marketing, management, and research purposes. Data mining tasks include regression, forecasting, clustering, classification, and association rule mining; this study addresses a binary classification problem. The main goal of the paper is to study different machine learning methods for solving the classification problem and to compare their efficiency and accuracy. A separate task is data pre-processing aimed at resolving class imbalance in the sample and at identifying the principal components to be used for classification. For this purpose, an information system was researched and developed that classifies whether a company with given economic and financial characteristics goes bankrupt. The study uses a dataset to evaluate the efficiency and quality of several well-known classification algorithms: conventional and linear Support Vector Machine, Extra Trees, Random Forest, Decision Tree, Logistic Regression, Multilayer Perceptron classifier, Gradient Boosting, and Naive Bayes classifier. For pre-processing, the data were scaled, the SMOTE method was used to remove the imbalance of the training sample, and principal component analysis and L1 regularisation were performed. Principal component analysis identified the 15 principal components that most affect classification accuracy, and these were used for classification. The best classifier was Random Forest with 95.9 % accuracy, and the worst was Naive Bayes with 85.1 %.
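The oversampling step named in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it is a simplified, pure-Python rendering of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), and the function name `smote`, its parameters, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (simplified sketch of the SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (hypothetical): 8 majority points and 3 minority points in 2-D.
majority = [[float(x), float(y)] for x in range(4) for y in range(2)]
minority = [[10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]

# Generate just enough synthetic points to equalise the class sizes.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Every synthetic point lies on a segment between two existing minority points, so the minority class grows without exact duplication. As the abstract indicates, such oversampling is applied to the training sample; the test data are normally left untouched so that the evaluation reflects the real class distribution.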
To evaluate classification quality and select the best classifier, a confusion matrix is used, which records the number of true positive (TP), true negative (TN), false negative (FN), and false positive (FP) classification results. Values of accuracy, precision, sensitivity (recall), F1, and ROC AUC are reported. Accuracy is the percentage of correct answers given by the algorithm. Recall is the number of TPs divided by the number of TPs plus the number of FNs. Precision is the number of TPs divided by the number of TPs plus the number of FPs. F1 indicates the balance between precision and recall. ROC AUC measures performance on classification tasks across different thresholds and shows how well a model can distinguish between classes. The conclusions present the main results of the study and indicate the main future direction of the work: studying classification results on other datasets and making data processing and analysis more efficient.
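The threshold-free metrics listed above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch using the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 as the harmonic mean of the two); the function name `classification_metrics` and the example counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 for binary classification,
    computed from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a reasonably balanced test set.
balanced = classification_metrics(tp=30, tn=50, fp=10, fn=10)

# Hypothetical counts for a skewed test set where the model predicts
# the majority (negative) class for every sample.
skewed = classification_metrics(tp=0, tn=95, fp=0, fn=5)

print(balanced)
print(skewed)
```

The second call shows why accuracy alone is a weak criterion on imbalanced data: a model that never predicts the minority class still reaches 95 % accuracy, while its recall and F1 are zero.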
dc.format.extent	48-57
dc.format.pages	10
dc.identifier.citation	Doroshenko A. V., Savchuk D. Yu. Research of data mining methods for classification of imbalanced data sets // Ukrainian Journal of Information Technology. Lviv: Lviv Polytechnic Publishing House, 2024. Vol. 6, No. 1. P. 48–57.
dc.identifier.doi	doi.org/10.23939/ujit2024.01.048
dc.identifier.uri	https://ena.lpnu.ua/handle/ntb/64856
dc.language.iso	uk
dc.publisher	Lviv Polytechnic Publishing House
dc.relation.ispartof	Ukrainian Journal of Information Technology, 1 (6), 2024
dc.relation.referencesen	1. Teslyuk, V., Doroshenko, A., & Savchuk, D. (2023). Intelligent Methods and Models for Assessing Level of Student Adaptation to Online Learning, 7th International Conference on Computational Linguistics and Intelligent Systems, April 20–21, 2023, Kharkiv, Ukraine. CEUR Workshop Proceedings, 3387, 331‑343.
dc.relation.referencesen	2. Akhavan, F., & Hassannayebi, E. (2024). A hybrid machine learning with process analytics for predicting customer experience in online insurance services industry. Decision Analytics Journal, 11, art. no. 100452. https://doi.org/10.1016/j.dajour.2024.100452
dc.relation.referencesen	3. Guha, A., & Veeranjaneyulu, N. (2019). Prediction of bankruptcy using big data analytic based on fuzzy C-means algorithm. IAES International Journal of Artificial Intelligence, 8(2), 168‑174. https://doi.org/10.11591/ijai.v8.i2.pp168-174
dc.relation.referencesen	4. Liang, D., Lu, C.-C., Tsai, C.-F., & Shih, G.-A. (2016). Financial Ratios and Corporate Governance Indicators in Bankruptcy Prediction: A Comprehensive Study. European Journal of Operational Research, 252(2), 561–572. https://doi.org/10.1016/j.ejor.2016.01.012
dc.relation.referencesen	5. Chen, T.-K., Liao, H.-H., Chen, G.-D., Kang, W.-H., & Lin, Y.-C. (2023). Bankruptcy Prediction Using Machine Learning Models with the Text-based Communicative Value of Annual Reports. Expert Systems with Applications, 120714. https://doi.org/10.1016/j.eswa.2023.120714
dc.relation.referencesen	6. Ali, H., Mohd Salleh, M. N., Saedudin, R., Hussain, K., & Mushtaq, M. F. (2019). Imbalance class problems in data mining: a review. Indonesian Journal of Electrical Engineering and Computer Science, 14(3), 1552. https://doi.org/10.11591/ijeecs.v14.i3.pp1552-1563
dc.relation.referencesen	7. More, A. S., Rana, D. P., & Agarwal, I. (2018). Random Forest Classifier Approach for Imbalanced Big Data Classification for Smart City Application Domains. International Journal of Computational Intelligence & IoT, 1(2). Retrieved from: https://ssrn.com/abstract=3354727
dc.relation.referencesen	8. Santos, M. S., Abreu, P. H., Japkowicz, N. et al. (2022). On the joint-effect of class imbalance and overlap: a critical review. Artificial Intelligence Review, 55, 6207‑6275. https://doi.org/10.1007/s10462-022-10150-3
dc.relation.referencesen	9. Doroshenko, A. & Tkachenko, R. (2018). Classification of Imbalanced Classes Using the Committee of Neural Networks. 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), 400–403, https://doi.org/10.1109/STC-CSIT.2018.8526611
dc.relation.referencesen	10. Basha, S. J., Madala, S. R., Vivek, K., Kumar, E. S., & Ammannamma, T. (2022). A Review on Imbalanced Data Classification Techniques. 2022 International Conference on Advanced Computing Technologies and Applications (ICACTA), Coimbatore, India, 1–6, https://doi.org/10.1109/ICACTA54488.2022.9753392
dc.relation.referencesen	11. Zhongqiang, Sun, Wenhao, Ying, Wenjin, Zhang, & Shengrong, Gong (2024). Undersampling method based on minority class density for imbalanced data. Expert Systems with Applications, 249(Part A), 123328. https://doi.org/10.1016/j.eswa.2024.123328
dc.relation.referencesen	12. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321‑357. https://doi.org/10.1613/jair.953
dc.relation.referencesen	13. Srividya, Mohanavalli, S., Sripriya, N., & Poornima, S. (2018). Outlier Detection using Clustering Techniques. International Journal of Engineering & Technology, 7(3.12), 813. https://doi.org/10.14419/ijet.v7i3.12.16508
dc.relation.referencesen	14. Regularization path of L1-Logistic Regression. (n.d.). scikit-learn. https://scikit-learn.org/stable/auto_examples/linear_model/plot_logistic_path.html
dc.relation.referencesen	15. Pan, H., Badawi, D., Bassi, I., Ozev, S. & Cetin, A. E. (2022). Detecting Anomaly in Chemical Sensors via L1-Kernel-Based Principal Component Analysis. IEEE Sensors Letters, 6(10), art no. 7004304, 1–4. https://doi.org/10.1109/LSENS.2022.3209102
dc.relation.referencesen	16. Soomro, G. M., Krayem, S., Amur, Z. H., Chramcov, B., Jasek, R., & Noordin, I. (2023). Tumor Detection of Breast Tissue Using Random Forest with Principal Component Analysis. IEEE 8th International Conference on Engineering Technologies and Applied Sciences (ICETAS), Bahrain, Bahrain, 1–7, https://doi.org/10.1109/ICETAS59148.2023.10346582
dc.relation.referencesen	17. Maćkiewicz, A., & Ratajczak, W. (1993). Principal components analysis (PCA). Computers & Geosciences, 19(3), 303‑342. https://doi.org/10.1016/0098-3004(93)90090-r
dc.relation.referencesen	18. Doroshenko, Anastasiya (2019). Application of global optimization methods to increase the accuracy of classification in the data mining tasks. In: Luengo D., Subbotin S. (Eds.): Computer Modeling and Intelligent Systems. Proc. 2-nd Int. Conf. CMIS-2019, Vol-2353: Main Conference Zaporizhzhia, Ukraine, April 15-19, 98–109. https://doi.org/10.32782/cmis/2353-8
dc.relation.referencesen	19. Jadhav, T. et al. (2023). Predicting Urban Land Cover Using Classification: A Machine Learning Approach. IEEE 11th Region 10 Humanitarian Technology Conference (R10-HTC), Rajkot, India, 450–454, https://doi.org/10.1109/R10-HTC57504.2023.10461930
dc.relation.referencesen	20. Savchuk, D. & Doroshenko, A. (2021). Investigation of machine learning classification methods effectiveness. IEEE 16th International Conference on Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine, 33–37. https://doi.org/10.1109/CSIT52700.2021.9648582
dc.relation.referencesen	21. Ahmed, T., Paul, R. R., Alam, M. A., Hasan, M. T., & Rab, M. R. (2022). Performance Comparison of Different Machine Learning Classifiers in Categorizing Bangla News Articles. 4th International Conference on Natural Language Processing (ICNLP), Xi'an, China, 376–379, https://doi.org/10.1109/ICNLP55136.2022.00069
dc.relation.referencesen	22. Tanouz, D., Subramanian, R. Raja, Eswar, D., Parameswara Reddy, G. V., Ranjith Kumar, A., Praneeth, CH. V. N. M. (2021). Credit Card Fraud Detection Using Machine Learning. 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 967–972. https://doi.org/10.1109/ICICCS51141.2021.9432308
dc.relation.referencesen	23. Izonin, I., Tkachenko, R., Pidkostelnyi, R., Pavliuk, O., Khavalko, V., & Batyuk, A. (2021). Experimental evaluation of the effectiveness of ANN-based numerical data augmentation methods for diagnostics tasks. CEUR Workshop Proceedings, 3038, 223‑232.
dc.relation.referencesen	24. Md. Shojeb Hossain Shojol, Md Abu Ismail Siddique, & Fariha Haque (2023). Enhanced Convolutional Neural Networks for Early Detection and Classification of Ophthalmic Diseases. International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), Dhaka, Bangladesh, 2023, 209–213. https://doi.org/10.1109/ICICT4SD59951.2023.10303558
dc.relation.referencesen	25. Singh, A. K. (2022). Detection of Credit Card Fraud using Machine Learning Algorithms. 11th International Conference on System Modeling & Advancement in Research Trends (SMART), Moradabad, India, 2022, 673–677. https://doi.org/10.1109/SMART55829.2022.10047099
dc.relation.referencesen	26. Subbotin, S., Tabunshchyk, G., Arras, P., Tabunshchyk, D., & Trotsenko, E. (2021). Intelligent Data Analysis for Individual Hypertensia Patient's State Monitoring and Prediction. IEEE International Conference on Smart Information Systems and Technologies (SIST), Nur-Sultan, Kazakhstan, 2021, 1–4. https://doi.org/10.1109/SIST50301.2021.9465989
dc.rights.holder	© Lviv Polytechnic National University, 2024
dc.subject	data mining
dc.subject	classification
dc.subject	formatting
dc.subject	scaling
dc.subject	analysis
dc.subject	dataset
dc.subject	data sample
dc.subject	feature
dc.subject	value
dc.subject	comparison
dc.subject	classifiers
dc.subject.udc	004.8
dc.title	Research of data mining methods for classification of imbalanced data sets
dc.type	Article

Collections

Ukrainian Journal of Information Technology. – 2024. – Vol. 6, No. 1