Identification of birds’ voices using convolutional neural networks based on STFT and MEL spectrogram

Honsor, Oksana (Гонсьор, Оксана); Gonsor, Yuriy (Гонсьор, Юрій)

doi:10.23939/sisn2023.14.297

dc.citation.epage	311
dc.citation.issue	14
dc.citation.journalTitle	Bulletin of Lviv Polytechnic National University. Series: Information Systems and Networks
dc.citation.spage	297
dc.contributor.affiliation	Національний університет “Львівська політехніка”
dc.contributor.affiliation	Lviv Polytechnic National University
dc.contributor.author	Гонсьор, Оксана
dc.contributor.author	Гонсьор, Юрій
dc.contributor.author	Honsor, Oksana
dc.contributor.author	Gonsor, Yuriy
dc.coverage.placename	Львів
dc.coverage.placename	Lviv
dc.date.accessioned	2025-09-12T07:21:59Z
dc.date.created	2023-02-28
dc.date.issued	2023-02-28
dc.description.abstract	Threats to the climate and global changes in ecological processes remain an urgent problem worldwide, so these changes must be monitored continuously, including by non-standard approaches. One such approach is to study information on bird migration. An effective method of studying bird migration is the auditory method, which still needs improvement. Building a machine learning model that accurately identifies the presence of bird voices in an audio file, in order to study bird migration over a given area, is therefore a relevant problem. This paper examines ways of building such a model based on the analysis of spectrograms. The research involves collecting and analysing audio files in order to identify the characteristics by which a file is classified as containing bird voices or not. The use of a convolutional neural network (CNN) to classify audio files by the presence of bird voices is demonstrated, with particular attention to the model's effectiveness and accuracy, which makes it possible to compare classifiers and choose the best one for a given type of file and model. The analysis showed that Mel spectrograms are preferable to STFT spectrograms for classifying the presence of bird sounds in the environment: the model trained on Mel spectrograms reached a classification accuracy of 72 %, which is 8 % higher than that of the model trained on STFT spectrograms.
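For readers unfamiliar with the two input representations compared in the abstract, the following is a minimal NumPy sketch of how an STFT magnitude spectrogram and a log-Mel spectrogram can be computed from the same signal. It is not the authors' pipeline: the frame length (512), hop (128), Hann window, 40 mel bands, HTK-style mel formula, 22 050 Hz sampling rate and the synthetic 2–6 kHz chirp standing in for a bird call are all assumptions, and neither the CNN nor the reported accuracies are reproduced.

```python
import numpy as np

# Assumed parameters (not taken from the paper): 512-sample Hann frames,
# 128-sample hop, 40 mel bands, HTK-style mel formula.
N_FFT, HOP, N_MELS = 512, 128, 40


def stft_magnitude(signal, n_fft=N_FFT, hop=HOP):
    """Magnitude STFT with shape (n_fft // 2 + 1, n_frames): frequency x time."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mel_spectrogram_db(signal, sr):
    """Power STFT projected onto the mel filterbank, then converted to decibels."""
    power = stft_magnitude(signal) ** 2
    return 10.0 * np.log10(mel_filterbank(sr) @ power + 1e-10)


# Synthetic stand-in for a bird call: a 1 s chirp sweeping 2 kHz -> 6 kHz plus weak noise.
sr = 22050
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
call = np.sin(2 * np.pi * (2000 * t + 2000 * t ** 2)) + 0.05 * rng.standard_normal(sr)

stft_img = stft_magnitude(call)          # linear frequency axis
mel_img = mel_spectrogram_db(call, sr)   # mel-warped, log-scaled frequency axis
print(stft_img.shape, mel_img.shape)     # prints (257, 169) (40, 169)
```

Under these assumptions the mel projection compresses 257 linear-frequency bins into 40 perceptually spaced bands, so a CNN receives a smaller image with relatively finer resolution at low frequencies; this sketch does not by itself establish why the Mel-based model performed better in the study.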
dc.format.extent	297-311
dc.format.pages	15
dc.identifier.citation	Honsor O. Identification of birds’ voices using convolutional neural networks based on STFT and MEL spectrogram / Oksana Honsor, Yuriy Gonsor // Bulletin of Lviv Polytechnic National University. Series: Information Systems and Networks. — Lviv : Lviv Polytechnic Publishing House, 2023. — No. 14. — P. 297–311.
dc.identifier.citationen	Honsor O. Identification of birds’ voices using convolutional neural networks based on STFT and MEL spectrogram / Oksana Honsor, Yuriy Gonsor // Information Systems and Networks. — Lviv : Lviv Polytechnic Publishing House, 2023. — No. 14. — P. 297–311.
dc.identifier.doi	https://doi.org/10.23939/sisn2023.14.297
dc.identifier.uri	https://ena.lpnu.ua/handle/ntb/111711
dc.language.iso	uk
dc.publisher	Видавництво Львівської політехніки
dc.publisher	Lviv Polytechnic Publishing House
dc.relation.ispartof	Bulletin of Lviv Polytechnic National University. Series: Information Systems and Networks, 14, 2023
dc.relation.ispartof	Information Systems and Networks, 14, 2023
dc.relation.references	1. Ghosh A., Sufian A., Sultana F., Chakrabarti A., & De D. (2020). Fundamental Concepts of Convolutional Neural Network. Recent Trends and Advances in Artificial Intelligence and Internet of Things, 519–567. DOI: 10.1007/978-3-030-32644-9_36.
dc.relation.references	2. Krizhevsky A., Sutskever I., & Hinton G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25, 1097–1105.
dc.relation.references	3. Sultana F., Sufian A., & Dutta P. (2019). A review of object detection models based on convolutional neural network. CoRR, abs/1905.01614. DOI: 10.1007/978-981-15-4288-6_1.
dc.relation.references	4. Sultana F., Sufian A., & Dutta P. (2018). Advancements in image classification using convolutional neural network. In 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), 122–129.
dc.relation.references	5. Everingham M., Van Gool L., Williams C. K. I., Winn J., & Zisserman A. (2010). The PASCAL Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338. DOI: 10.1007/s11263-009-0275-4.
dc.relation.references	6. Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., Erhan D., Vanhoucke V., & Rabinovich A. (2015). Going deeper with convolutions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.48550/arXiv.1409.4842.
dc.relation.references	7. Shelhamer E., Long J., & Darrell T. (2015). Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 39(4), 640–651. DOI: 10.1109/CVPR.2015.7298965.
dc.relation.references	8. Dennis J. W. (2014). Sound event recognition in unstructured environments using spectrogram image processing. Doctoral thesis, Nanyang Technological University, Singapore. DOI: 10.32657/10356/59272
dc.relation.references	9. Mesaros A., Heittola T., Eronen A., & Virtanen T. (2010). Acoustic event detection in real life recordings. Proceedings of the European Signal Processing Conference (EUSIPCO), 1267–1271.
dc.relation.references	10. Tsau E., Chachada S., & Kuo C.-C. J. (2012). Content/Context-Adaptive Feature Selection for Environmental Sound Recognition. Proceedings of the Asia Pacific Signal & Information Processing Association (APSIPA).
dc.relation.references	11. Zhang Z., & Schuller B. (2012, March). Semi-supervised learning helps in sound event classification. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 333–336.
dc.relation.references	12. Maccagno A., Mastropietro A., Mazziotta U., Scarpiniti M., Lee Y.-Ch. & Uncini A. (2021). A CNN Approach for Audio Classification in Construction Sites. Progresses in Artificial Intelligence and Neural Systems, 371–381. DOI: 10.1007/978-981-15-5093-5_33.
dc.relation.references	13. Ekpezu A., Wiafe I., Katsriku F. & Yaokumah W. (2021). Using deep learning for acoustic event classification: The case of natural disasters. The Journal of the Acoustical Society of America, 149(4): 292. DOI: 10.1121/10.0004771.
dc.relation.references	14. Khamparia A., Gupta D., Nguyen N. G., Khanna A., Pandey B., & Tiwari P. (2019). Sound classification using convolutional neural network and tensor deep stacking network, IEEE Access, 7(1), 7717–7727. DOI: 10.1109/ACCESS.2018.2888882.
dc.relation.references	15. Zhang T., Lee Y.-C., Scarpiniti M., & Uncini A. (2018). A supervised machine learning-based sound identification for construction activity monitoring and performance evaluation. Proceedings of 2018 Construction Research Congress (CRC 2018), New Orleans, Louisiana, USA, 358–366.
dc.relation.references	16. Kons Z., & Toledo-Ronen O. (2013). Audio Event Classification Using Deep Neural Networks. Proc. Interspeech 2013, 1482–1486. DOI: 10.21437/Interspeech.2013-384.
dc.relation.references	17. Lee H., Grosse R., Ranganath R., & Ng A. Y. (2011). Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks. Communications of the ACM, 54(10), 95–103. DOI: 10.1145/2001269.2001295.
dc.relation.references	18. Gartzman D. (2019, August). Getting to Know the Mel Spectrogram. Towards Data Science. Retrieved from: https://towardsdatascience.com/getting-to-know-the-mel-spectrogram-31bca3e2d9d0 (date of access: 20.09.2023)
dc.relation.references	19. Nandi P. (2021, March). CNNs for audio classification: A primer in deep learning for audio classification using TensorFlow. Towards Data Science. Retrieved from: https://towardsdatascience.com/cnns-for-audio-classification-6244954665ab (date of access: 16.09.2023)
dc.relation.references	20. Chollet F. (2018). Deep Learning with Python, v. 361. New York: Manning.
dc.relation.references	21. SHANTAMVIJAYPUTRA. Bird Voice Detection Dataset. Kaggle. Retrieved from: https://www.kaggle.com/datasets/vshantam/bird-voice-detection (date of access: 15.05.2023)
dc.relation.uri	https://towardsdatascience.com/getting-to-know-the-mel-spectrogram-31bca3e2d9d0
dc.relation.uri	https://towardsdatascience.com/cnns-for-audio-classification-6244954665ab
dc.relation.uri	https://www.kaggle.com/datasets/vshantam/bird-voice-detection
dc.rights.holder	© Національний університет “Львівська політехніка”, 2023
dc.rights.holder	© Гонсьор О. Й., Гонсьор Ю. І., 2023
dc.subject	machine learning
dc.subject	sound identification
dc.subject	spectrogram
dc.subject	convolutional neural network
dc.subject.udc	004.9
dc.title	Identification of birds’ voices using convolutional neural networks based on STFT and MEL spectrogram
dc.title.alternative	Identification of birds’ voices using convolutional neural networks based on STFT and MEL spectrogram
dc.type	Article

Files

Original bundle

Name:: 2023n14_Honsor_O-Identification_of_birds_297-311.pdf
Size:: 10.47 MB
Format:: Adobe Portable Document Format

Name:: 2023n14_Honsor_O-Identification_of_birds_297-311__COVER.png
Size:: 380.91 KB
Format:: Portable Network Graphics

Collections

Bulletin of Lviv Polytechnic National University. Information Systems and Networks. – 2023. – Issue 14