Research of the models for sign gesture recognition using 3D convolutional neural networks and visual transformers

Чорненький, В. Я.; Казимира, І. Я.; Chornenkyi, V. Ya.; Kazymyra, I. Ya.

dc.citation.epage	40
dc.citation.issue	2
dc.citation.journalTitle	Український журнал інформаційних технологій
dc.citation.spage	33
dc.citation.volume	5
dc.contributor.affiliation	Національний університет “Львівська політехніка”
dc.contributor.affiliation	Lviv Polytechnic National University
dc.contributor.author	Чорненький, В. Я.
dc.contributor.author	Казимира, І. Я.
dc.contributor.author	Chornenkyi, V. Ya.
dc.contributor.author	Kazymyra, I. Ya.
dc.coverage.placename	Львів
dc.coverage.placename	Lviv
dc.date.accessioned	2024-04-01T11:06:08Z
dc.date.available	2024-04-01T11:06:08Z
dc.date.created	2023-02-28
dc.date.issued	2023-02-28
dc.description.abstract	У роботі розглядається актуальне завдання розпізнавання жестів з метою реформування способів до навчання військових, способів комунікації людини та машини та вдосконалення взаємодії людини-людини та людини-машини для осіб з обмеженими можливостями. Проаналізовано методи для розпізнавання жестів руки на основі компʼютерного зору, а також з використанням глибокого навчання. Описано принципи роботи моделей з використанням 3D конволюційних нейронних мереж та трансформерів, наведено їх структурні схеми та проаналізовано особливості функціонування складових. У межах 3D-CNN архітектури розглянуто конволюційну нейронну мережу з двома конволюційними шарами та двома шарами групування. Кожна 3D згортка отримується шляхом згортки ядра 3D-фільтра і складання декількох суміжних кадрів разом для отримання 3D-куба. У межах ViT архітектури розглянуто візуальний трансформер з Linear Projection, Transformer Encoder, двома підшарами: шар Multi-head SelfAttention (MSA) та шаром прямого поширення, також відомим як Multi-Layer Perceptron (MLP). На підставі досліджених архітектур проведено навчання моделей з використанням ASL та NUS-II наборів даних та розглянуто їх ефективність після 20 навчальних епох на основі показників відтворення, точності та F1-оцінки. Визначено вплив тривалості навчання на ефективність моделі з використанням ViT архітектури після 20 та 40 навчальних епох. Продемонстровано, в яких ситуаціях 3D конволюційні нейронні мережі та візуальні трансформери показують кращі результати точності, та обмеження, притаманні кожному підходу в умовах варіативності середовища та обчислювальних потужностей. Отримали подальший розвиток інноваційні архітектури для розпізнавання жестів руки з використанням глибокого навчання для майбутніх досліджень та реалізацій у програмних продуктах.
dc.description.abstract	The work addresses the current problem of hand gesture recognition, motivated by the goals of reforming military training methods, improving human-machine communication, and improving human-human and human-machine interaction for people with disabilities. Methods for hand gesture recognition are analyzed, covering both classical computer vision approaches and deep learning. The operating principles of models based on 3D convolutional neural networks and visual transformers are described, their structural diagrams are presented, and the functioning of their components is analyzed. Within the 3D-CNN architecture, a convolutional neural network with two convolutional layers and two pooling layers is considered. Each 3D convolution is obtained by convolving a 3D filter kernel over the 3D cube formed by stacking multiple adjacent frames. Within the ViT architecture, a visual transformer with a Linear Projection and a Transformer Encoder is considered, where the encoder has two sub-layers: the Multi-head Self-Attention (MSA) layer and the feedforward layer, also known as the Multi-Layer Perceptron (MLP). Models based on these architectures were trained on the ASL and NUS-II datasets, which contain a variety of sign language images, and their performance after 20 training epochs was assessed using recall, precision, and the F1 score. The effect of training duration on the ViT model was examined by comparing its performance after 20 and 40 training epochs. The analysis shows the situations in which 3D convolutional neural networks and visual transformers achieve better accuracy, as well as the limitations inherent in each approach under varying environmental conditions and computational resources. The research further develops deep learning architectures for hand gesture recognition, for use in future studies and for implementation in software products.
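The 3D convolution step described in the abstract (a 3D filter kernel applied to a cube of stacked adjacent frames) can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: the function name `conv3d_valid`, the frame size, and the averaging kernel are assumptions chosen for readability, and the operation is written without padding, with stride 1, and without kernel flipping, as is common in deep learning frameworks.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as [frame][row][col]


def conv3d_valid(clip: Volume, kernel: Volume) -> Volume:
    """Slide a 3D kernel over a stack of frames (no padding, stride 1)."""
    t_len, height, width = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(t_len - kt + 1):          # temporal positions
        plane = []
        for y in range(height - kh + 1):     # vertical positions
            row = []
            for x in range(width - kw + 1):  # horizontal positions
                acc = 0.0
                # Weighted sum over the kt x kh x kw window of the cube.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical input: three 3x3 grayscale frames stacked into one cube;
# frame i is filled with the constant value i + 1.
frames = [[[float(i + 1)] * 3 for _ in range(3)] for i in range(3)]
# Hypothetical 2x2x2 averaging kernel that spans two adjacent frames.
kernel = [[[0.125] * 2 for _ in range(2)] for _ in range(2)]

features = conv3d_valid(frames, kernel)
```

Because the kernel covers two frames at a time, each output value mixes information from neighbouring frames, which is how a 3D-CNN captures motion as well as shape. A pooling layer would then downsample `features` in the same sliding-window fashion.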
dc.format.extent	33-40
dc.format.pages	8
dc.identifier.citation	Чорненький В. Я. Дослідження моделей для розпізнавання жестів з використанням 3D конволюційних нейронних мереж та візуальних трансформерів / В. Я. Чорненький, І. Я. Казимира // Український журнал інформаційних технологій. — Львів : Видавництво Львівської політехніки, 2023. — Том 5. — № 2. — С. 33–40.
dc.identifier.citationen	Chornenkyi V. Ya. Research of the models for sign gesture recognition using 3D convolutional neural networks and visual transformers / V. Ya. Chornenkyi, I. Ya. Kazymyra // Ukrainian Journal of Information Technology. — Lviv : Lviv Polytechnic Publishing House, 2023. — Vol. 5. — No. 2. — P. 33–40.
dc.identifier.doi	doi.org/10.23939/ujit2023.02.033
dc.identifier.issn	2707-1898
dc.identifier.uri	https://ena.lpnu.ua/handle/ntb/61602
dc.language.iso	uk
dc.publisher	Видавництво Львівської політехніки
dc.publisher	Lviv Polytechnic Publishing House
dc.relation.ispartof	Український журнал інформаційних технологій, 2 (5), 2023
dc.relation.ispartof	Ukrainian Journal of Information Technology, 2 (5), 2023
dc.relation.references	[1] Molchanov, P., Gupta, S., Kim, K., & Kautz, J. (2015). Hand gesture recognition with 3D convolutional neural networks. https://doi.org/10.1109/CVPRW.2015.7301342
dc.relation.references	[2] Molchanov, P., Gupta, S., Kim, K., & Pulli, K. (2015). Multi-sensor system for driver's hand-gesture recognition. 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 1, 1-8. https://doi.org/10.1109/FG.2015.7163132
dc.relation.references	[3] Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 223, 1725-1732. https://doi.org/10.1109/CVPR.2014.223
dc.relation.references	[4] Ohn-Bar, E., & Trivedi, M. M. (2014). Hand Gesture Recognition in Real Time for Automotive Interfaces: A Multimodal Vision-Based Approach and Evaluations. IEEE Transactions on Intelligent Transportation Systems, 15, 2368-2377. https://doi.org/10.1109/TITS.2014.2337331
dc.relation.references	[5] Simonyan, K., & Zisserman, A. (2014). Two-stream convolutional networks for action recognition. https://doi.org/10.48550/arXiv.1406.2199
dc.relation.references	[6] Tran, D., Bourdev, L. D., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3D convolutional networks. 2015 International Conference on Computer Vision, 9, 4489-4497. https://doi.org/10.1109/ICCV.2015.510
dc.relation.references	[7] Neverova, N., Wolf, C., Taylor, G. W., & Nebout, F. (2014). Multiscale deep learning for gesture detection and localization, 474-490. https://doi.org/10.1007/978-3-319-16178-5_33
dc.relation.references	[8] Yong, T., Kian, L., Connie, T., Chin-Poo, L., & Cheng-Yaw, L. (2021). Convolutional neural network with spatial pyramid pooling for hand gesture recognition. Neural Computing and Applications, 33, 1-13. https://doi.org/10.1007/s00521-020-05337-0
dc.relation.references	[9] Yong, T., Kian, L., & Chin-Poo, L. (2021). Hand Gesture Recognition via Enhanced Densely Connected Convolutional Neural Network. Expert Systems with Applications, 175. https://doi.org/10.1016/j.eswa.2021.114797
dc.relation.references	[10] Osimani, C., Ojeda-Castelo, J. J., & Piedra-Fernandez, J. A. (2023). Point Cloud Deep Learning Solution for Hand Gesture Recognition. International Journal of Interactive Multimedia and Artificial Intelligence. https://doi.org/10.9781/ijimai.2023.01.001
dc.relation.references	[11] Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. North American Chapter of the Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423
dc.relation.references	[12] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners.
dc.relation.references	[13] Hengshuang, Z., Jiaya, J., & Vladlen, K. (2020). Exploring Self-Attention for Image Recognition. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10073-10082. https://doi.org/10.1109/CVPR42600.2020.01009
dc.relation.references	[14] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020). End-to-End Object Detection with Transformers. https://doi.org/10.1007/978-3-030-58452-8_13
dc.relation.references	[15] Ji, S., Xu, W., Yang, M., & Yu, K. (2010). 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 495-502. https://doi.org/10.1109/TPAMI.2012.59
dc.relation.references	[16] Barczak, A. L. C., Reyes, N. H., Abastillas, M., Piccio, A., & Susnjak, T. (2011). A New 2D Static Hand Gesture Colour Image Dataset for ASL Gestures.
dc.relation.references	[17] Pisharady, P. K., Vadakkepat, P., & Loh, A. P. (2013). Attention based detection and recognition of hand postures against complex backgrounds. International Journal of Computer Vision, 101, 403-419. https://doi.org/10.1007/s11263-012-0560-5
dc.relation.uri	https://doi.org/10.1109/CVPRW.2015.7301342
dc.relation.uri	https://doi.org/10.1109/FG.2015.7163132
dc.relation.uri	https://doi.org/10.1109/CVPR.2014.223
dc.relation.uri	https://doi.org/10.1109/TITS.2014.2337331
dc.relation.uri	https://doi.org/10.48550/arXiv.1406.2199
dc.relation.uri	https://doi.org/10.1109/ICCV.2015.510
dc.relation.uri	https://doi.org/10.1007/978-3-319-16178-5_33
dc.relation.uri	https://doi.org/10.1007/s00521-020-05337-0
dc.relation.uri	https://doi.org/10.1016/j.eswa.2021.114797
dc.relation.uri	https://doi.org/10.9781/ijimai.2023.01.001
dc.relation.uri	https://doi.org/10.18653/v1
dc.relation.uri	https://doi.org/10.1109/CVPR42600.2020.01009
dc.relation.uri	https://doi.org/10.1007/978-3-030-58452-8_13
dc.relation.uri	https://doi.org/10.1109/TPAMI.2012.59
dc.relation.uri	https://doi.org/10.1007/s11263-012-0560-5
dc.rights.holder	© Національний університет “Львівська політехніка”, 2023
dc.subject	глибоке навчання
dc.subject	взаємодія людини та машини
dc.subject	ефективність нейронних мереж
dc.subject	набори даних для мови жестів
dc.subject	deep learning
dc.subject	human-machine interactions
dc.subject	neural networks performance
dc.subject	sign language datasets
dc.subject.udc	004.93
dc.title	Дослідження моделей для розпізнавання жестів з використанням 3D конволюційних нейронних мереж та візуальних трансформерів
dc.title.alternative	Research of the models for sign gesture recognition using 3D convolutional neural networks and visual transformers
dc.type	Article

Collections

Ukrainian Journal of Information Technology. – 2023. – Vol. 5, No. 2