Development of the multimodal handling interface based on GOOGLE API

Басистюк, Олег; Мельникова, Наталія; Basystiuk, Oleh; Melnykova, Nataliia

dc.citation.epage	223
dc.citation.issue	1
dc.citation.journalTitle	Computer Systems of Design. Theory and Practice
dc.citation.spage	216
dc.contributor.affiliation	Lviv Polytechnic National University
dc.contributor.author	Басистюк, Олег
dc.contributor.author	Мельникова, Наталія
dc.contributor.author	Basystiuk, Oleh
dc.contributor.author	Melnykova, Nataliia
dc.coverage.placename	Львів
dc.coverage.placename	Lviv
dc.date.accessioned	2025-03-11T09:52:36Z
dc.date.created	2024-02-27
dc.date.issued	2024-02-27
dc.description.abstract	Artificial intelligence has become part of everyday life. One of the most popular and rapidly advancing AI technologies is speech recognition, an integral part of the broader concept of multimodal data handling. Multimodal data encompasses voice, audio, and text, and handling it requires a multifaceted approach to understanding and processing information. This paper presents the development of a multimodal handling interface built on Google API technologies. The interface is designed to integrate and manage diverse data modalities, including text, audio, and video, within a single platform. By using Google API capabilities such as natural language processing, speech recognition, and video analysis, the interface provides extended functionality for processing, analysing, and interpreting multimodal data. The paper discusses the design and implementation of the interface and describes its main features and functionality. It also explores potential applications and future directions for the interface in domains including healthcare, education, and multimedia content creation. Overall, the multimodal handling interface based on Google API represents a significant step towards advancing multimodal data processing and improving the user experience of interacting with diverse data sources.
dc.format.extent	216-223
dc.format.pages	8
dc.identifier.citation	Basystiuk O. Development of the multimodal handling interface based on GOOGLE API / Oleh Basystiuk, Nataliia Melnykova // Computer Systems of Design. Theory and Practice. — Lviv : Lviv Polytechnic Publishing House, 2024. — Vol. 6. — No. 1. — P. 216–223.
dc.identifier.doi	doi.org/10.23939/cds2024.01.216
dc.identifier.uri	https://ena.lpnu.ua/handle/ntb/64114
dc.language.iso	en
dc.publisher	Lviv Polytechnic Publishing House
dc.relation.ispartof	Computer Systems of Design. Theory and Practice, 1 (6), 2024
dc.relation.references	[1] A. Karpathy and L. Fei-Fei, "Deep visual-semantic alignments for generating image descriptions," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3128–3137. https://doi.org/10.1109/CVPR.2015.7298932
dc.relation.references	[2] Daxin Tan, Liqun Deng, Yu Ting Yeung, Xin Jiang, Xiao Chen, and Tan Lee, "EditSpeech: A text based speech editing system using partial inference and bidirectional fusion," arXiv preprint arXiv:2107.01554, 2021. https://doi.org/10.1109/ASRU51503.2021.9688051
dc.relation.references	[3] M. Oncescu, A. S. Koepke, J. F. Henriques, Z. Akata, and S. Albanie, "Audio Retrieval with Natural Language Queries," in Proceedings of the Conference of the International Speech Communication Association (Interspeech), 2021, pp. 2411–2415. https://doi.org/10.21437/Interspeech.2021-2227
dc.relation.references	[4] Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, vol. 1, MIT Press, Cambridge, 2016.
dc.relation.references	[5] Ivan Izonin et al., "The Combined Use of the Wiener Polynomial and SVM for Material Classification Task in Medical Implants Production," International Journal of Intelligent Systems and Applications (IJISA), Vol. 10, No. 9, pp. 40–47, 2018. https://doi.org/10.5815/ijisa.2018.09.05
dc.relation.references	[6] Havryliuk, M., Dumyn, I., Vovk, O. (2023). Extraction of Structural Elements of the Text Using Pragmatic Features for the Nomenclature of Cases Verification. In: Hu, Z., Wang, Y., He, M. (eds) Advances in Intelligent Systems, Computer Science and Digital Economics IV. CSDEIS 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 158. Springer, Cham. https://doi.org/10.1007/978-3-031-24475-9_57
dc.relation.references	[7] Vitaly Yakovyna, Natalya Shakhovska, "Software failure time series prediction with RBF, GRNN, and LSTM neural networks", Procedia Computer Science 207(4):837-847, https://doi.org/10.1016/j.procs.2022.09.139
dc.relation.references	[8] Nataliya Shakhovska et al., "The Developing of the System for Automatic Audio to Text Conversion," IT&AS'2021: Symposium on Information Technologies and Applied Sciences, March 5–6, 2021, Bratislava, Slovak Republic.
dc.relation.references	[9] Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg, Joel Shor, Ying Xiao, Ye Jia, Fei Ren, and Rif A. Saurous, "Style tokens: Unsupervised style modeling, control and transfer in end-to-end speech synthesis," in International Conference on Machine Learning. PMLR, 2018, pp. 5180–5189.
dc.relation.references	[10] Nataliya Boyko et al., "Usage of Machine-based Translation Methods for Analyzing Open Data in Legal Cases," in Proc. of the CybHyg-2019, Kyiv, Ukraine, November 30, 2019, pp. 328–338. CEUR-WS.org.
dc.relation.references	[11] Berezsky O., Verbovyy S., Pitsun O., "Hybrid intelligent information technology for biomedical image processing," in Proceedings of the IEEE International Conference "Computer Science and Information Technologies" CSIT'2018, Lviv, Ukraine, 11–14 September 2018, pp. 420–423. https://doi.org/10.1109/STC-CSIT.2018.8526711
dc.relation.references	[12] Zoryana Rybchak et al., "Analysis of methods and means of text mining," ECONTECHMOD, 6(2), 2017, pp. 73–78.
dc.relation.references	[13] P. Zdebskyi, V. Lytvyn, Y. Burov, et al., "Intelligent system for semantically similar sentences identification and generation based on machine learning methods," CEUR Workshop Proceedings, 2020, pp. 317–346.
dc.relation.references	[14] Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, and Ming Liu, "Neural speech synthesis with transformer network," in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, vol. 33, pp. 6706–6713. https://doi.org/10.1609/aaai.v33i01.33016706
dc.relation.references	[15] Oleh Basystiuk, Nataliia Melnykova, "Multimodal Approaches for Natural Language Processing in Medical Data," Proceedings of the 5th International Conference on Informatics & Data-Driven Medicine, Lyon, France, November 18–20, CEUR-WS.org, 2022, pp. 246–252.
dc.relation.references	[16] N. Shakhovska, N. Boyko, P. Pukach, "The Information Model of Cloud Data Warehouses," International Conference on Computer Science and Information Technologies, CSIT 2018, September 11–14, Lviv, Ukraine, 2019, pp. 182–191. https://doi.org/10.1007/978-3-030-01069-0_13
dc.relation.references	[17] Lifa Sun, Kun Li, Hao Wang, Shiyin Kang, and Helen Meng, "Phonetic posteriorgrams for many-to-one voice conversion without parallel data training," in 2016 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2016, pp. 1–6. https://doi.org/10.1109/ICME.2016.7552917
dc.relation.references	[18] S. Chowdhury and J. Sil, "Face recognition from non-frontal images using deep neural network," in 2017 Ninth International Conference on Advances in Pattern Recognition (ICAPR), 2017, pp. 1–6. https://doi.org/10.1109/ICAPR.2017.8593160
dc.relation.references	[19] Z. Rybchak, O. Basystiuk, Analysis of computer vision and image analysis technics, ECONTECHMOD: an international quarterly journal on economics of technology and modelling processes, Lublin, Poland, 2017, pp. 79-84.
dc.relation.references	[20] I. Zheliznyak, Z. Rybchak, I. Zavuschak, Analysis of clustering algorithms, 2017. Advances in Intelligent Systems and Computing, 2017, pp. 305–314. https://doi.org/10.1007/978-3-319-45991-2_21
dc.relation.uri	https://doi.org/10.1109/CVPR.2015.7298932
dc.relation.uri	https://doi.org/10.1109/ASRU51503.2021.9688051
dc.relation.uri	https://doi.org/10.21437/Interspeech.2021-2227
dc.relation.uri	https://doi.org/10.5815/ijisa.2018.09.05
dc.relation.uri	https://doi.org/10.1007/978-3-031-24475-9_57
dc.relation.uri	https://doi.org/10.1016/j.procs.2022.09.139
dc.relation.uri	https://doi.org/10.1109/STC-CSIT.2018.8526711
dc.relation.uri	https://doi.org/10.1609/aaai.v33i01.33016706
dc.relation.uri	https://doi.org/10.1007/978-3-030-01069-0_13
dc.relation.uri	https://doi.org/10.1109/ICME.2016.7552917
dc.relation.uri	https://doi.org/10.1109/ICAPR.2017.8593160
dc.relation.uri	https://doi.org/10.1007/978-3-319-45991-2_21
dc.rights.holder	© Lviv Polytechnic National University, 2024
dc.rights.holder	© Basystiuk O., Melnykova N., 2024
dc.subject	Speech-to-Text
dc.subject	speech recognition
dc.subject	sequence-to-sequence
dc.subject	machine learning
dc.subject	artificial intelligence
dc.title	Development of the multimodal handling interface based on GOOGLE API
dc.title.alternative	Development of a multimodal data processing interface based on GOOGLE API
dc.type	Article

Files

Original bundle

Name:: 2024v6n1_Basystiuk_O-Development_of_the_multimodal_216-223.pdf
Size:: 303.73 KB
Format:: Adobe Portable Document Format

Name:: 2024v6n1_Basystiuk_O-Development_of_the_multimodal_216-223__COVER.png
Size:: 452.28 KB
Format:: Portable Network Graphics

License bundle

Name:: license.txt
Size:: 1.82 KB
Format:: Plain Text

Collections

Computer Systems of Design. Theory and Practice. – 2024. – Vol. 6, No. 1