Development of the multimodal handling interface based on GOOGLE API

dc.citation.epage: 223
dc.citation.issue: 1
dc.citation.journalTitle: Комп’ютерні системи проектування. Теорія і практика
dc.citation.spage: 216
dc.contributor.affiliation: Національний університет “Львівська політехніка”
dc.contributor.affiliation: Lviv Polytechnic National University
dc.contributor.author: Басистюк, Олег
dc.contributor.author: Мельникова, Наталія
dc.contributor.author: Basystiuk, Oleh
dc.contributor.author: Melnykova, Nataliia
dc.coverage.placename: Львів
dc.coverage.placename: Lviv
dc.date.accessioned: 2025-03-11T09:52:36Z
dc.date.created: 2024-02-27
dc.date.issued: 2024-02-27
dc.description.abstract: Сьогодні штучний інтелект - це повсякденна рутина, яка глибоко увійшла в наше життя. Однією з найпопулярніших технологій, що швидко розвивається, є розпізнавання мовлення, яке є невід'ємною частиною ширшої концепції обробки мультимодальних даних. Мультимодальні дані охоплюють голос, аудіо та текстові дані, що є багатогранним підходом до розуміння та обробки інформації. У цій статті представлено розробку інтерфейсу для роботи з мультимодальними даними з використанням технологій Google API. Інтерфейс має на меті полегшити безперешкодну інтеграцію та управління різними форматами даних, включаючи текст, аудіо та відео, в рамках єдиної платформи. Завдяки використанню функцій Google API, таких як обробка природної мови, розпізнавання мови та аналіз відео, інтерфейс пропонує розширені можливості для обробки, аналізу та інтерпретації мультимодальних даних. У статті обговорюється дизайн і реалізація інтерфейсу, висвітлюються його особливості та функціональні можливості. Крім того, досліджуються потенційні застосування та майбутні напрямки використання інтерфейсу в різних сферах, включаючи охорону здоров'я, освіту та створення мультимедійного контенту. Загалом, розробка інтерфейсу для обробки мультимодальних даних на основі Google API є значним кроком на шляху до вдосконалення обробки мультимодальних даних та покращення користувацького досвіду взаємодії з різними джерелами даних.
dc.description.abstract: Artificial intelligence is now part of daily routine and deeply entrenched in our lives. One of the most popular and rapidly advancing technologies is speech recognition, which forms an integral part of the broader concept of multimodal data handling. Multimodal data encompasses voice, audio, and text data, and handling it calls for a multifaceted approach to understanding and processing information. This paper presents the development of a multimodal handling interface built on Google API technologies. The interface aims to integrate and manage diverse data modalities, including text, audio, and video, within a unified platform. By using Google API functionality such as natural language processing, speech recognition, and video analysis, the interface offers enhanced capabilities for processing, analysing, and interpreting multimodal data. The paper discusses the design and implementation of the interface, highlighting its features. It also explores potential applications and future directions for the interface in domains including healthcare, education, and multimedia content creation. Overall, the interface is a step towards advancing multimodal data processing and improving the user experience of interacting with diverse data sources.
dc.format.extent: 216-223
dc.format.pages: 8
dc.identifier.citation: Basystiuk O. Development of the multimodal handling interface based on GOOGLE API / Oleh Basystiuk, Nataliia Melnykova // Computer Systems of Design. Theory and Practice. — Lviv : Lviv Politechnic Publishing House, 2024. — Vol 6. — No 1. — P. 216–223.
dc.identifier.doi: doi.org/10.23939/cds2024.01.216
dc.identifier.uri: https://ena.lpnu.ua/handle/ntb/64114
dc.language.iso: en
dc.publisher: Видавництво Львівської політехніки
dc.publisher: Lviv Politechnic Publishing House
dc.relation.ispartof: Комп’ютерні системи проектування. Теорія і практика, 1 (6), 2024
dc.relation.ispartof: Computer Systems of Design. Theory and Practice, 1 (6), 2024
dc.rights.holder: © Національний університет “Львівська політехніка”, 2024
dc.rights.holder: © Basystiuk O., Melnykova N., 2024
dc.subject: перетворення мови в текст
dc.subject: розпізнавання мови
dc.subject: sequence-to-sequence
dc.subject: машинне навчання
dc.subject: штучний інтелект
dc.subject: Speech-to-Text
dc.subject: speech recognition
dc.subject: sequence-to-sequence
dc.subject: machine learning
dc.subject: artificial intelligence
dc.title: Development of the multimodal handling interface based on GOOGLE API
dc.title.alternative: Розробка інтерфейсу обробки мультимодальних даних на основі GOOGLE API
dc.type: Article

Files

Original bundle

Name:
2024v6n1_Basystiuk_O-Development_of_the_multimodal_216-223.pdf
Size:
303.73 KB
Format:
Adobe Portable Document Format
Name:
2024v6n1_Basystiuk_O-Development_of_the_multimodal_216-223__COVER.png
Size:
452.28 KB
Format:
Portable Network Graphics

License bundle

Name:
license.txt
Size:
1.82 KB
Format:
Plain Text