Audio Reading Assistant for Visually Impaired People
dc.citation.epage | 88 | |
dc.citation.issue | 2 | |
dc.citation.spage | 81 | |
dc.contributor.affiliation | Lviv Polytechnic National University | |
dc.contributor.author | Chypak, Yurii | |
dc.contributor.author | Morozov, Yuriy | |
dc.coverage.placename | Львів | |
dc.coverage.placename | Lviv | |
dc.date.accessioned | 2024-02-19T09:44:31Z | |
dc.date.available | 2024-02-19T09:44:31Z | |
dc.date.created | 2023-02-28 | |
dc.date.issued | 2023-02-28 | |
dc.description.abstract | This paper describes an Android mobile phone application designed for blind or visually impaired people. The main aim of this system is to create an automatic text-reading assistant that combines the hardware capabilities of a mobile phone with innovative algorithms. The Android platform was chosen so that people who already have a mobile phone do not need to buy new hardware. Four key technologies are required: camera capture, text detection, speech synthesis, and voice detection. Moreover, a voice recognition subsystem has been created that meets the needs of blind users, allowing them to effectively control the application by voice. It requires three key technologies: voice capture via the built-in microphone, speech-to-text conversion, and user request interpretation. As a result, an application for the Android platform was developed based on these technologies. | |
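The voice-control subsystem described in the abstract (speech-to-text followed by user request interpretation) can be sketched as simple keyword matching over the recognized utterance. The command names and keyword lists below are hypothetical illustrations, not the paper's actual vocabulary:

```python
# Hypothetical sketch of the "user request interpretation" step: an
# utterance already converted from speech to text is mapped to an
# application command by keyword matching. Commands and keywords are
# illustrative assumptions, not taken from the paper.
from typing import Optional

COMMANDS = {
    "read": ["read", "start reading", "read aloud"],
    "stop": ["stop", "pause", "quiet"],
    "repeat": ["repeat", "again", "say again"],
}

def interpret(utterance: str) -> Optional[str]:
    """Return the first command whose keyword appears in the utterance."""
    text = utterance.lower()
    for command, keywords in COMMANDS.items():
        if any(kw in text for kw in keywords):
            return command
    return None  # no known command recognized

print(interpret("please read this page aloud"))  # → read
print(interpret("be quiet for a moment"))        # → stop
```

A production assistant would likely add fuzzy matching to tolerate speech-recognition errors, but plain substring matching illustrates the interpretation stage.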
dc.format.extent | 81-88 | |
dc.format.pages | 8 | |
dc.identifier.citation | Chypak Y. Audio Reading Assistant for Visually Impaired People / Yurii Chypak, Yuriy Morozov // Advances in Cyber-Physical Systems. — Lviv : Lviv Politechnic Publishing House, 2023. — Vol 8. — No 2. — P. 81–88. | |
dc.identifier.citationen | Chypak Y. Audio Reading Assistant for Visually Impaired People / Yurii Chypak, Yuriy Morozov // Advances in Cyber-Physical Systems. — Lviv : Lviv Politechnic Publishing House, 2023. — Vol 8. — No 2. — P. 81–88. | |
dc.identifier.doi | https://doi.org/10.23939/acps2023.02.081 | |
dc.identifier.uri | https://ena.lpnu.ua/handle/ntb/61333 | |
dc.language.iso | en | |
dc.publisher | Видавництво Львівської політехніки | |
dc.publisher | Lviv Politechnic Publishing House | |
dc.relation.ispartof | Advances in Cyber-Physical Systems, 2 (8), 2023 | |
dc.relation.references | Ramoa G., Moured O., Schwarz T., Muller K., Stiefelhagen R., (2023). Enabling People with Blindness to Distinguish Lines of Mathematical Charts with Audio-Tactile Graphic Readers. PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments. Pp. 384–391. DOI: https://doi.org/10.1145/3594806.3594818 | |
dc.relation.references | Yang P., Zhang J., Xu J., Li Y., (2022). An OCR System: Towards Mobile Device. ICDLT '22: Proceedings of the 2022 6th International Conference on Deep Learning Technologies. Pp. 1–7. DOI: https://doi.org/10.1145/3556677.3556685 | |
dc.relation.references | Hildebrandt P., Schulze M., Cohen S., (2022). Optical character recognition guided image super-resolution. DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering. Article No. 14. Pp. 1–4. DOI: https://doi.org/10.1145/3558100.3563841 | |
dc.relation.references | Thi-Tuyet-Hai N., Jatowt A., Coustaty A., Nhu-Van N., Doucet A., (2019). Deep statistical analysis of OCR errors for effective post-OCR processing. JCDL '19: Proceedings of the 18th Joint Conference on Digital Libraries. Pp. 29–38. DOI: https://doi.org/10.1109/JCDL.2019.00015 | |
dc.relation.references | Liu R., Sisman B., Gao G., Li H., (2022). Decoding Knowledge Transfer for Neural Text-to-Speech Training. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 30. Pp. 1–5. DOI: https://doi.org/10.1109/TASLP.2022.3171974 | |
dc.relation.references | Alexanderson S., Székely É., Henter G. E., Kucherenko T., Beskow J., (2020). Generating coherent spontaneous speech and gesture from text. IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents. Pp. 1–3. DOI: https://doi.org/10.1145/3383652.3423874 | |
dc.relation.references | Zhou Y., Tian X., Li H., (2021). Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 29. Pp. 3427–3439. DOI: https://doi.org/10.1109/TASLP.2021.3125142 | |
dc.relation.references | Langlois Q., Jodogne S., (2023). Practical Study of Deep Learning Models for Speech Synthesis. PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments. Pp. 700–706. DOI: https://doi.org/10.1145/3594806.3596536 | |
dc.relation.references | Yakubovskyi R., Morozov Y., (2023). Speech Models Training Technologies Comparison Using Word Error Rate. Advances in Cyber-Physical Systems. vol. 8, num. 1. Pp. 74–80. DOI: https://doi.org/10.23939/acps2023.01.074 | |
dc.relation.references | Liao J., Eskimez S., Lu L., Shi Y., Gong M., Shou L., Qu H., (2023). Improving Readability for Automatic Speech Recognition Transcription. ACM Transactions on Asian and Low-Resource Language Information Processing. vol. 22, num. 5. Pp. 1–23. DOI: https://doi.org/10.1145/3557894 | |
dc.relation.referencesen | Ramoa G., Moured O., Schwarz T., Muller K., Stiefelhagen R., (2023). Enabling People with Blindness to Distinguish Lines of Mathematical Charts with Audio-Tactile Graphic Readers. PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments. Pp. 384–391. DOI: https://doi.org/10.1145/3594806.3594818 | |
dc.relation.referencesen | Yang P., Zhang J., Xu J., Li Y., (2022). An OCR System: Towards Mobile Device. ICDLT '22: Proceedings of the 2022 6th International Conference on Deep Learning Technologies. Pp. 1–7. DOI: https://doi.org/10.1145/3556677.3556685 | |
dc.relation.referencesen | Hildebrandt P., Schulze M., Cohen S., (2022). Optical character recognition guided image super-resolution. DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering. Article No. 14. Pp. 1–4. DOI: https://doi.org/10.1145/3558100.3563841 | |
dc.relation.referencesen | Thi-Tuyet-Hai N., Jatowt A., Coustaty A., Nhu-Van N., Doucet A., (2019). Deep statistical analysis of OCR errors for effective post-OCR processing. JCDL '19: Proceedings of the 18th Joint Conference on Digital Libraries. Pp. 29–38. DOI: https://doi.org/10.1109/JCDL.2019.00015 | |
dc.relation.referencesen | Liu R., Sisman B., Gao G., Li H., (2022). Decoding Knowledge Transfer for Neural Text-to-Speech Training. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 30. Pp. 1–5. DOI: https://doi.org/10.1109/TASLP.2022.3171974 | |
dc.relation.referencesen | Alexanderson S., Székely É., Henter G. E., Kucherenko T., Beskow J., (2020). Generating coherent spontaneous speech and gesture from text. IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents. Pp. 1–3. DOI: https://doi.org/10.1145/3383652.3423874 | |
dc.relation.referencesen | Zhou Y., Tian X., Li H., (2021). Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 29. Pp. 3427–3439. DOI: https://doi.org/10.1109/TASLP.2021.3125142 | |
dc.relation.referencesen | Langlois Q., Jodogne S., (2023). Practical Study of Deep Learning Models for Speech Synthesis. PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments. Pp. 700–706. DOI: https://doi.org/10.1145/3594806.3596536 | |
dc.relation.referencesen | Yakubovskyi R., Morozov Y., (2023). Speech Models Training Technologies Comparison Using Word Error Rate. Advances in Cyber-Physical Systems. vol. 8, num. 1. Pp. 74–80. DOI: https://doi.org/10.23939/acps2023.01.074 | |
dc.relation.referencesen | Liao J., Eskimez S., Lu L., Shi Y., Gong M., Shou L., Qu H., (2023). Improving Readability for Automatic Speech Recognition Transcription. ACM Transactions on Asian and Low-Resource Language Information Processing. vol. 22, num. 5. Pp. 1–23. DOI: https://doi.org/10.1145/3557894 | |
dc.relation.uri | https://doi.org/10.1145/3594806.3594818 | |
dc.relation.uri | https://doi.org/10.1145/3556677.3556685 | |
dc.relation.uri | https://doi.org/10.1145/3558100.3563841 | |
dc.relation.uri | https://doi.org/10.1109/JCDL.2019.00015 | |
dc.relation.uri | https://doi.org/10.1109/TASLP.2022.3171974 | |
dc.relation.uri | https://doi.org/10.1145/3383652.3423874 | |
dc.relation.uri | https://doi.org/10.1109/TASLP.2021.3125142 | |
dc.relation.uri | https://doi.org/10.1145/3594806.3596536 | |
dc.relation.uri | https://doi.org/10.23939/acps2023.01.074 | |
dc.relation.uri | https://doi.org/10.1145/3557894 | |
dc.rights.holder | © Lviv Polytechnic National University, 2023 | |
dc.rights.holder | © Chypak Y., Morozov Y., 2023 | |
dc.subject | OCR | |
dc.subject | TTS | |
dc.subject | speech synthesis | |
dc.subject | voice detection | |
dc.title | Audio Reading Assistant for Visually Impaired People | |
dc.type | Article | |