Audio Reading Assistant for Visually Impaired People

dc.citation.epage88
dc.citation.issue2
dc.citation.spage81
dc.contributor.affiliationLviv Polytechnic National University
dc.contributor.authorChypak, Yurii
dc.contributor.authorMorozov, Yuriy
dc.coverage.placenameLviv
dc.date.accessioned2024-02-19T09:44:31Z
dc.date.available2024-02-19T09:44:31Z
dc.date.created2023-02-28
dc.date.issued2023-02-28
dc.description.abstractThis paper describes an Android mobile phone application designed for blind or visually impaired people. The main aim of this system is to create an automatic text-reading assistant using the hardware capabilities of a mobile phone combined with innovative algorithms. The Android platform was chosen so that people who already own a mobile phone do not need to buy new hardware. Four key technologies are required: camera capture, text detection, speech synthesis, and voice detection. Moreover, a voice recognition subsystem has been created that meets the needs of blind users, allowing them to control the application effectively by voice. It requires three key technologies: voice capture via the embedded microphone, speech-to-text, and user request interpretation. As a result, an application for the Android platform was developed based on these technologies.
dc.format.extent81-88
dc.format.pages8
dc.identifier.citationChypak Y. Audio Reading Assistant for Visually Impaired People / Yurii Chypak, Yuriy Morozov // Advances in Cyber-Physical Systems. — Lviv : Lviv Politechnic Publishing House, 2023. — Vol 8. — No 2. — P. 81–88.
dc.identifier.doidoi.org/10.23939/acps2023.02.081
dc.identifier.urihttps://ena.lpnu.ua/handle/ntb/61333
dc.language.isoen
dc.publisherLviv Politechnic Publishing House
dc.relation.ispartofAdvances in Cyber-Physical Systems, 2 (8), 2023
dc.relation.referencesRamoa G., Moured O., Schwarz T., Muller K., Stiefelhagen R., (2023). Enabling People with Blindness to Distinguish Lines of Mathematical Charts with Audio-Tactile Graphic Readers. PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments. Pp. 384–391. DOI: https://doi.org/10.1145/3594806.3594818
dc.relation.referencesYang P., Zhang J., Xu J., Li Y., (2022). An OCR System: Towards Mobile Device. ICDLT '22: Proceedings of the 2022 6th International Conference on Deep Learning Technologies. Pp. 1–7. DOI: https://doi.org/10.1145/3556677.3556685
dc.relation.referencesHildebrandt P., Schulze M., Cohen S., (2022). Optical character recognition guided image super-resolution. DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering. Article No. 14. Pp. 1–4. DOI: https://doi.org/10.1145/3558100.3563841
dc.relation.referencesThi-Tuyet-Hai N., Jatowt A., Coustaty A., Nhu-Van N., Doucet A., (2019). Deep statistical analysis of OCR errors for effective post-OCR processing. JCDL '19: Proceedings of the 18th Joint Conference on Digital Libraries. Pp. 29–38. DOI: https://doi.org/10.1109/JCDL.2019.00015
dc.relation.referencesLiu R., Sisman B., Gao G., Li H., (2022). Decoding Knowledge Transfer for Neural Text-to-Speech Training. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 30. Pp. 1–5. DOI: https://doi.org/10.1109/TASLP.2022.3171974
dc.relation.referencesAlexanderson S., Székely É., Henter G. E., Kucherenko T., Beskow J., (2020). Generating coherent spontaneous speech and gesture from text. IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents. Pp. 1–3. DOI: https://doi.org/10.1145/3383652.3423874
dc.relation.referencesZhou Y., Tian X., Li H., (2021). Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 29. Pp. 3427–3439. DOI: https://doi.org/10.1109/TASLP.2021.3125142
dc.relation.referencesLanglois Q., Jodogne S., (2023). Practical Study of Deep Learning Models for Speech Synthesis. PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments. Pp. 700–706. DOI: https://doi.org/10.1145/3594806.3596536
dc.relation.referencesYakubovskyi R., Morozov Y., (2023). Speech Models Training Technologies Comparison Using Word Error Rate. Advances in Cyber-Physical Systems. vol. 8, num. 1. Pp. 74–80. DOI: https://doi.org/10.23939/acps2023.01.074
dc.relation.referencesLiao J., Eskimez S., Lu L., Shi Y., Gong M., Shou L., Qu H., (2023). Improving Readability for Automatic Speech Recognition Transcription. ACM Transactions on Asian and Low-Resource Language Information Processing. vol. 22, num. 5. Pp. 1–23. DOI: https://doi.org/10.1145/3557894
dc.relation.urihttps://doi.org/10.1145/3594806.3594818
dc.relation.urihttps://doi.org/10.1145/3556677.3556685
dc.relation.urihttps://doi.org/10.1145/3558100.3563841
dc.relation.urihttps://doi.org/10.1109/JCDL.2019.00015
dc.relation.urihttps://doi.org/10.1109/TASLP.2022.3171974
dc.relation.urihttps://doi.org/10.1145/3383652.3423874
dc.relation.urihttps://doi.org/10.1109/TASLP.2021.3125142
dc.relation.urihttps://doi.org/10.1145/3594806.3596536
dc.relation.urihttps://doi.org/10.23939/acps2023.01.074
dc.relation.urihttps://doi.org/10.1145/3557894
dc.rights.holder© Lviv Polytechnic National University, 2023
dc.rights.holder© Chypak Y., Morozov Y., 2023
dc.subjectOCR
dc.subjectTTS
dc.subjectspeech synthesis
dc.subjectvoice detection
dc.titleAudio Reading Assistant for Visually Impaired People
dc.typeArticle

Files

Original bundle

Name: 2023v8n2_Chypak_Y-Audio_Reading_Assistant_for_81-88.pdf
Size: 608 KB
Format: Adobe Portable Document Format

Name: 2023v8n2_Chypak_Y-Audio_Reading_Assistant_for_81-88__COVER.png
Size: 490.41 KB
Format: Portable Network Graphics
