Audio Reading Assistant for Visually Impaired People
dc.citation.epage | 88 | |
dc.citation.issue | 2 | |
dc.citation.spage | 81 | |
dc.contributor.affiliation | Lviv Polytechnic National University | |
dc.contributor.author | Chypak, Yurii | |
dc.contributor.author | Morozov, Yuriy | |
dc.coverage.placename | Львів | |
dc.coverage.placename | Lviv | |
dc.date.accessioned | 2024-02-19T09:44:31Z | |
dc.date.available | 2024-02-19T09:44:31Z | |
dc.date.created | 2023-02-28 | |
dc.date.issued | 2023-02-28 | |
dc.description.abstract | This paper describes an Android mobile phone application designed for blind or visually impaired people. The main aim of this system is to create an automatic text-reading assistant that combines the hardware capabilities of a mobile phone with innovative algorithms. The Android platform was chosen so that people who already have a mobile phone do not need to buy new hardware. Four key technologies are required: camera capture, text detection, speech synthesis, and voice detection. Moreover, a voice recognition subsystem has been created that meets the needs of blind users, allowing them to effectively control the application by voice. It requires three key technologies: voice capture via the built-in microphone, speech-to-text conversion, and user request interpretation. As a result, an application for the Android platform was developed based on these technologies. | |
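The voice-control subsystem described in the abstract (speech-to-text followed by user request interpretation) can be sketched as simple keyword matching over the recognized utterance. The command names and keyword lists below are hypothetical illustrations, not the paper's actual vocabulary:

```python
# Hypothetical sketch of the "user request interpretation" step: an
# utterance already converted from speech to text is mapped to an
# application command by keyword matching. Commands and keywords are
# illustrative assumptions, not taken from the paper.
from typing import Optional

COMMANDS = {
    "read": ["read", "start reading", "read aloud"],
    "stop": ["stop", "pause", "quiet"],
    "repeat": ["repeat", "again", "say again"],
}

def interpret(utterance: str) -> Optional[str]:
    """Return the first command whose keyword appears in the utterance."""
    text = utterance.lower()
    for command, keywords in COMMANDS.items():
        if any(kw in text for kw in keywords):
            return command
    return None  # no known command recognized

print(interpret("please read this page aloud"))  # → read
print(interpret("be quiet for a moment"))        # → stop
```

A production assistant would likely add fuzzy matching to tolerate speech-recognition errors, but plain substring matching illustrates the interpretation stage.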
dc.format.extent | 81-88 | |
dc.format.pages | 8 | |
dc.identifier.citation | Chypak Y. Audio Reading Assistant for Visually Impaired People / Yurii Chypak, Yuriy Morozov // Advances in Cyber-Physical Systems. — Lviv : Lviv Politechnic Publishing House, 2023. — Vol 8. — No 2. — P. 81–88. | |
dc.identifier.citationen | Chypak Y. Audio Reading Assistant for Visually Impaired People / Yurii Chypak, Yuriy Morozov // Advances in Cyber-Physical Systems. — Lviv : Lviv Politechnic Publishing House, 2023. — Vol 8. — No 2. — P. 81–88. | |
dc.identifier.doi | https://doi.org/10.23939/acps2023.02.081 | |
dc.identifier.uri | https://ena.lpnu.ua/handle/ntb/61333 | |
dc.language.iso | en | |
dc.publisher | Видавництво Львівської політехніки | |
dc.publisher | Lviv Politechnic Publishing House | |
dc.relation.ispartof | Advances in Cyber-Physical Systems, 2 (8), 2023 | |
dc.relation.references | Ramoa G., Moured O., Schwarz T., Muller K., Stiefelhagen R., (2023). Enabling People with Blindness to Distinguish Lines of Mathematical Charts with Audio-Tactile Graphic Readers. PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments. Pp. 384–391. DOI: https://doi.org/10.1145/3594806.3594818 | |
dc.relation.references | Yang P., Zhang J., Xu J., Li Y., (2022). An OCR System: Towards Mobile Device. ICDLT '22: Proceedings of the 2022 6th International Conference on Deep Learning Technologies. Pp. 1–7. DOI: https://doi.org/10.1145/3556677.3556685 | |
dc.relation.references | Hildebrandt P., Schulze M., Cohen S., (2022). Optical character recognition guided image super-resolution. DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering. Article No. 14. Pp. 1–4. DOI: https://doi.org/10.1145/3558100.3563841 | |
dc.relation.references | Thi-Tuyet-Hai N., Jatowt A., Coustaty A., Nhu-Van N., Doucet A., (2019). Deep statistical analysis of OCR errors for effective post-OCR processing. JCDL '19: Proceedings of the 18th Joint Conference on Digital Libraries. Pp. 29–38. DOI: https://doi.org/10.1109/JCDL.2019.00015 | |
dc.relation.references | Liu R., Sisman B., Gao G., Li H., (2022). Decoding Knowledge Transfer for Neural Text-to-Speech Training. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 30. Pp. 1–5. DOI: https://doi.org/10.1109/TASLP.2022.3171974 | |
dc.relation.references | Alexanderson S., Székely É., Henter G. E., Kucherenko T., Beskow J., (2020). Generating coherent spontaneous speech and gesture from text. IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents. Pp. 1–3. DOI: https://doi.org/10.1145/3383652.3423874 | |
dc.relation.references | Zhou Y., Tian X., Li H., (2021). Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 29. Pp. 3427–3439. DOI: https://doi.org/10.1109/TASLP.2021.3125142 | |
dc.relation.references | Langlois Q., Jodogne S., (2023). Practical Study of Deep Learning Models for Speech Synthesis. PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments. Pp. 700–706. DOI: https://doi.org/10.1145/3594806.3596536 | |
dc.relation.references | Yakubovskyi R., Morozov Y., (2023). Speech Models Training Technologies Comparison Using Word Error Rate. Advances in Cyber-Physical Systems. vol. 8, num. 1. Pp. 74–80. DOI: https://doi.org/10.23939/acps2023.01.074 | |
dc.relation.references | Liao J., Eskimez S., Lu L., Shi Y., Gong M., Shou L., Qu H., (2023). Improving Readability for Automatic Speech Recognition Transcription. ACM Transactions on Asian and Low-Resource Language Information Processing. vol. 22, num. 5. Pp. 1–23. DOI: https://doi.org/10.1145/3557894 | |
dc.relation.referencesen | Ramoa G., Moured O., Schwarz T., Muller K., Stiefelhagen R., (2023). Enabling People with Blindness to Distinguish Lines of Mathematical Charts with Audio-Tactile Graphic Readers. PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments. Pp. 384–391. DOI: https://doi.org/10.1145/3594806.3594818 | |
dc.relation.referencesen | Yang P., Zhang J., Xu J., Li Y., (2022). An OCR System: Towards Mobile Device. ICDLT '22: Proceedings of the 2022 6th International Conference on Deep Learning Technologies. Pp. 1–7. DOI: https://doi.org/10.1145/3556677.3556685 | |
dc.relation.referencesen | Hildebrandt P., Schulze M., Cohen S., (2022). Optical character recognition guided image super-resolution. DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering. Article No. 14. Pp. 1–4. DOI: https://doi.org/10.1145/3558100.3563841 | |
dc.relation.referencesen | Thi-Tuyet-Hai N., Jatowt A., Coustaty A., Nhu-Van N., Doucet A., (2019). Deep statistical analysis of OCR errors for effective post-OCR processing. JCDL '19: Proceedings of the 18th Joint Conference on Digital Libraries. Pp. 29–38. DOI: https://doi.org/10.1109/JCDL.2019.00015 | |
dc.relation.referencesen | Liu R., Sisman B., Gao G., Li H., (2022). Decoding Knowledge Transfer for Neural Text-to-Speech Training. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 30. Pp. 1–5. DOI: https://doi.org/10.1109/TASLP.2022.3171974 | |
dc.relation.referencesen | Alexanderson S., Székely É., Henter G. E., Kucherenko T., Beskow J., (2020). Generating coherent spontaneous speech and gesture from text. IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents. Pp. 1–3. DOI: https://doi.org/10.1145/3383652.3423874 | |
dc.relation.referencesen | Zhou Y., Tian X., Li H., (2021). Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 29. Pp. 3427–3439. DOI: https://doi.org/10.1109/TASLP.2021.3125142 | |
dc.relation.referencesen | Langlois Q., Jodogne S., (2023). Practical Study of Deep Learning Models for Speech Synthesis. PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments. Pp. 700–706. DOI: https://doi.org/10.1145/3594806.3596536 | |
dc.relation.referencesen | Yakubovskyi R., Morozov Y., (2023). Speech Models Training Technologies Comparison Using Word Error Rate. Advances in Cyber-Physical Systems. vol. 8, num. 1. Pp. 74–80. DOI: https://doi.org/10.23939/acps2023.01.074 | |
dc.relation.referencesen | Liao J., Eskimez S., Lu L., Shi Y., Gong M., Shou L., Qu H., (2023). Improving Readability for Automatic Speech Recognition Transcription. ACM Transactions on Asian and Low-Resource Language Information Processing. vol. 22, num. 5. Pp. 1–23. DOI: https://doi.org/10.1145/3557894 | |
dc.relation.uri | https://doi.org/10.1145/3594806.3594818 | |
dc.relation.uri | https://doi.org/10.1145/3556677.3556685 | |
dc.relation.uri | https://doi.org/10.1145/3558100.3563841 | |
dc.relation.uri | https://doi.org/10.1109/JCDL.2019.00015 | |
dc.relation.uri | https://doi.org/10.1109/TASLP.2022.3171974 | |
dc.relation.uri | https://doi.org/10.1145/3383652.3423874 | |
dc.relation.uri | https://doi.org/10.1109/TASLP.2021.3125142 | |
dc.relation.uri | https://doi.org/10.1145/3594806.3596536 | |
dc.relation.uri | https://doi.org/10.23939/acps2023.01.074 | |
dc.relation.uri | https://doi.org/10.1145/3557894 | |
dc.rights.holder | © Lviv Polytechnic National University, 2023 | |
dc.rights.holder | © Chypak Y., Morozov Y., 2023 | |
dc.subject | OCR | |
dc.subject | TTS | |
dc.subject | speech synthesis | |
dc.subject | voice detection | |
dc.title | Audio Reading Assistant for Visually Impaired People | |
dc.type | Article | |