Embedding speech recognition tools for custom software: Engines Overview

dc.citation.epage121
dc.citation.spage114
dc.contributor.affiliationDepartment of Applied Mathematics, Lviv Polytechnic National University
dc.contributor.authorDovbysh, Arthur
dc.contributor.authorAlieksieiev, Vladyslav
dc.coverage.placenameLviv
dc.coverage.temporal25-27 June 2018
dc.date.accessioned2018-09-03T11:41:08Z
dc.date.available2018-09-03T11:41:08Z
dc.date.created2018-06-25
dc.date.issued2018-06-25
dc.description.abstractMany solutions and tools for speech recognition are now available. Nevertheless, implementing natural language processing remains an open problem. Developing custom software with good UI/UX requires integrating speech recognition, and the most common solution is to embed an existing engine as a standard tool. This paper presents an overview and analysis of several popular speech recognition engines: Google Speech Recognition API, Microsoft Speech API, Yandex Speech Kit and Julius. These tools are ready to use and suitable for adding reliable voice command detection or voice control to your own software. The results of our analysis come from an experiment in which these tools performed voice recognition as embedded components in custom software.
dc.format.extent114-121
dc.format.pages8
dc.identifier.citationDovbysh A. Embedding speech recognition tools for custom software: Engines Overview / Arthur Dovbysh, Vladyslav Alieksieiev // Computational linguistics and intelligent systems, 25-27 June 2018. — Lviv : Lviv Polytechnic National University, 2018. — Vol 2 : Workshop. — P. 114–121. — (Part 2. Workshop conference tracks. Section I. Computational Linguistics).
dc.identifier.issn2523-4013
dc.identifier.urihttps://ena.lpnu.ua/handle/ntb/42557
dc.language.isoen
dc.publisherLviv Polytechnic National University
dc.relation.ispartofComputational linguistics and intelligent systems (2), 2018
dc.relation.references1. Lawrence R. Rabiner and Bernard Gold. Theory and Application of Digital Signal Processing — Prentice Hall, 1975 – 762 p.
dc.relation.references2. Plotnikov V., Sukhanov V., Zhigulevtsev Yu. Rechevoy dialog v sistemakh upravleniya. – Moscow: Mashinostroenie, 1988 – 223 p. – In Russian [Речевой диалог в системах управления / В.Н.Плотников, В.А.Суханов, Ю.Н.Жигулевцев. – М.: Машиностроение, 1988. – 223 с. – ISBN 5-217-00148-8]
dc.relation.references3. Yandex SpeechKit // Yandex – https://tech.yandex.ru/speechkit/ (Retrieved on May 2018)
dc.relation.references4. Yandex.SpeechKit // Wikipedia.org, 18.05.2018 – https://ru.wikipedia.org/wiki/Yandex.SpeechKit (Retrieved on May 2018)
dc.relation.references5. Minimum Prediction Residual Principle Applied to Speech Recognition / Itakura F. // IEEE Transactions on Acoustics, Speech, and Signal processing. – February 1975. – Vol. 23, No. 1. – P.67–72.
dc.relation.references6. Cloud Speech-to-Text – Speech Recognition // Google Cloud – https://cloud.google.com/speech-to-text/ (Retrieved in May 2018)
dc.relation.references7. Google launches an improved speech-to-text service for developers / F. Lardinois // Techcrunch.com – April 9, 2018. – https://techcrunch.com/2018/04/09/google-launches-an-improved-speech-to-text-service-for-developers/
dc.relation.references8. Microsoft Speech API // Wikipedia.org, 08.11.2017 – https://en.wikipedia.org/wiki/Microsoft_Speech_API (Retrieved on May 2018)
dc.relation.references9. Microsoft Speech Platform SDK 11 Requirements and Installation // Microsoft Developer Network – https://msdn.microsoft.com/ru-ru/library/hh362873(v=office.14).aspx (Retrieved on May, 2018)
dc.relation.references10. Julius: Open-Source Large Vocabulary Continuous Speech Recognition Engine / Akinobu Lee // GitHub, April 2018 — https://github.com/julius-speech/julius (Retrieved on May 2018)
dc.relation.references11. T. Kawahara, A. Lee, T. Kobayashi, K. Takeda, N. Minematsu, S. Sagayama, K. Itou, A. Ito, M. Yamamoto, A. Yamada, T. Utsuro and K. Shikano. "Free software toolkit for Japanese large vocabulary continuous speech recognition." In Proc. Int'l Conf. on Spoken Language Processing (ICSLP), Vol. 4, pp. 476-479, 2000.
dc.relation.references12. A. Lee, T. Kawahara and K. Shikano. "Julius – an open source real-time large vocabulary recognition engine." In Proc. European Conference on Speech Communication and Technology (EUROSPEECH), pp. 1691-1694, 2001.
dc.relation.references13. A. Lee and T. Kawahara. "Recent Development of Open-Source Speech Recognition Engine Julius" Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2009.
dc.relation.urihttps://tech.yandex.ru/speechkit/
dc.relation.urihttps://ru.wikipedia.org/wiki/Yandex.SpeechKit
dc.relation.urihttps://cloud.google.com/speech-to-text/
dc.relation.urihttps://techcrunch.com/2018/04/09/google-launchesan-
dc.relation.urihttps://en.wikipedia.org/wiki/Microsoft_Speech_API
dc.relation.urihttps://msdn.microsoft.com/ru-ru/library/hh362873(v=office.14).aspx
dc.relation.urihttps://github.com/julius-speech/julius
dc.rights.holder© 2018 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.
dc.subjectspeech recognition
dc.subjectspeech engine
dc.subjectAPI
dc.subjectvoice command detection
dc.subjectvoice control
dc.subjectGoogle
dc.subjectMicrosoft
dc.subjectYandex
dc.subjectJulius
dc.subjectoverview and analysis
dc.titleEmbedding speech recognition tools for custom software: Engines Overview
dc.typeConference Abstract
