Models and Methods for Speech Separation in Digital Systems
Publisher
Видавництво Львівської політехніки
Lviv Politechnic Publishing House
Abstract
The main purpose of the article is to describe state-of-the-art approaches to speech separation and to demonstrate the structures and challenges of building and training such systems. Designing an efficient, optimized neural network model for speech separation requires an encoder-decoder model structure with a mask estimation flow. The fully convolutional SuDoRM-RF model demonstrates high efficiency with a relatively small number of parameters and can be sped up with accelerators that support convolutional operations. The highest separation performance has been shown by the SepTDA model, with 24 dB in SI-SNR and 21.2 million trainable parameters, while SuDoRM-RF, with only 2.66 million, has demonstrated 12.02 dB. Other transformer-based neural network approaches have demonstrated almost the same performance as the SepTDA model but require more trainable parameters.
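The figures quoted in the abstract are given in SI-SNR, the scale-invariant signal-to-noise ratio. As a minimal sketch, assuming the commonly used definition (zero-mean signals, with the estimate projected onto the reference), the metric could be computed as shown here. The function name `si_snr` and the `eps` stabilizer are illustrative choices and are not taken from the article.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB for two equal-length signals.

    Illustrative sketch only: `estimate` and `target` are plain sequences of
    floats, and `eps` is an assumed small constant that avoids division by zero.
    """
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")

    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / len(estimate)
    mean_t = sum(target) / len(target)
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target: this part counts as "signal".
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    scale = dot / t_energy
    s_target = [scale * b for b in t]

    # Whatever remains after the projection counts as "noise".
    e_noise = [a - s for a, s in zip(e, s_target)]

    signal_power = sum(s * s for s in s_target)
    noise_power = sum(n * n for n in e_noise)
    return 10.0 * math.log10((signal_power + eps) / (noise_power + eps))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate by a constant gain leaves the score essentially unchanged, which is why the metric suits separation models whose output level is arbitrary.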
Citation
Tsemko A. Models and Methods for Speech Separation in Digital Systems / Andrii Tsemko, Ivan Karbovnyk // Advances in Cyber-Physical Systems. — Lviv : Lviv Politechnic Publishing House, 2024. — Vol 9. — No 2. — P. 121–127.