Models and Methods for Speech Separation in Digital Systems

Publisher

Lviv Politechnic Publishing House

Abstract

This article describes state-of-the-art approaches to speech separation and examines the structure of such systems and the challenges of building and training them. Designing an efficient, optimized neural network model for speech recognition requires an encoder-decoder structure with a mask estimation flow. The fully convolutional SuDoRM-RF model achieves high efficiency with a relatively small number of parameters and can be accelerated on hardware that supports convolutional operations. The highest separation performance was shown by the SepTDA model, at 24 dB SI-SNR with 21.2 million trainable parameters, while SuDoRM-RF, with only 2.66 million, reached 12.02 dB. Other transformer-based neural network approaches achieved almost the same performance as SepTDA but require more trainable parameters.
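
The abstract reports results in SI-SNR (scale-invariant signal-to-noise ratio). As a rough illustration of what that metric measures, the following is a minimal sketch of the commonly used definition, not the authors' evaluation code. The function name `si_snr`, the `eps` stabilizer, and the synthetic test tone are assumptions made for this example.

```python
import math
import random


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal.

    Hypothetical helper for illustration; `eps` is an assumed stabilizer
    that avoids division by zero and log of zero.
    """
    n = len(target)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean so that a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]

    # Project the estimate onto the target: the part that "is" the target.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt)
    scale = dot / (tgt_energy + eps)
    projection = [scale * t for t in tgt]

    # Everything left over after the projection counts as noise.
    proj_energy = sum(p * p for p in projection)
    noise_energy = sum((e - p) ** 2 for e, p in zip(est, projection))

    return 10.0 * math.log10((proj_energy + eps) / (noise_energy + eps))


if __name__ == "__main__":
    random.seed(0)
    # Assumed toy data: one second of a 440 Hz tone sampled at 8 kHz.
    target = [math.sin(2 * math.pi * 440 * k / 8000) for k in range(8000)]
    noisy = [t + 0.1 * random.gauss(0.0, 1.0) for t in target]
    print(f"SI-SNR of noisy estimate: {si_snr(noisy, target):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is preferred over plain SNR when a separation model's output gain is arbitrary.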

Citation

Tsemko A. Models and Methods for Speech Separation in Digital Systems / Andrii Tsemko, Ivan Karbovnyk // Advances in Cyber-Physical Systems. — Lviv : Lviv Politechnic Publishing House, 2024. — Vol 9. — No 2. — P. 121–127.
