TY - GEN
T1 - Supervised Acoustic Embeddings And Their Transferability Across Languages
AU - Ram, Sreepratha
AU - Aldarmaki, Hanan
N1 - Funding Information:
This work was supported by grant no. 31T139 at United Arab Emirates University and partially funded under UAEU-ZU Joint Research Grant G00003715 (Fund No.: 12T034) through Emirates Center for Mobility Research.
Publisher Copyright:
© ICNLSP 2022. All rights reserved.
PY - 2022
Y1 - 2022
N2 - In speech recognition, it is essential to model the phonetic content of the input signal while discarding irrelevant factors such as speaker variations and noise, which is challenging in low-resource settings. Self-supervised pretraining has been proposed as a way to improve both supervised and unsupervised speech recognition, including frame-level feature representations and Acoustic Word Embeddings (AWE) for variable-length segments. However, self-supervised models alone cannot learn perfect separation of the linguistic content as they are trained to optimize indirect objectives. In this work, we experiment with different pre-trained self-supervised features as input to AWE models and show that they work best within a supervised framework. Models trained on English can be transferred to other languages with no adaptation and outperform self-supervised models trained solely on the target languages.
AB - In speech recognition, it is essential to model the phonetic content of the input signal while discarding irrelevant factors such as speaker variations and noise, which is challenging in low-resource settings. Self-supervised pretraining has been proposed as a way to improve both supervised and unsupervised speech recognition, including frame-level feature representations and Acoustic Word Embeddings (AWE) for variable-length segments. However, self-supervised models alone cannot learn perfect separation of the linguistic content as they are trained to optimize indirect objectives. In this work, we experiment with different pre-trained self-supervised features as input to AWE models and show that they work best within a supervised framework. Models trained on English can be transferred to other languages with no adaptation and outperform self-supervised models trained solely on the target languages.
KW - Acoustic Word Embeddings
KW - Transfer Learning
KW - Unsupervised ASR
UR - http://www.scopus.com/inward/record.url?scp=85152129326&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85152129326&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85152129326
T3 - ICNLSP 2022 - Proceedings of the 5th International Conference on Natural Language and Speech Processing
SP - 212
EP - 218
BT - ICNLSP 2022 - Proceedings of the 5th International Conference on Natural Language and Speech Processing
A2 - Abbas, Mourad
A2 - Freihat, Abed Alhakim
PB - Association for Computational Linguistics (ACL)
T2 - 5th International Conference on Natural Language and Speech Processing, ICNLSP 2022
Y2 - 16 December 2022 through 17 December 2022
ER -