Supervised Acoustic Embeddings And Their Transferability Across Languages

Sreepratha Ram, Hanan Aldarmaki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In speech recognition, it is essential to model the phonetic content of the input signal while discarding irrelevant factors such as speaker variations and noise, which is challenging in low-resource settings. Self-supervised pretraining has been proposed as a way to improve both supervised and unsupervised speech recognition, including frame-level feature representations and Acoustic Word Embeddings (AWE) for variable-length segments. However, self-supervised models alone cannot learn perfect separation of the linguistic content as they are trained to optimize indirect objectives. In this work, we experiment with different pre-trained self-supervised features as input to AWE models and show that they work best within a supervised framework. Models trained on English can be transferred to other languages with no adaptation and outperform self-supervised models trained solely on the target languages.

Original language: English
Title of host publication: ICNLSP 2022 - Proceedings of the 5th International Conference on Natural Language and Speech Processing
Editors: Mourad Abbas, Abed Alhakim Freihat
Publisher: Association for Computational Linguistics (ACL)
Pages: 212-218
Number of pages: 7
ISBN (Electronic): 9781959429364
Publication status: Published - 2022
Externally published: Yes
Event: 5th International Conference on Natural Language and Speech Processing, ICNLSP 2022 - Virtual, Online
Duration: Dec 16 2022 to Dec 17 2022

Keywords

  • Acoustic Word Embeddings
  • Transfer Learning
  • Unsupervised ASR

ASJC Scopus subject areas

  • Artificial Intelligence
  • Signal Processing
  • Linguistics and Language
  • Communication
