Unsupervised Automatic Speech Recognition: A review

Hanan Aldarmaki, Asad Ullah, Sreepratha Ram, Nazar Zaki

Research output: Contribution to journal › Review article › Peer-review

30 Citations (Scopus)


Automatic Speech Recognition (ASR) systems can be trained to achieve remarkable performance given large amounts of manually transcribed speech, but large labeled data sets can be difficult or expensive to acquire for all languages of interest. In this paper, we review the research literature to identify models and ideas that could lead to fully unsupervised ASR, including unsupervised sub-word and word modeling, unsupervised segmentation of the speech signal, and unsupervised mapping from speech segments to text. The objective of the study is to identify the limits of what can be learned from speech data alone and to understand the minimum requirements for speech recognition. Identifying these limits would help optimize the resources and effort devoted to ASR development for low-resource languages.

Original language: English
Pages (from-to): 76-91
Number of pages: 16
Journal: Speech Communication
Publication status: Published, April 2022

Keywords

  • Cross-modal mapping
  • Speech segmentation
  • Survey
  • Unsupervised ASR

ASJC Scopus subject areas

  • Software
  • Modelling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

