TY - GEN
T1 - ArTST: Arabic Text and Speech Transformer
T2 - 1st Arabic Natural Language Processing Conference, ArabicNLP 2023
AU - Toyin, Hawau Olamide
AU - Djanibekov, Amirbek
AU - Kulkarni, Ajinkya
AU - Aldarmaki, Hanan
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language. The model architecture follows the unified-modal framework SpeechT5, which was recently released for English, and is focused on Modern Standard Arabic (MSA), with plans to extend the model to dialectal and code-switched Arabic in future editions. We pre-trained the model from scratch on MSA speech and text data and fine-tuned it for the following tasks: Automatic Speech Recognition (ASR), Text-To-Speech synthesis (TTS), and spoken dialect identification. In our experiments comparing ArTST with SpeechT5, as well as with previously reported results on these tasks, ArTST performs on par with or exceeds the current state of the art in all three tasks. Moreover, we find that our pre-training is conducive to generalization, which is particularly evident in the low-resource TTS task. The pre-trained model as well as the fine-tuned ASR and TTS models are released for research use.
AB - We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language. The model architecture follows the unified-modal framework SpeechT5, which was recently released for English, and is focused on Modern Standard Arabic (MSA), with plans to extend the model to dialectal and code-switched Arabic in future editions. We pre-trained the model from scratch on MSA speech and text data and fine-tuned it for the following tasks: Automatic Speech Recognition (ASR), Text-To-Speech synthesis (TTS), and spoken dialect identification. In our experiments comparing ArTST with SpeechT5, as well as with previously reported results on these tasks, ArTST performs on par with or exceeds the current state of the art in all three tasks. Moreover, we find that our pre-training is conducive to generalization, which is particularly evident in the low-resource TTS task. The pre-trained model as well as the fine-tuned ASR and TTS models are released for research use.
UR - http://www.scopus.com/inward/record.url?scp=85184519259&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184519259&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85184519259
T3 - ArabicNLP 2023 - 1st Arabic Natural Language Processing Conference, Proceedings
SP - 41
EP - 51
BT - ArabicNLP 2023 - 1st Arabic Natural Language Processing Conference, Proceedings
A2 - Sawaf, Hassan
A2 - El-Beltagy, Samhaa
A2 - Zaghouani, Wajdi
A2 - Magdy, Walid
A2 - Tomeh, Nadi
A2 - Abu Farha, Ibrahim
A2 - Habash, Nizar
A2 - Khalifa, Salam
A2 - Keleg, Amr
A2 - Haddad, Hatem
A2 - Zitouni, Imed
A2 - Abdelali, Ahmed
A2 - Mrini, Khalil
A2 - Almatham, Rawan
PB - Association for Computational Linguistics (ACL)
Y2 - 7 December 2023
ER -