TY - GEN
T1 - A morphologically annotated corpus of Emirati Arabic
AU - Khalifa, Salam
AU - Habash, Nizar
AU - Eryani, Fadhl
AU - Obeid, Ossama
AU - Abdulrahim, Dana
AU - Kaabi, Meera Al
N1 - Funding Information:
This project is funded by a New York University Abu Dhabi Research Enhancement Fund. We would like to thank Ramy Eskander and the team of annotators at Ramitechs. We also thank the creators of MADARi from the MADAR project. Finally, we thank Sondos Krouna for insightful discussions on POS decisions.
Publisher Copyright:
© LREC 2018 - 11th International Conference on Language Resources and Evaluation. All rights reserved.
PY - 2019
Y1 - 2019
N2 - We present an ongoing effort on the first large-scale morphologically manually annotated corpus of Emirati Arabic. This corpus includes about 200,000 words selected from eight Gumar corpus novels in the Emirati Arabic variety. The selected texts are being annotated for tokenization, part-of-speech, lemmatization, English glosses and dialect identification. The orthography of the text is also adjusted for errors and inconsistencies. We discuss the guidelines for each part of the annotation components, and the annotation interface we use. We report on the quality of the annotation through an inter-annotator agreement measure.
KW - Annotation
KW - Gulf Arabic
KW - Morphology
KW - Part-of-Speech Tagging
UR - http://www.scopus.com/inward/record.url?scp=85059877656&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85059877656&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85059877656
T3 - LREC 2018 - 11th International Conference on Language Resources and Evaluation
SP - 3839
EP - 3846
BT - LREC 2018 - 11th International Conference on Language Resources and Evaluation
A2 - Isahara, Hitoshi
A2 - Maegaard, Bente
A2 - Piperidis, Stelios
A2 - Cieri, Christopher
A2 - Declerck, Thierry
A2 - Hasida, Koiti
A2 - Mazo, Helene
A2 - Choukri, Khalid
A2 - Goggi, Sara
A2 - Mariani, Joseph
A2 - Moreno, Asuncion
A2 - Calzolari, Nicoletta
A2 - Odijk, Jan
A2 - Tokunaga, Takenobu
PB - European Language Resources Association (ELRA)
T2 - 11th International Conference on Language Resources and Evaluation, LREC 2018
Y2 - 7 May 2018 through 12 May 2018
ER -