TY - GEN
T1 - Robust part-of-speech tagging of Arabic text
AU - Aldarmaki, Hanan
AU - Diab, Mona
N1 - Publisher Copyright:
© ACL 2015. All rights reserved.
PY - 2015
Y1 - 2015
N2 - We present a new and improved part-of-speech tagger for Arabic text that incorporates a set of novel features and constraints. This framework is presented within the MADAMIRA software suite, a state-of-the-art toolkit for Arabic language processing. Starting from a linear SVM model with basic lexical features, we add a range of features derived from morphological analysis and clustering methods. We show that using these features significantly improves part-of-speech tagging accuracy, especially for unseen words, which results in better generalization across genres. The final model, embedded in a sequential tagging framework, achieved 97.15% accuracy on the main test set of newswire data, which is higher than the current MADAMIRA accuracy of 96.91% while being 30% faster.
AB - We present a new and improved part-of-speech tagger for Arabic text that incorporates a set of novel features and constraints. This framework is presented within the MADAMIRA software suite, a state-of-the-art toolkit for Arabic language processing. Starting from a linear SVM model with basic lexical features, we add a range of features derived from morphological analysis and clustering methods. We show that using these features significantly improves part-of-speech tagging accuracy, especially for unseen words, which results in better generalization across genres. The final model, embedded in a sequential tagging framework, achieved 97.15% accuracy on the main test set of newswire data, which is higher than the current MADAMIRA accuracy of 96.91% while being 30% faster.
UR - http://www.scopus.com/inward/record.url?scp=84992762036&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84992762036&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84992762036
T3 - 2nd Workshop on Arabic Natural Language Processing, ANLP 2015 - held at 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015 - Proceedings
SP - 173
EP - 182
BT - 2nd Workshop on Arabic Natural Language Processing, ANLP 2015 - held at 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015 - Proceedings
A2 - Habash, Nizar
A2 - Vogel, Stephan
A2 - Darwish, Kareem
PB - Association for Computational Linguistics (ACL)
T2 - 2nd Workshop on Arabic Natural Language Processing, ANLP 2015
Y2 - 30 July 2015
ER -