Abstract
This article presents SUBTLEX-AR, a digital database providing an extensive collection of attributes related to Modern Standard Arabic words (Arabic for short). SUBTLEX-AR combines a novel dataset of 120 million word tokens from movie subtitles with 40 million tokens from newspaper articles originally collected in ARALEX (Boudelaa & Marslen-Wilson, Behavior Research Methods, 42, 481–487, 2010), ensuring comprehensive coverage. SUBTLEX-AR provides information about the statistical properties of Arabic words at the orthographic, phonological, morphological, and semantic levels. The database also includes sub-word structure properties such as bigram and trigram frequencies, as well as lemmas and part-of-speech information along with their corresponding frequencies. The online interface of SUBTLEX-AR allows users either to upload a set of words and retrieve their properties, or to retrieve a set of words that satisfy constraints on predefined properties. The properties themselves are easily extensible and will be expanded over time. SUBTLEX-AR is freely accessible at https://subtlexar.uaeu.ac.ae/
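As a rough illustration of the kind of sub-word statistics mentioned above, the following minimal sketch derives character bigram and trigram counts and a per-million word frequency from a word-count table. It is not the procedure used to build SUBTLEX-AR: the function names `ngram_counts` and `per_million`, the toy counts, and the choice to weight n-grams by token frequency on undiacritized forms are all assumptions made for the example.

```python
from collections import Counter


def ngram_counts(word_freqs, n):
    """Token-weighted character n-gram counts from a {word: count} mapping.

    Assumption: each n-gram is weighted by the frequency of the word it
    occurs in; a type-based count would add 1 per word instead.
    """
    counts = Counter()
    for word, freq in word_freqs.items():
        # Slide a window of n characters (code points) across the word.
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += freq
    return counts


def per_million(count, corpus_size):
    """Scale a raw count to occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


# Toy counts invented for illustration only; not taken from SUBTLEX-AR.
toy = {"كتب": 50, "كتاب": 30, "مكتب": 20}

bigrams = ngram_counts(toy, 2)
trigrams = ngram_counts(toy, 3)

# In a 160-million-token corpus, a word seen 160 times occurs once per million.
example_fpm = per_million(160, 160_000_000)
```

With these toy counts, the bigram كت receives weight from all three words, while the trigram كتب receives weight only from the two words that contain it. A real pipeline would also need to decide how to handle diacritics and orthographic variants before counting.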
Original language | English |
---|---|
Article number | 104 |
Journal | Behavior Research Methods |
Volume | 57 |
Issue number | 4 |
Publication status | Published - Apr 2025 |
Keywords
- Arabic
- Morpheme frequency
- Semantic similarity
- Subtitles
- Word frequency
ASJC Scopus subject areas
- Experimental and Cognitive Psychology
- Developmental and Educational Psychology
- Arts and Humanities (miscellaneous)
- Psychology (miscellaneous)
- General Psychology