TY - JOUR
T1 - Beyond N-Grams
T2 - Enhancing String Kernels With Transformer-Guided Semantic Insights
AU - Zaki, Nazar
AU - Alderei, Reem
AU - Alketbi, Mahra
AU - Alkaabi, Alia
AU - Alneyadi, Fatima
AU - Zaki, Nadeen
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - The rapid advancements in large language models (LLMs) have led to the generation of sophisticated AI-produced texts, posing significant challenges in distinguishing machine-generated content from authentic human writing. This study presents a novel hybrid framework that effectively integrates string kernel approaches with deep contextual embeddings from state-of-the-art transformers for robust AI-generated text detection. We propose and evaluate four innovative kernel-based methods, namely an Attention-Augmented Kernel, Error Pattern Analysis, Transformer-Guided N-gram Selection, and a Custom Kernel Function, each designed to uniquely capture semantic and structural distinctions in text. Extensive experiments conducted on eight diverse datasets comprising 2,501 total samples, featuring texts generated and enhanced by leading LLMs including GPT-3.5, GPT-4, DeepSeek, and KIMI, demonstrate the superior performance of the proposed methods. In particular, the Transformer-Guided N-gram Selection and the Custom Kernel Function consistently outperform baseline models, achieving near-perfect detection accuracy with significantly reduced computational complexity. Comprehensive hyperparameter optimization further solidifies our methods’ effectiveness and practical applicability. The publicly available datasets and robust empirical evaluations contribute valuable benchmarks for future research. This work sets a new standard in AI-text detection methodologies, enhancing reliability, efficiency, and scalability for real-world applications.
AB - The rapid advancements in large language models (LLMs) have led to the generation of sophisticated AI-produced texts, posing significant challenges in distinguishing machine-generated content from authentic human writing. This study presents a novel hybrid framework that effectively integrates string kernel approaches with deep contextual embeddings from state-of-the-art transformers for robust AI-generated text detection. We propose and evaluate four innovative kernel-based methods, namely an Attention-Augmented Kernel, Error Pattern Analysis, Transformer-Guided N-gram Selection, and a Custom Kernel Function, each designed to uniquely capture semantic and structural distinctions in text. Extensive experiments conducted on eight diverse datasets comprising 2,501 total samples, featuring texts generated and enhanced by leading LLMs including GPT-3.5, GPT-4, DeepSeek, and KIMI, demonstrate the superior performance of the proposed methods. In particular, the Transformer-Guided N-gram Selection and the Custom Kernel Function consistently outperform baseline models, achieving near-perfect detection accuracy with significantly reduced computational complexity. Comprehensive hyperparameter optimization further solidifies our methods’ effectiveness and practical applicability. The publicly available datasets and robust empirical evaluations contribute valuable benchmarks for future research. This work sets a new standard in AI-text detection methodologies, enhancing reliability, efficiency, and scalability for real-world applications.
KW - String kernels
KW - hybrid models
KW - semantic analysis
KW - text classification
KW - transformer embeddings
UR - http://www.scopus.com/inward/record.url?scp=105007296163&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105007296163&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2025.3576076
DO - 10.1109/ACCESS.2025.3576076
M3 - Article
AN - SCOPUS:105007296163
SN - 2169-3536
VL - 13
SP - 97779
EP - 97793
JO - IEEE Access
JF - IEEE Access
ER -