Beyond N-Grams: Enhancing String Kernels With Transformer-Guided Semantic Insights

Nazar Zaki, Reem Alderei, Mahra Alketbi, Alia Alkaabi, Fatima Alneyadi, Nadeen Zaki

Research output: Contribution to journal › Article › peer-review

Abstract

Rapid advances in large language models (LLMs) have led to sophisticated AI-generated text, which makes it significantly harder to distinguish machine-generated content from authentic human writing. This study presents a novel hybrid framework that integrates string kernel approaches with deep contextual embeddings from state-of-the-art transformers for robust AI-generated text detection. We propose and evaluate four kernel-based methods, namely Attention-Augmented Kernel, Error Pattern Analysis, Transformer-Guided N-gram Selection, and a Custom Kernel Function, each designed to capture distinct semantic and structural properties of text. Extensive experiments on eight diverse datasets comprising 2,501 samples in total, featuring texts generated and enhanced by leading LLMs including GPT-3.5, GPT-4, DeepSeek, and KIMI, demonstrate superior performance of the proposed methods. In particular, Transformer-Guided N-gram Selection and the Custom Kernel Function consistently outperform baseline models, achieving near-perfect detection accuracy with significantly reduced computational complexity. Comprehensive hyperparameter optimization further supports the methods’ effectiveness and practical applicability. The publicly available datasets and empirical evaluations provide benchmarks for future research. This work sets a new standard in AI-text detection methodologies, enhancing reliability, efficiency, and scalability for real-world applications.
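
For readers unfamiliar with string kernels, the following minimal Python sketch illustrates the general idea of a character n-gram (spectrum) kernel, the classical starting point that the title alludes to. It is not the authors' implementation and does not reproduce any of the four proposed methods; the function names, the cosine normalization, and the default n = 3 are assumptions chosen for illustration only.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams in `text` (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(a: str, b: str, n: int = 3) -> float:
    """Cosine-normalized n-gram spectrum kernel between two strings.

    Illustrative only: the similarity is the dot product of the two
    n-gram count vectors, divided by the product of their norms.
    """
    ca, cb = char_ngrams(a, n), char_ngrams(b, n)
    # Counter returns 0 for n-grams that are absent from the other string.
    dot = sum(count * cb[gram] for gram, count in ca.items())
    norm = sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    # Strings shorter than n have no n-grams; treat their similarity as 0.
    return dot / norm if norm else 0.0
```

A similarity matrix built from such a function could, in principle, be passed to any classifier that accepts a precomputed kernel; how the paper combines kernels with transformer embeddings is not specified in this abstract.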

Original language: English
Pages (from-to): 97779-97793
Number of pages: 15
Journal: IEEE Access
Volume: 13
Publication status: Published - 2025

Keywords

  • String kernels
  • hybrid models
  • semantic analysis
  • text classification
  • transformer embeddings

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
