Application of string kernels in protein sequence classification

Nazar M. Zaki, Safaai Deris, Rosli Illias

Research output: Contribution to journal (Article, peer-reviewed)

18 Citations (Scopus)


Introduction: The production of biological information now far exceeds its consumption. The key issue is how to organise and manage this large volume of new information so that it remains accessible. A core problem in classifying biological information is annotating new protein sequences with structural and functional features.

Method: This article applies string kernels to the classification of protein sequences into homogeneous families. String kernels used in conjunction with support vector machines (SVMs) have been shown to perform well in text categorisation tasks. We evaluated and analysed the performance of this approach and present experimental results on three families selected from the SCOP (Structural Classification of Proteins) database. We then compared its overall performance with that of existing protein classification methods on benchmark SCOP datasets.

Results: According to the F1 measure and the rate of false positives (RFP), the string kernel method performs well in classifying protein sequences. It outperformed all of the generative methods it was compared against and is comparable with the SVM-Fisher method.

Discussion: Although the string kernel approach makes no use of prior biological knowledge, it still captures enough biological information to outperform some state-of-the-art methods.
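The abstract does not say which string kernel was used. As an illustration only, the sketch below assumes the gap-weighted subsequence string kernel known from string-kernel text categorisation (Lodhi et al.), applied to amino-acid strings. It counts every common, possibly non-contiguous, subsequence of length `n` and down-weights each occurrence by a decay factor `lam` raised to the total length it spans. The function names `ssk`, `normalized_ssk` and `gram_matrix`, the parameters `n` and `lam`, and the example sequences are hypothetical and not taken from the paper.

```python
import math
from typing import List


def ssk(s: str, t: str, n: int, lam: float) -> float:
    """Unnormalised gap-weighted subsequence kernel K_n(s, t).

    Assumption: the Lodhi et al. style recursion, not necessarily the
    exact variant used in the paper. Each common subsequence of length n
    contributes lam ** (span in s + span in t) per pair of occurrences.
    """
    if n < 1:
        raise ValueError("subsequence length n must be >= 1")
    m, p = len(s), len(t)
    if min(m, p) < n:
        return 0.0
    # kp[a][b] holds K'_i(s[:a], t[:b]); K'_0 is 1 for every pair of prefixes.
    kp = [[1.0] * (p + 1) for _ in range(m + 1)]
    for _ in range(1, n):
        new_kp = [[0.0] * (p + 1) for _ in range(m + 1)]
        for a in range(1, m + 1):
            kpp = 0.0  # running K''_i(s[:a], t[:b]) as b grows
            for b in range(1, p + 1):
                kpp = lam * kpp
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp[a - 1][b - 1]
                new_kp[a][b] = lam * new_kp[a - 1][b] + kpp
        kp = new_kp
    # K_n adds lam^2 * K'_{n-1} for every matching final character pair.
    total = 0.0
    for a in range(1, m + 1):
        for b in range(1, p + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[a - 1][b - 1]
    return total


def normalized_ssk(s: str, t: str, n: int, lam: float) -> float:
    """Cosine-normalised kernel, so that every sequence has self-similarity 1."""
    kss = ssk(s, s, n, lam)
    ktt = ssk(t, t, n, lam)
    if kss == 0.0 or ktt == 0.0:
        return 0.0
    return ssk(s, t, n, lam) / math.sqrt(kss * ktt)


def gram_matrix(seqs: List[str], n: int, lam: float) -> List[List[float]]:
    """Pairwise kernel matrix, the input an SVM with a precomputed kernel needs."""
    return [[normalized_ssk(x, y, n, lam) for y in seqs] for x in seqs]


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    seqs = ["MKTAYIAK", "MKTAHIAK", "GGSWQLLE"]
    for row in gram_matrix(seqs, n=3, lam=0.5):
        print(["%.3f" % v for v in row])
```

A Gram matrix built this way can be passed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. The quadratic-per-step recursion keeps the sketch short; the choice of `n` and `lam` would have to be tuned on the SCOP training data.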

Original language: English
Pages (from-to): 45-52
Number of pages: 8
Journal: Applied Bioinformatics
Issue number: 1
Publication status: Published, 2005

ASJC Scopus subject areas

  • Information Systems
  • General Agricultural and Biological Sciences
  • Computer Science Applications

