TY - GEN
T1 - Identification of Coding Regions in Prokaryotic DNA Sequences Using Bayesian Classification
AU - Al Bataineh, Mohammad
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
AB - The identification of protein-coding regions in genomic DNA sequences is a well-known problem in computational genomics, and a variety of computational algorithms have been applied to it. Rapid advances in this field have motivated the development of engineering methods that support further analysis and modeling of many processes in molecular biology. The proposed algorithm uses well-known concepts from communications theory, such as correlation, maximal ratio combining (MRC), and filtering techniques, to create a signal whose maxima and minima indicate coding and noncoding regions, respectively. The algorithm is applied to several prokaryotic genome sequences, and two Bayesian classifiers are designed to test and evaluate its performance. The simulation results show that the algorithm detects protein-coding regions efficiently and accurately, as demonstrated by sensitivity and specificity values comparable to those of well-known gene detection methods for prokaryotes. The results further support the correctness and biological relevance of applying communications theory concepts to genomic sequence analysis.
KW - Bayesian classification
KW - Correlation
KW - Gene identification
KW - Maximal ratio combining
KW - Period-3 filter
UR - http://www.scopus.com/inward/record.url?scp=85085210611&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085210611&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-45385-5_1
DO - 10.1007/978-3-030-45385-5_1
M3 - Conference contribution
AN - SCOPUS:85085210611
SN - 9783030453848
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 14
BT - Bioinformatics and Biomedical Engineering - 8th International Work-Conference, IWBBIO 2020, Proceedings
A2 - Rojas, Ignacio
A2 - Valenzuela, Olga
A2 - Rojas, Fernando
A2 - Herrera, Luis Javier
A2 - Ortuño, Francisco
PB - Springer
T2 - 8th International Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2020
Y2 - 6 May 2020 through 8 May 2020
ER -