TY - JOUR
T1 - Iterative sequential Monte Carlo algorithm for motif discovery
AU - Bataineh, Mohammad Al
AU - Al-Qudah, Zouhair
AU - Al-Zaben, Awad
N1 - Publisher Copyright: © The Institution of Engineering and Technology 2016.
PY - 2016/7/1
Y1 - 2016/7/1
N2 - The discovery of motifs in transcription factor binding sites is important in the transcription process and is crucial for understanding gene regulatory relationships and evolutionary history. Identifying weak motifs and reducing the effects of local optima, error propagation and computational complexity remain important but challenging tasks in motif discovery. This study proposes an iterative sequential Monte Carlo (ISMC) motif discovery algorithm, based on the position weight matrix and the Gibbs sampling model, to locate conserved motifs in a given set of nucleotide sequences. Three sub-algorithms are proposed. Algorithm 1 (see Fig. 1) handles the case of one motif instance of fixed length in each nucleotide sequence. The ISMC algorithm is then extended to more complex situations: a unique motif of unknown length (Algorithm 2), a unique motif with unknown abundance (Algorithm 3, see Fig. 2) and multiple motifs. Experimental results on both synthetic and real datasets show that the proposed ISMC algorithm outperforms five other widely used motif discovery algorithms in terms of nucleotide- and site-level sensitivity, nucleotide- and site-level positive prediction value, nucleotide-level performance coefficient, nucleotide-level correlation coefficient and site-level average site performance.
AB - The discovery of motifs in transcription factor binding sites is important in the transcription process and is crucial for understanding gene regulatory relationships and evolutionary history. Identifying weak motifs and reducing the effects of local optima, error propagation and computational complexity remain important but challenging tasks in motif discovery. This study proposes an iterative sequential Monte Carlo (ISMC) motif discovery algorithm, based on the position weight matrix and the Gibbs sampling model, to locate conserved motifs in a given set of nucleotide sequences. Three sub-algorithms are proposed. Algorithm 1 (see Fig. 1) handles the case of one motif instance of fixed length in each nucleotide sequence. The ISMC algorithm is then extended to more complex situations: a unique motif of unknown length (Algorithm 2), a unique motif with unknown abundance (Algorithm 3, see Fig. 2) and multiple motifs. Experimental results on both synthetic and real datasets show that the proposed ISMC algorithm outperforms five other widely used motif discovery algorithms in terms of nucleotide- and site-level sensitivity, nucleotide- and site-level positive prediction value, nucleotide-level performance coefficient, nucleotide-level correlation coefficient and site-level average site performance.
UR - http://www.scopus.com/inward/record.url?scp=84974577720&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84974577720&partnerID=8YFLogxK
U2 - 10.1049/iet-spr.2014.0356
DO - 10.1049/iet-spr.2014.0356
M3 - Article
AN - SCOPUS:84974577720
SN - 1751-9675
VL - 10
SP - 504
EP - 513
JO - IET Signal Processing
JF - IET Signal Processing
IS - 5
ER -