TY - JOUR
T1 - NAHAL-Flex
T2 - A Numerical and Alphabetical Hinge Detection Algorithm for Flexible Protein Structure Alignment
AU - Fotoohifiroozabadi, Samira
AU - Mohamad, Mohd Saberi
AU - Deris, Safaai
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2018/5/1
Y1 - 2018/5/1
AB - Flexible proteins are proteins that undergo conformational changes in their structures. Protein flexibility analysis is critical for classifying and understanding protein functionality. For that analysis, the hinge areas where proteins show flexibility must be detected. To detect the location of the hinges, previous methods have utilized the three-dimensional (3D) structure of proteins, which is computationally expensive. To reduce the computational complexity, this study proposes NAHAL-Flex, a novel text-based method that uses structural alphabets (SAs) to detect hinge positions. Protein structures were encoded into a particular type of SA called the protein folding shape code (PFSC), which remains unaffected by location, scale, and rotation. The flexible regions of the proteins are the only places in which letter sequences can be distorted. With this knowledge, it is possible to find the longest alignment path of two letter sequences using a dynamic programming (DP) algorithm. Then, the proposed method looks for regions where the alphabet sequence is distorted to find the most probable hinge positions. To reduce the number of hinge positions, a genetic algorithm (GA) was used to select the best candidate hinge points. To evaluate the method's effectiveness, four flexible and rigid protein databases (two small datasets and two large datasets) were used. For the small datasets, the NAHAL-Flex method was comparable to state-of-the-art flexible structural alignment methods. The results for the large datasets show that NAHAL-Flex outperforms some well-known alignment methods, e.g., DaliLite, Matt, DeepAlign, and TM-align; NAHAL-Flex was faster, and its results were more accurate than those of the other methods.
KW - linear alignment
KW - protein flexibility
KW - protein structure alignment
KW - structure comparison
UR - http://www.scopus.com/inward/record.url?scp=85048311197&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048311197&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2017.2705080
DO - 10.1109/TCBB.2017.2705080
M3 - Article
C2 - 28534783
AN - SCOPUS:85048311197
SN - 1545-5963
VL - 15
SP - 934
EP - 943
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 3
ER -