TY - GEN
T1 - Improving the Detection of Protein Complexes by Predicting Novel Missing Interactome Links in the Protein-Protein Interaction Network
AU - Zaki, Nazar
AU - Alashwal, Hany
PY - 2018/10/26
Y1 - 2018/10/26
N2 - Identifying protein complexes within protein-protein interaction (PPI) networks is a crucial task in computational biology that facilitates a better understanding of the cellular mechanisms observed in various organisms. Datasets of predicted PPIs have been determined using high-throughput experimental technology. However, the datasets typically contain many spurious interactions. It is essential that the interactions observed in the given datasets are validated before they are employed to predict protein complexes. This paper describes the identification of missing interactome links in the PPI network as a way of improving the detection of protein complexes. The missing links were identified by extracting several topological features. These features are subsequently employed in conjunction with a two-class boosted decision-tree classifier to develop a machine-learning model that is capable of distinguishing between existing and non-existing interactome links. The model was trained on a PPI network that consisted of 1,622 proteins and 9,074 interactions, then tested on another PPI network that consisted of 1,430 proteins and 6,531 interactions. All 6,531 interactions were identified with a precision of 0.994 and a recall of 1. The model was also able to detect 37 novel interactions that were then validated using the STRING database of known and predicted PPIs. The detection of the protein complexes using ClusterONE was improved by the inclusion of the 37 novel interactions.
AB - Identifying protein complexes within protein-protein interaction (PPI) networks is a crucial task in computational biology that facilitates a better understanding of the cellular mechanisms observed in various organisms. Datasets of predicted PPIs have been determined using high-throughput experimental technology. However, the datasets typically contain many spurious interactions. It is essential that the interactions observed in the given datasets are validated before they are employed to predict protein complexes. This paper describes the identification of missing interactome links in the PPI network as a way of improving the detection of protein complexes. The missing links were identified by extracting several topological features. These features are subsequently employed in conjunction with a two-class boosted decision-tree classifier to develop a machine-learning model that is capable of distinguishing between existing and non-existing interactome links. The model was trained on a PPI network that consisted of 1,622 proteins and 9,074 interactions, then tested on another PPI network that consisted of 1,430 proteins and 6,531 interactions. All 6,531 interactions were identified with a precision of 0.994 and a recall of 1. The model was also able to detect 37 novel interactions that were then validated using the STRING database of known and predicted PPIs. The detection of the protein complexes using ClusterONE was improved by the inclusion of the 37 novel interactions.
UR - http://www.scopus.com/inward/record.url?scp=85056645928&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85056645928&partnerID=8YFLogxK
U2 - 10.1109/EMBC.2018.8513476
DO - 10.1109/EMBC.2018.8513476
M3 - Conference contribution
C2 - 30441473
AN - SCOPUS:85056645928
T3 - Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
SP - 5041
EP - 5044
BT - 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2018
Y2 - 18 July 2018 through 21 July 2018
ER -