TY - JOUR
T1 - Investigating the effects of imputation methods for modelling gene networks using a dynamic Bayesian network from gene expression data
AU - Chai, Lian En
AU - Law, Chow Kuan
AU - Mohamad, Mohd Saberi
AU - Chong, Chuii Khim
AU - Choon, Yee Wen
AU - Deris, Safaai
AU - Illias, Rosli Md
PY - 2014/3
Y1 - 2014/3
N2 - Background: Gene expression data often contain missing expression values. Several imputation methods have therefore been applied to estimate the missing values, including k-nearest neighbour (kNN), local least squares (LLS), and Bayesian principal component analysis (BPCA). However, the effects of these imputation methods on the modelling of gene regulatory networks (GRNs) from gene expression data have rarely been investigated and analysed using a dynamic Bayesian network (DBN). Methods: In the present study, we separately imputed datasets of the Escherichia coli SOS DNA repair pathway and the Saccharomyces cerevisiae cell cycle pathway with kNN, LLS, and BPCA, and subsequently used the imputed datasets to generate GRNs using a discrete DBN. We compared the resulting networks against previous studies to select the gene network with the least error. Results: We found that BPCA and LLS performed better on larger networks (based on the S. cerevisiae dataset), whereas kNN performed better on smaller networks (based on the E. coli dataset). Conclusion: The results suggest that the performance of each imputation method depends on the size of the dataset, and that this subsequently affects the modelling of the resultant GRNs using a DBN. In addition, the results indicate that a DBN can discover potential edges and display interactions between genes.
AB - Background: Gene expression data often contain missing expression values. Several imputation methods have therefore been applied to estimate the missing values, including k-nearest neighbour (kNN), local least squares (LLS), and Bayesian principal component analysis (BPCA). However, the effects of these imputation methods on the modelling of gene regulatory networks (GRNs) from gene expression data have rarely been investigated and analysed using a dynamic Bayesian network (DBN). Methods: In the present study, we separately imputed datasets of the Escherichia coli SOS DNA repair pathway and the Saccharomyces cerevisiae cell cycle pathway with kNN, LLS, and BPCA, and subsequently used the imputed datasets to generate GRNs using a discrete DBN. We compared the resulting networks against previous studies to select the gene network with the least error. Results: We found that BPCA and LLS performed better on larger networks (based on the S. cerevisiae dataset), whereas kNN performed better on smaller networks (based on the E. coli dataset). Conclusion: The results suggest that the performance of each imputation method depends on the size of the dataset, and that this subsequently affects the modelling of the resultant GRNs using a DBN. In addition, the results indicate that a DBN can discover potential edges and display interactions between genes.
KW - Bayesian method
KW - DNA microarrays
KW - Gene expression
KW - Gene expression regulation
KW - Gene regulatory networks
UR - http://www.scopus.com/inward/record.url?scp=84902480570&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84902480570&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:84902480570
SN - 1394-195X
VL - 21
SP - 20
EP - 27
JO - Malaysian Journal of Medical Sciences
JF - Malaysian Journal of Medical Sciences
IS - 2
ER -