Data fusion by joint non-negative matrix factorization for hypothesizing pseudo-chemistry using Bayesian networks

Anjana Puliyanda, Kaushik Sivaramakrishnan, Zukui Li, Arno De Klerk, Vinay Prasad

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)


Inferring the reaction pathways underlying the processing of complex feeds from noisy spectral sensor data, which may contain information about molecular mechanisms, is challenging. This work tackles the problem with a two-step approach applied to the partial upgrading of Cold Lake bitumen. First, joint non-negative matrix factorization (JNMF) is used as a data fusion algorithm to extract pseudocomponent spectra by combining complementary information about the reacting environment from Fourier transform infrared (FTIR) and proton nuclear magnetic resonance (1H-NMR) spectroscopic sensors. Second, a probabilistic inferential model that hypothesizes reaction mechanisms among the identified pseudocomponent spectra is constructed using Bayesian networks, which encode directed acyclic causal pathways among the nodes representing the random variables (the pseudocomponent spectra). The JNMF algorithm has been developed to handle process data artefacts by imputing missing data, using a rotationally invariant norm for robustness to outliers and noise, and enforcing the non-negativity constraint to ensure physical interpretability in compliance with Beer's law for spectral data. The projected optimal gradient approach developed to solve the JNMF objective converges in fewer iterations, at the specified tolerance, than the multiplicative update rules (MUR). Solution ambiguity in JNMF is limited by incorporating graph regularization terms: (a) inter-sensor co-regularization, which penalizes redundancy in the pseudocomponent spectra across spectral sensors, and (b) intra-spectral manifold regularization, which limits overfitting of the pseudocomponent spectra from each sensor by penalizing redundant peaks within a spectrum.
Setting to zero the weight of the intra-spectral regularization term, which minimizes similarly correlated peaks across the spectral channels of a sensor, is seen to yield chemically meaningful pseudocomponent spectra, given that different organic compounds share similar properties with respect to their hydrocarbon structure. Hence, the preferential weighting of regularizers is shown to act as a chemical information sieve: it controls which peaks appear in the pseudocomponent spectra and thereby enables the proposal of different reaction mechanisms, depending on the similarity metric used to model the graph structure.
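To make the data fusion step more concrete, the following is a minimal sketch of a joint factorization of two sensor matrices using the multiplicative update rules (MUR) that the abstract names as the baseline solver. It is not the authors' algorithm. It assumes a plain Frobenius-norm objective and a single concentration matrix `W` shared by both sensors, and it omits the missing-data imputation, the rotationally invariant norm, the graph regularizers, and the projected gradient solver described above. The function name `joint_nmf`, the variable names, and the synthetic data are all hypothetical.

```python
import numpy as np

def joint_nmf(X1, X2, k, n_iter=500, eps=1e-9, seed=0):
    """Jointly factorize X1 ~ W @ H1 and X2 ~ W @ H2 with W, H1, H2 >= 0.

    Assumption: rows are samples, columns are spectral channels, and both
    sensors share the sample-wise coefficient matrix W (n_samples x k).
    H1 and H2 hold the k pseudocomponent spectra of each sensor.
    """
    rng = np.random.default_rng(seed)
    n = X1.shape[0]
    W = rng.random((n, k)) + eps
    H1 = rng.random((k, X1.shape[1])) + eps
    H2 = rng.random((k, X2.shape[1])) + eps
    losses = []
    for _ in range(n_iter):
        # Lee-Seung style multiplicative updates keep every factor non-negative.
        H1 *= (W.T @ X1) / (W.T @ W @ H1 + eps)
        H2 *= (W.T @ X2) / (W.T @ W @ H2 + eps)
        # The shared factor W pools the evidence from both sensors.
        W *= (X1 @ H1.T + X2 @ H2.T) / (W @ (H1 @ H1.T + H2 @ H2.T) + eps)
        losses.append(
            np.linalg.norm(X1 - W @ H1) ** 2 + np.linalg.norm(X2 - W @ H2) ** 2
        )
    return W, H1, H2, losses

# Synthetic stand-ins for FTIR and 1H-NMR data (illustrative only, not real spectra).
rng = np.random.default_rng(1)
W_true = rng.random((12, 3))
X_ftir = W_true @ rng.random((3, 40))
X_nmr = W_true @ rng.random((3, 25))

W, H_ftir, H_nmr, losses = joint_nmf(X_ftir, X_nmr, k=3)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Under these assumptions the rows of `H_ftir` and `H_nmr` play the role of pseudocomponent spectra, and the non-negativity of all factors mirrors the Beer's law interpretability argument. Regularization terms like those in the abstract would enter as additional penalties in the objective and would change the update rules.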

Original language: English
Pages (from-to): 1719-1737
Number of pages: 19
Journal: Reaction Chemistry and Engineering
Issue number: 9
Publication status: Published - Sept 2020
Externally published: Yes

ASJC Scopus subject areas

  • Catalysis
  • Chemistry (miscellaneous)
  • Chemical Engineering (miscellaneous)
  • Process Chemistry and Technology
  • Fluid Flow and Transfer Processes

