Grouped Echo State Network with Late Fusion for Speech Emotion Recognition

Hemin Ibrahim, Chu Kiong Loo, Fady Alnajjar

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)


Speech Emotion Recognition (SER) has become a popular research topic because of its significant role in many practical applications, and it is considered a key effort in Human-Computer Interaction (HCI). Previous work in this field has mostly focused on global features or on time series feature representations with deep learning models. In contrast, the main focus of this work is to design a simple model for SER by adopting a multivariate time series feature representation. This work uses the Echo State Network (ESN), a special case of the Recurrent Neural Network (RNN), with parallel reservoir layers, and applies Principal Component Analysis (PCA) to reduce the high-dimensional output of the reservoir layers. Late grouped fusion is applied to capture additional information independently from the two reservoirs. Additionally, the hyperparameters are optimized using a Bayesian approach. Speaker-independent experiments on the SAVEE dataset and the FAU Aibo Emotion Corpus demonstrate the high performance of the proposed SER model. The experimental results show that the proposed model outperforms state-of-the-art results.
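
The pipeline described in the abstract (parallel reservoirs, PCA on the reservoir states, late fusion of the per-reservoir outputs) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The leaky-tanh state update, the mean-over-time pooling, the ridge readout, the score-averaging fusion rule, all function names, and all hyperparameter values are assumptions made for illustration. The Bayesian hyperparameter search is omitted and the values are fixed by hand, and synthetic sequences stand in for speech features.

```python
import numpy as np


def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Random, untrained input and recurrent weights (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=(n_res, n_in))
    w = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def run_reservoir(x, reservoir, leak=0.3):
    """Map a sequence x of shape (T, n_in) to states of shape (T, n_res)."""
    w_in, w = reservoir
    h = np.zeros(w.shape[0])
    states = np.empty((x.shape[0], w.shape[0]))
    for t, x_t in enumerate(x):
        h = (1.0 - leak) * h + leak * np.tanh(w_in @ x_t + w @ h)
        states[t] = h
    return states


def features(states, mean, comps):
    """Project states onto PCA components, average over time, append a bias."""
    z = (states - mean) @ comps.T
    return np.append(z.mean(axis=0), 1.0)


def fit_branch(train_x, train_y, n_classes, reservoir, n_components=5, alpha=1.0):
    """Fit PCA and a ridge readout for one reservoir (one fusion branch)."""
    all_states = [run_reservoir(x, reservoir) for x in train_x]
    stacked = np.vstack(all_states)
    mean = stacked.mean(axis=0)
    # PCA via SVD of the centered state matrix.
    _, _, vt = np.linalg.svd(stacked - mean, full_matrices=False)
    comps = vt[:n_components]
    feats = np.array([features(s, mean, comps) for s in all_states])
    targets = np.eye(n_classes)[train_y]  # one-hot labels
    gram = feats.T @ feats + alpha * np.eye(feats.shape[1])
    w_out = np.linalg.solve(gram, feats.T @ targets)
    return mean, comps, w_out


def predict_late_fusion(x, reservoirs, branches):
    """Score x with each branch independently, then average the class scores."""
    scores = []
    for reservoir, (mean, comps, w_out) in zip(reservoirs, branches):
        scores.append(features(run_reservoir(x, reservoir), mean, comps) @ w_out)
    return int(np.argmax(np.mean(scores, axis=0)))


# Toy data: 20 synthetic sequences (20 frames, 4 features), two classes.
rng = np.random.default_rng(42)
train_y = np.array([0, 1] * 10)
train_x = [rng.normal(loc=float(y), size=(20, 4)) for y in train_y]

reservoirs = [make_reservoir(4, 30, seed=s) for s in (1, 2)]
branches = [fit_branch(train_x, train_y, 2, r) for r in reservoirs]
pred = predict_late_fusion(train_x[0], reservoirs, branches)
```

Only the readout weights are trained. The reservoir weights stay fixed after random initialization, which is what keeps an ESN simple compared with a fully trained RNN. Each branch produces its class scores without seeing the other reservoir, so the combination happens at the decision level rather than at the feature level.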

Original language: English
Title of host publication: Neural Information Processing - 28th International Conference, ICONIP 2021, Proceedings
Editors: Teddy Mantoro, Minho Lee, Media Anugerah Ayu, Kok Wai Wong, Achmad Nizar Hidayanto
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 12
ISBN (Print): 9783030922375
Publication status: Published - 2021
Event: 28th International Conference on Neural Information Processing, ICONIP 2021 - Virtual, Online
Duration: Dec 8 2021 to Dec 12 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13110 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 28th International Conference on Neural Information Processing, ICONIP 2021
City: Virtual, Online


Keywords

  • Grouped echo state network
  • Recurrent neural network
  • Reservoir computing
  • Speech emotion recognition
  • Time series classification

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

