TC-Net: A Modest & Lightweight Emotion Recognition System Using Temporal Convolution Network

Research output: Contribution to journal › Article › peer-review

21 Citations (Scopus)

Abstract

Speech signals play an essential role in communication and provide an efficient way to exchange information between humans and machines. Speech Emotion Recognition (SER) is one of the critical sources for human evaluation and is applicable in many real-world domains such as healthcare, call centers, robotics, safety, and virtual reality. This work develops a novel TCN-based emotion recognition system that uses a spatial-temporal convolution network to recognize the speaker's emotional state from speech signals. The authors designed a Temporal Convolutional Network (TCN) core block to recognize long-term dependencies in speech signals and then feed these temporal cues to a dense network, which fuses the spatial features and recognizes global information for final classification. The proposed network automatically extracts valid sequential cues from speech signals and performs better than state-of-the-art (SOTA) methods and traditional machine learning algorithms. The final unweighted accuracies of 80.84% and 92.31% on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset and the Berlin Emotional Dataset (EMO-DB), respectively, indicate the robustness and efficiency of the designed model.
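
The abstract does not specify the internals of the TCN core block. As a rough illustration of how a temporal convolutional network can capture long-term dependencies, the following is a minimal sketch of a generic stack of dilated causal 1-D convolutions in plain Python. It is not the authors' implementation: the function names, the kernel weights, and the dilation schedule (1, 2, 4) are all assumptions chosen for illustration.

```python
def causal_dilated_conv1d(x, kernel, dilation):
    """Generic causal dilated convolution (illustrative, not TC-Net itself).

    y[t] = sum_k kernel[k] * x[t - k * dilation], treating samples
    before t = 0 as zero, so y[t] never depends on future inputs.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def tcn_stack(x, kernel, dilations):
    """Apply one causal dilated convolution + ReLU per dilation value."""
    h = list(x)
    for d in dilations:
        h = [max(0.0, v) for v in causal_dilated_conv1d(h, kernel, d)]
    return h


def receptive_field(kernel_size, dilations):
    """Number of past time steps (inclusive) one output step can see."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # Hypothetical settings: kernel of size 2, dilations doubling per layer.
    impulse = [1.0] + [0.0] * 9
    print(receptive_field(2, [1, 2, 4]))
    print(tcn_stack(impulse, [1.0, 1.0], [1, 2, 4]))
```

With these assumed settings, the effect of a single input sample is visible across every output step inside the receptive field, which grows with the sum of the dilations rather than with the number of layers alone. In a full model, the per-step outputs would then be passed to a dense classifier, as the abstract describes.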

Original language: English
Pages (from-to): 3355-3369
Number of pages: 15
Journal: Computer Systems Science and Engineering
Volume: 46
Issue number: 3
Publication status: Published - 2023
Externally published: Yes

Keywords

  • Affective computing
  • deep learning
  • emotion recognition
  • speech signal
  • temporal convolutional network

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Theoretical Computer Science
  • General Computer Science
