Abstract
Speech emotion recognition (SER) is an active research field of digital signal processing and plays a crucial role in numerous applications of human–computer interaction (HCI). Current state-of-the-art baseline systems have relatively low accuracy and high computational cost, and need improvement before they are practical for real-time industrial uses such as detecting content from speech data. The main reasons for the low recognition rate and high computational cost are the scarcity of datasets, model configuration, and pattern recognition, which are the most challenging aspects of building a robust SER system. In this study, we address these problems and propose a simple and lightweight deep learning-based self-attention module (SAM) for SER systems. An intermediate feature map is passed to the SAM, which efficiently produces attention maps along the channel and spatial axes with negligible overhead. We use a multi-layer perceptron (MLP) in the channel attention to extract global cues and a special dilated convolutional neural network (CNN) in the spatial attention to extract spatial information from the input tensor. We then merge the spatial and channel attention maps to produce combined attention weights, which together form the self-attention module. We place the SAM between the convolutional and fully connected layers and train the model end to end. An ablation study and comprehensive experiments are conducted on the IEMOCAP, RAVDESS, and EMO-DB speech emotion datasets. The proposed SER system shows consistent improvements across all experiments and datasets, achieving average recall of 78.01%, 80.00%, and 93.00%, respectively.
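The attention mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify layer sizes, the dilation rate, or how the two maps are merged, so the reduction ratio, the single 3x3 dilated kernel, the broadcast sum followed by a sigmoid, and all function and variable names below are assumptions made purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Global average pooling followed by a two-layer MLP.
    feat: (C, H, W); w1: (C // r, C); w2: (C, C // r). Returns (C, 1, 1)."""
    pooled = feat.mean(axis=(1, 2))        # global cue per channel, shape (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)  # ReLU bottleneck (assumed)
    return (w2 @ hidden)[:, None, None]

def spatial_attention(feat, kernel, dilation=2):
    """One dilated k x k convolution with zero 'same' padding.
    feat: (C, H, W); kernel: (C, k, k) with odd k. Returns (1, H, W)."""
    _, h, w = feat.shape
    k = kernel.shape[-1]
    pad = dilation * (k - 1) // 2
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            di, dj = i * dilation, j * dilation
            window = padded[:, di:di + h, dj:dj + w]
            out += np.einsum("chw,c->hw", window, kernel[:, i, j])
    return out[None, :, :]

def self_attention_module(feat, w1, w2, kernel, dilation=2):
    """Merge both maps by broadcasting (assumed: sum, then sigmoid)
    and reweight the input feature map."""
    mc = channel_attention(feat, w1, w2)           # (C, 1, 1)
    ms = spatial_attention(feat, kernel, dilation)  # (1, H, W)
    attn = sigmoid(mc + ms)                         # (C, H, W)
    return feat * attn, attn

# Toy usage: 8 channels, a 6 x 5 feature map, reduction ratio r = 4.
rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 5, 4
feat = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
kernel = 0.1 * rng.standard_normal((C, 3, 3))
refined, attn = self_attention_module(feat, w1, w2, kernel)
```

In this sketch the refined tensor has the same shape as the input, so a module of this kind could sit between a convolutional stack and the fully connected layers without changing the surrounding layer dimensions.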
| Original language | English |
|---|---|
| Article number | 107101 |
| Journal | Applied Soft Computing |
| Volume | 102 |
| Publication status | Published - Apr 2021 |
| Externally published | Yes |
Keywords
- Affective computing
- Artificial intelligence
- Attention mechanism
- Emotion recognition
- Lightweight CNN
- Self-attention module
- Spectrograms
ASJC Scopus subject areas
- Software
Title
- Att-Net: Enhanced emotion recognition system using lightweight self-attention module