Deep Learning-Based Context-Aware Video Content Analysis on IoT Devices

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Integrating machine learning with the Internet of Things (IoT) enables many useful applications. For IoT applications that incorporate video content analysis (VCA), deep learning models are usually used because they can encode the high-dimensional spatial and temporal representations of videos. However, limited energy and computation resources present a major challenge. Video captioning is one type of VCA that describes a video with a sentence or a set of sentences. This work proposes an IoT-based deep learning framework for video captioning that (1) mines large open-domain video-to-text datasets to extract video-caption pairs that belong to a particular domain; (2) preprocesses the selected video-caption pairs, including reducing the complexity of the captions' language model to improve performance; and (3) provides two deep learning models: a transformer-based model and an LSTM-based model. Hyperparameter tuning is performed to select the best hyperparameters. The models are evaluated in terms of accuracy and inference time on different platforms. The presented framework generates captions in standard sentence templates to facilitate extracting information in later stages of the analysis. The two models offer a trade-off between accuracy and speed: the transformer-based model achieves a high accuracy of 97%, while the LSTM-based model achieves near real-time inference.
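The abstract does not specify how domain-specific pairs are mined or how the captions' language model is simplified. As a minimal sketch under stated assumptions, the Python below filters video-caption pairs by a hypothetical keyword set and reduces caption complexity by mapping rare words to `<unk>`. The keyword set, the `min_count` threshold, the toy data, and every function name (`mine_domain_pairs`, `build_vocab`, `simplify`) are illustrative assumptions, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical domain keywords (e.g. a traffic domain); the article does not
# say which domain or keywords it uses.
DOMAIN_KEYWORDS = {"car", "road", "traffic", "driving"}


def tokenize(caption):
    """Lowercase a caption and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", caption.lower())


def mine_domain_pairs(pairs, keywords=DOMAIN_KEYWORDS):
    """Keep (video_id, caption) pairs whose caption mentions a domain keyword."""
    return [(vid, cap) for vid, cap in pairs if keywords & set(tokenize(cap))]


def build_vocab(captions, min_count=2):
    """Return the words that occur at least `min_count` times across captions."""
    counts = Counter(tok for cap in captions for tok in tokenize(cap))
    return {word for word, count in counts.items() if count >= min_count}


def simplify(caption, vocab):
    """Map out-of-vocabulary words to <unk>, shrinking the output vocabulary."""
    return " ".join(tok if tok in vocab else "<unk>" for tok in tokenize(caption))


# Toy open-domain data, made up for illustration.
pairs = [
    ("v1", "A red car is driving on the road."),
    ("v2", "A man is cooking pasta."),
    ("v3", "A car is parked near the road."),
    ("v4", "Traffic is moving slowly on a wet road."),
]
domain = mine_domain_pairs(pairs)
vocab = build_vocab([cap for _, cap in domain])
print([vid for vid, _ in domain])      # ['v1', 'v3', 'v4']
print(simplify(domain[0][1], vocab))   # a <unk> car is <unk> on the road
```

Frequency thresholding is only one simple way to shrink the vocabulary a captioning decoder must predict; a real pipeline could instead rely on a curated keyword list or on category labels provided with the source dataset.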

Original language: English
Article number: 1785
Journal: Electronics (Switzerland)
Volume: 11
Issue number: 11
Publication status: Published - Jun 1 2022
Externally published: Yes

Keywords

  • Internet of Things (IoT)
  • LSTM
  • transformer-based model
  • video captioning
  • video content analysis

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Hardware and Architecture
  • Computer Networks and Communications
  • Electrical and Electronic Engineering
