TY - GEN
T1 - Towards Optimized IoT-based Context-aware Video Content Analysis Framework
AU - Gad, Gad
AU - Gad, Eyad
AU - Mokhtar, Bassem
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/6/14
Y1 - 2021/6/14
N2 - Despite the success of convolutional neural networks (CNNs) in the area of spatial analysis, and recurrent neural networks (RNNs) on sequence modeling and interpretation tasks, video analysis has only seen limited interest and progress. This is partially due to focusing on the natural human-like translation from video space to natural language space to the detriment of informativeness. This paper proposes an automated context-aware video analysis framework that is directed by the constraints of its application. This framework incorporates an encoder-decoder neural network trained on a closed-domain video-to-text dataset. The network architecture and the standardized language model present in the dataset are optimized for speed, to allow the system to be deployed on IoT devices, and for informativeness, so that information can easily be extracted from the model output for the following stages of the analysis. The proposed framework provides a practical method to integrate the power of the CNN-RNN combination in a directed way to extract the most from video content. A classroom monitoring system is discussed as an example of the capabilities and limitations of the proposed framework using NVIDIA's Jetson Nano board.
AB - Despite the success of convolutional neural networks (CNNs) in the area of spatial analysis, and recurrent neural networks (RNNs) on sequence modeling and interpretation tasks, video analysis has only seen limited interest and progress. This is partially due to focusing on the natural human-like translation from video space to natural language space to the detriment of informativeness. This paper proposes an automated context-aware video analysis framework that is directed by the constraints of its application. This framework incorporates an encoder-decoder neural network trained on a closed-domain video-to-text dataset. The network architecture and the standardized language model present in the dataset are optimized for speed, to allow the system to be deployed on IoT devices, and for informativeness, so that information can easily be extracted from the model output for the following stages of the analysis. The proposed framework provides a practical method to integrate the power of the CNN-RNN combination in a directed way to extract the most from video content. A classroom monitoring system is discussed as an example of the capabilities and limitations of the proposed framework using NVIDIA's Jetson Nano board.
KW - Closed-domain Dataset
KW - Context-aware Analysis
KW - Deep Learning Framework
KW - Subject-Verb-Object Description
KW - Video Description
UR - http://www.scopus.com/inward/record.url?scp=85119855660&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119855660&partnerID=8YFLogxK
U2 - 10.1109/WF-IoT51360.2021.9595891
DO - 10.1109/WF-IoT51360.2021.9595891
M3 - Conference contribution
AN - SCOPUS:85119855660
T3 - 7th IEEE World Forum on Internet of Things, WF-IoT 2021
SP - 46
EP - 50
BT - 7th IEEE World Forum on Internet of Things, WF-IoT 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th IEEE World Forum on Internet of Things, WF-IoT 2021
Y2 - 14 June 2021 through 31 July 2021
ER -