TY - GEN
T1 - Two-Stream Architecture Using RGB-based ConvNet and Pose-based LSTM for Video Action Recognition
AU - Huang, Ching Jung
AU - Gochoo, Munkhjargal
AU - Tan, Tan Hsu
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Traditional methods for video recognition require hand-crafted features, which often involve offline pre-processing for real-world videos. In this study, we propose a conceptually simple framework that directly takes raw videos as an input source for activity recognition. Our framework consists of two streams, namely a spatial stream and a temporal stream. The spatial stream is trained on a RepVGG-B0 ConvNet using cropped RGB features, while the temporal stream uses an attention-based Bi-directional Long Short-Term Memory (Bi-LSTM) network to learn posture vectors from human pose data obtained through a pre-trained Faster R-CNN model. Our proposed method is evaluated on a standard video action recognition benchmark, MSR Daily Activity3D, and proves to be competitive with state-of-the-art action recognition methods. We achieve state-of-the-art performance on MSR Daily Activity3D with precision and recall rates of 99.01% and 98.91%, respectively. Our results demonstrate the effectiveness of our approach in recognizing video actions.
AB - Traditional methods for video recognition require hand-crafted features, which often involve offline pre-processing for real-world videos. In this study, we propose a conceptually simple framework that directly takes raw videos as an input source for activity recognition. Our framework consists of two streams, namely a spatial stream and a temporal stream. The spatial stream is trained on a RepVGG-B0 ConvNet using cropped RGB features, while the temporal stream uses an attention-based Bi-directional Long Short-Term Memory (Bi-LSTM) network to learn posture vectors from human pose data obtained through a pre-trained Faster R-CNN model. Our proposed method is evaluated on a standard video action recognition benchmark, MSR Daily Activity3D, and proves to be competitive with state-of-the-art action recognition methods. We achieve state-of-the-art performance on MSR Daily Activity3D with precision and recall rates of 99.01% and 98.91%, respectively. Our results demonstrate the effectiveness of our approach in recognizing video actions.
KW - Faster R-CNN
KW - RGB activity images
KW - RepVGG
KW - attention-based Bi-directional LSTM
KW - deep learning
KW - video action recognition
UR - http://www.scopus.com/inward/record.url?scp=85182917206&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182917206&partnerID=8YFLogxK
U2 - 10.1109/IIT59782.2023.10366415
DO - 10.1109/IIT59782.2023.10366415
M3 - Conference contribution
AN - SCOPUS:85182917206
T3 - 2023 15th International Conference on Innovations in Information Technology, IIT 2023
SP - 127
EP - 131
BT - 2023 15th International Conference on Innovations in Information Technology, IIT 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th International Conference on Innovations in Information Technology, IIT 2023
Y2 - 14 November 2023 through 15 November 2023
ER -