TY - GEN
T1 - Attention-based Network for Image/Video Salient Object Detection
AU - Elharrouss, Omar
AU - Elkaitouni, Soukaina El Idrissi
AU - Akbari, Younes
AU - Al-Maadeed, Somaya
AU - Bouridane, Ahmed
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The goal of video or image salient object detection is to identify the most important object in a scene, which is helpful in many computer vision tasks. Since the human visual system can effortlessly perceive regions of interest in complex scenes, salient object detection mimics this ability. However, salient object detection (SOD) in complex video scenes remains a challenging task. This paper focuses on learning channel-wise and spatiotemporal representations for image/video salient object detection. The proposed method consists of three levels: the frontend, the attention models, and the backend. The frontend consists of a VGG backbone that learns representations of both common and discriminative features. Channel-wise and spatiotemporal attention models are then applied to highlight the salient object using a feature detector and to compute spatial attention. Finally, the output features are fused to obtain the final saliency map. Experimental evaluations confirm the validity and effectiveness of the proposed model compared with state-of-the-art methods.
AB - The goal of video or image salient object detection is to identify the most important object in a scene, which is helpful in many computer vision tasks. Since the human visual system can effortlessly perceive regions of interest in complex scenes, salient object detection mimics this ability. However, salient object detection (SOD) in complex video scenes remains a challenging task. This paper focuses on learning channel-wise and spatiotemporal representations for image/video salient object detection. The proposed method consists of three levels: the frontend, the attention models, and the backend. The frontend consists of a VGG backbone that learns representations of both common and discriminative features. Channel-wise and spatiotemporal attention models are then applied to highlight the salient object using a feature detector and to compute spatial attention. Finally, the output features are fused to obtain the final saliency map. Experimental evaluations confirm the validity and effectiveness of the proposed model compared with state-of-the-art methods.
KW - Image Salient Object Detection
KW - Image segmentation
KW - Semantic segmentation
KW - Video Salient Object Detection
UR - http://www.scopus.com/inward/record.url?scp=85179513733&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85179513733&partnerID=8YFLogxK
U2 - 10.1109/EUVIP58404.2023.10323073
DO - 10.1109/EUVIP58404.2023.10323073
M3 - Conference contribution
AN - SCOPUS:85179513733
T3 - Proceedings - European Workshop on Visual Information Processing, EUVIP
BT - 2023 11th European Workshop on Visual Information Processing, EUVIP 2023 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 11th European Workshop on Visual Information Processing, EUVIP 2023
Y2 - 11 September 2023 through 14 September 2023
ER -