Drone-HAT: Hybrid Attention Transformer for Complex Action Recognition in Drone Surveillance Videos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Citations (Scopus)

Abstract

Ultra-high-resolution aerial videos are becoming increasingly popular for enhancing surveillance capabilities in sparsely populated areas. However, analyzing human activities automatically, such as "who is doing what?" in these videos, is desirable to realize their surveillance potential. In contrast, atomic visual action detection has successfully recognized such activities in movie data. However, adapting it to ultra-high-resolution aerial videos is challenging because the target persons appear relatively tiny from overhead views and are sparsely located. Additionally, existing atomic visual action detection methods are based on single-label actions. However, people can perform multiple actions simultaneously, so a multi-label approach would be more appropriate. To address these problems, we propose a multi-label action detection/recognition framework using a hybrid attention vision transformer (HAT) to recognize recurrent actions more efficiently. Additionally, a multi-scale, multi-granularity module inside the action recognition transformer block extracts relevant features without redundancy. Using the Okutama Dataset, we demonstrated that our method performs better than existing state-of-the-art methodologies for interpreting aerial videos for human activity.

Original language: English
Title of host publication: Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
Publisher: IEEE Computer Society
Pages: 4713-4722
Number of pages: 10
ISBN (Electronic): 9798350365474
DOIs
Publication status: Published - 2024
Externally published: Yes
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024 - Seattle, United States
Duration: Jun 16 2024 to Jun 22 2024

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516

Conference

Conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
Country/Territory: United States
City: Seattle
Period: 6/16/24 to 6/22/24

Keywords

  • Aerial Surveillance
  • Hybrid Attention Transformer
  • Multi-granularity and Multi-scale Fusion
  • Multi-label Action Recognition

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
