TY - JOUR
T1 - Crack45K
T2 - Integration of Vision Transformer with Tubularity Flow Field (TuFF) and Sliding-Window Approach for Crack-Segmentation in Pavement Structures
AU - Ali, Luqman
AU - Jassmi, Hamad Al
AU - Khan, Wasif
AU - Alnajjar, Fady
N1 - Funding Information:
The authors are very grateful for the support of the China Scholarship Council (201707790004). The authors also would like to acknowledge support from the Science and Research Programs of the Education Department of Jilin Province of the P.R.C. (No. 201684).
Publisher Copyright:
© 2022 by the authors.
PY - 2023/1
Y1 - 2023/1
AB - Recently, deep-learning (DL)-based crack-detection systems have proven to be the method of choice for image-processing-based inspection systems. However, human-like generalization remains challenging, owing to a wide variety of factors such as crack type and size. Additionally, because of their localized receptive fields, CNNs have a high false-detection rate and perform poorly when attempting to capture the relevant areas of an image. This study proposes a vision-transformer-based crack-detection framework that treats an image as a sequence of small patches to retrieve global contextual information (GCI) through self-attention (SA), thereby addressing the inductive biases of CNNs, namely locally constrained receptive fields and translation invariance. The vision-transformer (ViT) classifier was combined with a sliding-window approach and the tubularity-flow-field (TuFF) algorithm to enhance crack classification, localization, and segmentation performance. First, the ViT was trained on a custom dataset of 45K images at 224 × 224 pixel resolution and achieved accuracy, precision, recall, and F1 scores of 0.960, 0.971, 0.950, and 0.960, respectively. Second, the trained ViT was integrated with the sliding-window (SW) approach to obtain a crack-localization map from large images. Finally, the SW-based ViT classifier was merged with the TuFF algorithm to produce an efficient crack map by suppressing unwanted regions. The robustness and adaptability of the proposed integrated architecture were tested on new data acquired under different conditions that were not used during the training and validation of the model. The performance of the proposed ViT architecture was evaluated and compared with that of various state-of-the-art (SOTA) deep-learning approaches. The experimental results show that a ViT equipped with a sliding window and the TuFF algorithm can enhance real-world crack classification, localization, and segmentation performance.
KW - ViT transformer
KW - crack-detection
KW - deep learning
KW - machine learning
KW - pavement cracks
KW - structural-health monitoring
KW - vision-based inspection
UR - http://www.scopus.com/inward/record.url?scp=85146528934&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146528934&partnerID=8YFLogxK
U2 - 10.3390/buildings13010055
DO - 10.3390/buildings13010055
M3 - Article
AN - SCOPUS:85146528934
SN - 2075-5309
VL - 13
JO - Buildings
JF - Buildings
IS - 1
M1 - 55
ER -