TY - GEN
T1 - An Adaptive Black-box Defense against Trojan Attacks on Text Data
AU - Alsharadgah, Fatima
AU - Khreishah, Abdallah
AU - Al-Ayyoub, Mahmoud
AU - Jararweh, Yaser
AU - Liu, Guanxiong
AU - Issa Khalil, Issa Mohammad
AU - Almutiry, Muhannad
AU - Saeed, Nasir
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - A Trojan backdoor is a poisoning attack against Neural Network (NN) classifiers in which adversaries exploit the (highly desirable) model reuse property to implant Trojans into model parameters, through a poisoned training process, for later backdoor breaches. Most of the proposed defenses against Trojan attacks assume a white-box setup, in which the defender either has access to the inner state of the NN or can run back-propagation through it. Moreover, most existing works that propose white-box and black-box methods to defend against Trojan backdoors focus on image data. Due to the difference in data structure, these defenses cannot be directly applied to textual data. We propose T-TROJDEF, a more practical but challenging black-box defense method for text data that only needs to run a forward pass of the NN model. T-TROJDEF tries to identify and filter out Trojan inputs (i.e., inputs augmented with the Trojan trigger) by monitoring the changes in prediction confidence when the input is repeatedly perturbed. The intuition is that Trojan inputs are more stable, as the misclassification depends only on the trigger, while the prediction confidence for benign inputs degrades under perturbation because the classification features themselves are perturbed.
AB - A Trojan backdoor is a poisoning attack against Neural Network (NN) classifiers in which adversaries exploit the (highly desirable) model reuse property to implant Trojans into model parameters, through a poisoned training process, for later backdoor breaches. Most of the proposed defenses against Trojan attacks assume a white-box setup, in which the defender either has access to the inner state of the NN or can run back-propagation through it. Moreover, most existing works that propose white-box and black-box methods to defend against Trojan backdoors focus on image data. Due to the difference in data structure, these defenses cannot be directly applied to textual data. We propose T-TROJDEF, a more practical but challenging black-box defense method for text data that only needs to run a forward pass of the NN model. T-TROJDEF tries to identify and filter out Trojan inputs (i.e., inputs augmented with the Trojan trigger) by monitoring the changes in prediction confidence when the input is repeatedly perturbed. The intuition is that Trojan inputs are more stable, as the misclassification depends only on the trigger, while the prediction confidence for benign inputs degrades under perturbation because the classification features themselves are perturbed.
KW - defense system
KW - Neural networks
KW - Trojan attack
UR - http://www.scopus.com/inward/record.url?scp=85127453445&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127453445&partnerID=8YFLogxK
U2 - 10.1109/SNAMS53716.2021.9732112
DO - 10.1109/SNAMS53716.2021.9732112
M3 - Conference contribution
AN - SCOPUS:85127453445
T3 - 2021 8th International Conference on Social Network Analysis, Management and Security, SNAMS 2021
BT - 2021 8th International Conference on Social Network Analysis, Management and Security, SNAMS 2021
A2 - Guetl, Christian
A2 - Ceravolo, Paolo
A2 - Jararweh, Yaser
A2 - Benkhelifa, Elhadj
A2 - Adedugbe, Oluwasegun
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th International Conference on Social Network Analysis, Management and Security, SNAMS 2021
Y2 - 6 December 2021 through 9 December 2021
ER -