TY - GEN
T1 - BinEye
T2 - 24th European Symposium on Research in Computer Security, ESORICS 2019
AU - Alrabaee, Saed
AU - Karbab, El Mouatez Billah
AU - Wang, Lingyu
AU - Debbabi, Mourad
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - In this paper, we present BinEye, an innovative tool which trains a system of three convolutional neural networks to characterize the authors of program binaries based on novel sets of features. The first set of features is obtained by converting an executable binary code into a gray image; the second by transforming each executable into a series of bytecode; and the third by representing each function in terms of its opcodes. By leveraging advances in deep learning, we are then able to characterize a large set of authors. This is accomplished even without the missing features and despite the complications arising from compilation. In fact, BinEye does not require any prior knowledge of the target binary. More important, an analysis of the model provides a satisfying explanation of the results obtained: BinEye is able to auto-learn each author’s coding style and thus characterize the authors of program binaries. We evaluated BinEye on large datasets extracted from selected open-source C++ projects in GitHub, Google Code Jam events, and several programming projects, comparing it wiexperimental results demonstrate that BinEye characterizes a larger number of authors with a significantly higher accuracy (above 90%). We also employed it in the context of several case studies. When applied to Zeus and Citadel, BinEye found that this pair might be associated with common authors. For other packages, BinEye demonstrated its ability to identify the presence of multiple authors in binary code.
AB - In this paper, we present BinEye, an innovative tool which trains a system of three convolutional neural networks to characterize the authors of program binaries based on novel sets of features. The first set of features is obtained by converting an executable binary code into a gray image; the second by transforming each executable into a series of bytecode; and the third by representing each function in terms of its opcodes. By leveraging advances in deep learning, we are then able to characterize a large set of authors. This is accomplished even without the missing features and despite the complications arising from compilation. In fact, BinEye does not require any prior knowledge of the target binary. More important, an analysis of the model provides a satisfying explanation of the results obtained: BinEye is able to auto-learn each author’s coding style and thus characterize the authors of program binaries. We evaluated BinEye on large datasets extracted from selected open-source C++ projects in GitHub, Google Code Jam events, and several programming projects, comparing it wiexperimental results demonstrate that BinEye characterizes a larger number of authors with a significantly higher accuracy (above 90%). We also employed it in the context of several case studies. When applied to Zeus and Citadel, BinEye found that this pair might be associated with common authors. For other packages, BinEye demonstrated its ability to identify the presence of multiple authors in binary code.
UR - http://www.scopus.com/inward/record.url?scp=85075606227&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075606227&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-29962-0_3
DO - 10.1007/978-3-030-29962-0_3
M3 - Conference contribution
AN - SCOPUS:85075606227
SN - 9783030299613
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 47
EP - 67
BT - Computer Security – ESORICS 2019 - 24th European Symposium on Research in Computer Security, Proceedings
A2 - Sako, Kazue
A2 - Schneider, Steve
A2 - Ryan, Peter Y.A.
PB - Springer
Y2 - 23 September 2019 through 27 September 2019
ER -