TY - GEN
T1 - On leveraging coding habits for effective binary authorship attribution
AU - Alrabaee, Saed
AU - Shirani, Paria
AU - Wang, Lingyu
AU - Debbabi, Mourad
AU - Hanna, Aiman
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
N2 - We propose BinAuthor, the first compiler-agnostic method for identifying the authors of program binaries. After filtering out unrelated (compiler and library) functions to isolate user-related functions, it converts the user-related functions into a canonical form to eliminate compiler and compilation effects. It then leverages a set of features based on collections of the choices authors make during coding; these features capture an author's coding habits. Our evaluation demonstrates that BinAuthor outperforms existing methods in several respects. First, when tested on large datasets extracted from selected open-source C/C++ projects on GitHub, Google Code Jam events, and Planet Source Code contests, it successfully attributed a larger number of authors with significantly higher accuracy: around 90% when the number of authors is 1000. Second, when the code was subjected to refactoring techniques, code transformation, or processing with different compilers or compilation settings, there was no significant drop in accuracy, indicating that BinAuthor is more robust than previous methods.
UR - http://www.scopus.com/inward/record.url?scp=85052233173&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052233173&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-99073-6_2
DO - 10.1007/978-3-319-99073-6_2
M3 - Conference contribution
AN - SCOPUS:85052233173
SN - 9783319990729
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 26
EP - 47
BT - Computer Security - 23rd European Symposium on Research in Computer Security, ESORICS 2018, Proceedings
A2 - Lopez, Javier
A2 - Zhou, Jianying
A2 - Soriano, Miguel
PB - Springer Verlag
T2 - 23rd European Symposium on Research in Computer Security, ESORICS 2018
Y2 - 3 September 2018 through 7 September 2018
ER -