Abstract
Binary authorship attribution refers to the process of identifying the author of a given anonymous binary file based on stylistic characteristics. It aims to automate the laborious and error-prone reverse engineering task of discovering information related to the author(s) of a binary code. Existing works typically employ machine learning methods to extract features that are unique for each author and subsequently match them against a given binary to identify the author. However, most existing works share a common critical limitation, i.e., they cannot distinguish between features representing program functionality and those representing authorship (e.g., authors' coding habits). Such distinction is crucial for effective authorship attribution because what is unique in a particular binary may be attributed to either author, compiler, or function. In this study, we present BinAuthor a system capable of decoupling program functionality from authors' coding habits in binary code. To capture coding habits, BinAuthor leverages a set of features that are based on collections of functionality-independent choices made by authors during coding. Our evaluation demonstrates that BinAuthor outperforms existing methods in several aspects. First, it successfully attributes a larger number of authors with a significantly higher accuracy (around 90 %) based on the large datasets extracted from selected open-source C + + projects in GitHub, Google Code Jam events, Planet Source Code contests, and several programming projects. Second, BinAuthor is more robust than previous methods; there is no significant drop in accuracy when the code is subjected to refactoring techniques, simple obfuscation, and processed with different compilers. Finally, decoupling authorship from functionality allows us to apply BinAuthor to real malware binaries (Citadel, Zeus, Stuxnet, Flame, Bunny, and Babar) to automatically generate evidence on similar coding habits.
Original language | English |
---|---|
Pages (from-to) | 613-648 |
Number of pages | 36 |
Journal | Journal of Computer Security |
Volume | 27 |
Issue number | 6 |
DOIs | |
Publication status | Published - 2019 |
Keywords
- Binary authorship attribution
- assembly instructions
- coding habits
- malware analysis
ASJC Scopus subject areas
- Software
- Safety, Risk, Reliability and Quality
- Hardware and Architecture
- Computer Networks and Communications