TY - CONF
T1 - OBA2
T2 - 2014 Digital Forensic Research Conference, DFRWS 2014 EU
AU - Alrabaee, Saed
AU - Saleem, Noman
AU - Preda, Stere
AU - Wang, Lingyu
AU - Debbabi, Mourad
N1 - Funding Information:
The authors thank the anonymous reviewers for their valuable comments. The authors are thankful to Philippe Charland from Defence Research and Development Canada for early discussions on this topic. Initial versions of this work have been funded through a research contract with DRDC Canada. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsoring organizations.
Publisher Copyright:
© 2014 The Author. Published by Elsevier Ltd on behalf of DFRWS.
PY - 2014
Y1 - 2014
N2 - A critical aspect of malware forensics is authorship analysis. The successful outcome of such analysis is usually determined by the reverse engineer’s skills and by the volume and complexity of the code under analysis. To assist reverse engineers in such a tedious and error-prone task, it is desirable to develop reliable and automated tools for supporting the practice of malware authorship attribution. In a recent work, machine learning was used to rank and select syntax-based features such as n-grams and flow graphs. The experimental results showed that the top ranked features were unique for each author, which was regarded as evidence that those features capture the author’s programming styles. In this paper, however, we show that the uniqueness of features does not necessarily correspond to authorship. Specifically, our analysis demonstrates that many “unique” features selected using this method are clearly unrelated to the authors’ programming styles, for example, unique IDs or random but unique function names generated by the compiler; furthermore, the overall accuracy is generally unsatisfactory. Motivated by this discovery, we propose a layered Onion Approach for Binary Authorship Attribution called OBA2. The novelty of our approach lies in the three complementary layers: preprocessing, syntax-based attribution, and semantic-based attribution. Experiments show that our method produces results that are not only more accurate but also have a meaningful connection to the authors’ styles.
AB - A critical aspect of malware forensics is authorship analysis. The successful outcome of such analysis is usually determined by the reverse engineer’s skills and by the volume and complexity of the code under analysis. To assist reverse engineers in such a tedious and error-prone task, it is desirable to develop reliable and automated tools for supporting the practice of malware authorship attribution. In a recent work, machine learning was used to rank and select syntax-based features such as n-grams and flow graphs. The experimental results showed that the top ranked features were unique for each author, which was regarded as evidence that those features capture the author’s programming styles. In this paper, however, we show that the uniqueness of features does not necessarily correspond to authorship. Specifically, our analysis demonstrates that many “unique” features selected using this method are clearly unrelated to the authors’ programming styles, for example, unique IDs or random but unique function names generated by the compiler; furthermore, the overall accuracy is generally unsatisfactory. Motivated by this discovery, we propose a layered Onion Approach for Binary Authorship Attribution called OBA2. The novelty of our approach lies in the three complementary layers: preprocessing, syntax-based attribution, and semantic-based attribution. Experiments show that our method produces results that are not only more accurate but also have a meaningful connection to the authors’ styles.
KW - Authorship attribution
KW - Binary program analysis
KW - Digital forensics
KW - Malware forensics
KW - Reverse engineering
UR - http://www.scopus.com/inward/record.url?scp=85068725709&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068725709&partnerID=8YFLogxK
U2 - 10.1016/j.diin.2014.03.012
DO - 10.1016/j.diin.2014.03.012
M3 - Paper
AN - SCOPUS:85068725709
SP - S94
EP - S103
Y2 - 7 May 2014 through 9 May 2014
ER -