TY - JOUR
T1 - A stratified approach to function fingerprinting in program binaries using diverse features
AU - Alrabaee, Saed
N1 - Publisher Copyright: © 2021 Elsevier Ltd
PY - 2022/5/1
Y1 - 2022/5/1
N2 - Fingerprinting individual functions in binary code is useful in many security applications, ranging from digital forensic analysis of malware corpora to the detection of critical security vulnerabilities. However, existing approaches for fingerprinting functions are typically not resilient to code transformation methods or the use of different compilers. Another common weakness of these approaches is that, when they report a similarity, they do not provide reverse engineers with any insight into the underlying evidence. To bridge this gap, our paper presents PLUMERIA, an obfuscation-resilient and scalable approach based on a stratified architecture composed of three layers. The first layer retrieves as many candidates as possible by capturing statistical characteristics, function behavior, and function neighborhood relationships. The second layer then trains a linear conditional random field to learn the correlations between the features of the function and its semantics. This layer is designed to reduce the number of false positives. Finally, the third layer is designed to provide insights into the underlying evidence by collecting the side effects exhibited by the candidates selected by the previous layer. Our study evaluates PLUMERIA in the context of several scenarios: fingerprinting functions in obfuscated/de-obfuscated binaries; fingerprinting functions across different compilers; fingerprinting various vulnerabilities across compilers and versions; and fingerprinting standard library functions. We then benchmark PLUMERIA on real-world projects and malware binaries, comparing it with existing state-of-the-art solutions.
KW - Binary code
KW - Machine learning
KW - Reverse engineering
UR - http://www.scopus.com/inward/record.url?scp=85123267415&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123267415&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2021.116384
DO - 10.1016/j.eswa.2021.116384
M3 - Article
AN - SCOPUS:85123267415
SN - 0957-4174
VL - 193
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 116384
ER -