TY - JOUR
T1 - A Survey of Binary Code Fingerprinting Approaches
T2 - Taxonomy, Methodologies, and Features
AU - Alrabaee, Saed
AU - Debbabi, Mourad
AU - Wang, Lingyu
N1 - Funding Information:
The first author is partially supported by the United Arab Emirates University Start-up Grant G00003261.
Publisher Copyright:
© 2022 Association for Computing Machinery.
PY - 2022/1/17
Y1 - 2022/1/17
AB - Binary code fingerprinting is crucial in many security applications, including malware detection, software infringement detection, vulnerability analysis, and digital forensics. It is also useful for security researchers and reverse engineers because it enables high-fidelity reasoning about binary code, such as revealing its functionality, authorship, libraries used, and vulnerabilities. Numerous studies have investigated binary code with the goal of extracting fingerprints that can illuminate the semantics of a target application. However, extracting fingerprints is challenging because a substantial amount of significant information is lost during compilation, notably variable and function names, the original data and control flow structures, comments, semantic information, and the code layout. This article provides the first systematic review of existing binary code fingerprinting approaches and the contexts in which they are used. In addition, it discusses the applications that rely on binary code fingerprints, the information that can be captured during the fingerprinting process, and the approaches used and their implementations. It also addresses limitations and open questions related to the fingerprinting process and proposes future directions.
KW - Binary code analysis
KW - reverse engineering
KW - software security
UR - http://www.scopus.com/inward/record.url?scp=85124870980&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124870980&partnerID=8YFLogxK
U2 - 10.1145/3486860
DO - 10.1145/3486860
M3 - Article
AN - SCOPUS:85124870980
SN - 0360-0300
VL - 55
JO - ACM Computing Surveys
JF - ACM Computing Surveys
IS - 1
M1 - 19
ER -