
Recovering the toolchain provenance of binary code
In: Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA '11), Toronto, Ontario, Canada, July 17-21, 2011. ACM Press.

Scribed by Rosenblum, Nathan; Miller, Barton P.; Zhu, Xiaojin


Book ID: 120986817
Publisher: ACM Press
Year: 2011
Weight: 420 KB
Category: Article
ISBN: 1450305628


Synopsis


Program binaries are an artifact of a production process that begins with source code and ends with a string of bytes representing executable code. There are many reasons to want to know the specifics of this process for a given binary (for forensic investigation of malware, to diagnose the role of the compiler in crashes or performance problems, or for reverse engineering and decompilation), but binaries are not generally annotated with such provenance details. Intuitively, the binary code should exhibit properties specific to the process that produced it, but it is not at all clear how to find such properties and map them to specific elements of that process. In this paper, we present an automatic technique to recover toolchain provenance: those details, such as the source language and the compiler and compilation options, that define the transformation process through which the binary was produced. We approach provenance recovery as a classification problem, discovering characteristics of binary code that are strongly associated with particular toolchain components and developing models that can infer the likely provenance of program binaries. Our experiments show that toolchain provenance can be recovered with high accuracy, approaching 100% accuracy for some components and yielding good results (90%) even when the binaries emitted by different components appear to be very similar.

