Improved Call Graph Comparison Using Simulated Annealing by Orestis Kostakis, Joris Kinable, Hamed Mahmoudi, Kimmo Mustonen.

Abstract:

The amount of suspicious binary executables submitted to Anti-Virus (AV) companies are in the order of tens of thousands per day. Current hash-based signature methods are easy to deceive and are inefficient for identifying known malware that have undergone minor changes. Examining malware executables using their call graphs view is a suitable approach for overcoming the weaknesses of hash-based signatures. Unfortunately, many operations on graphs are of high computational complexity. One of these is the Graph Edit Distance (GED) between pairs of graphs, which seems a natural choice for static comparison of malware. We demonstrate how Simulated Annealing can be used to approximate the graph edit distance of call graphs, while outperforming previous approaches both in execution time and solution quality. Additionally, we experiment with opcode mnemonic vectors to reduce the problem size and examine how Simulated Annealing is affected.

From the introduction:

To facilitate the recognition of highly similar executables or commonalities among multiple executables which have been subject to modification, a high-level structure, i.e. an abstraction, of the samples is required. One such abstraction is the call graph which is a graphical representation of a binary executable, where functions are modelled as vertices and calls between those functions as directed edges. Minor changes in the body of the code are not reflected in the structure of the graph.

Can you say *subject identity*? ðŸ˜‰

How you judge subject identity depends on the circumstances and requirements of any given situation.

Very recent and I suspect important work on the detection of malware.