Binary Level Toolchain Provenance Identification with Graph Neural Network

nananana · September 19, 2022

Aim:

Devise a Machine Learning (ML) based solution to the toolchain provenance identification problem for stripped binaries.

Background:

Toolchain Provenance: the compiler family, compiler version, and optimization level used to build a binary.

The stated importance seems questionable. Toolchain provenance is said to:
1. Help determine security flaws.
2. Help identify functions.

However, security flaws are usually introduced by the programmer, and it is difficult to see how knowing the toolchain helps with identifying functions.

Site Neural Network (SNN)

  • GNN-based framework used to determine the compiler toolchain
  • Uses a hierarchy of SNNs, each making a binary decision
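
A hierarchy of binary decisions can be pictured as a decision tree whose internal nodes are local experts. The sketch below is only an illustration, not the paper's implementation: the `Node` class, the `predict` function, the tree layout, and the stand-in experts (thresholds on toy scores `a`, `b`, `c` instead of trained SNNs) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """Internal node = one binary expert; leaf = final label (hypothetical layout)."""
    expert: Optional[Callable[[dict], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    label: Optional[str] = None


def predict(node: Node, sample: dict) -> str:
    """Walk down the tree, letting each local expert make one binary decision."""
    while node.label is None:
        node = node.yes if node.expert(sample) else node.no
    return node.label


# Stand-in experts: thresholds on toy scores, NOT trained SNNs.
tree = Node(
    expert=lambda s: s["a"] > 0.5,
    yes=Node(
        expert=lambda s: s["c"] > 0.5,
        yes=Node(label="Visual Studio"),
        no=Node(label="MinGW"),
    ),
    no=Node(
        expert=lambda s: s["b"] > 0.5,
        yes=Node(label="GCC"),
        no=Node(label="Clang"),
    ),
)

print(predict(tree, {"a": 0.9, "b": 0.0, "c": 0.2}))  # MinGW
print(predict(tree, {"a": 0.1, "b": 0.8, "c": 0.0}))  # GCC
```

With this layout, separating four families takes three binary experts, and each sample only passes through two of them.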

Site

  • A subgraph obtained when the forgetful CFG (fCFG) is chopped in the chopping phase

Previous Work

  • Rosenblum et al. use a Support Vector Machine (SVM)
  • Recent works rely on Neural Networks (CNN, RNN)
  • Massarelli et al. extract the Control Flow Graph (CFG) of a binary and process each block with Natural Language Processing (NLP) techniques

Difference

The paper works at the program level of a binary rather than the function level.

It uses a forgetful CFG (fCFG), which is basically a simplified CFG.

It uses a Graph Neural Network (GNN) based solution: the Site Neural Network (SNN).

Method

Setup

  • 23 different compiler versions (Clang, GCC, MinGW, Visual Studio)
  • 4 classes of optimization
  • 92 different compiler configurations (23 versions × 4 optimization classes)
  • 36,272 C/C++ source programs solving 91 problems from CodeForces
  • Clang (3.9.1, 4.0.1, 5.0.1, 6.0.0, 7.0.0, 8.0)
  • GCC (4.8.5, 5.5.0, 6.5.0, 7.5.0, 8.4.0, 9.3.0)
  • MinGW (3.4.5, 4.4.1, 4.7.1, 4.9.2, 5.11.0, 8.1.1)
  • Visual Studio (10.0, 12.0, 14.0, 2017, 2019)
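
The count of 92 configurations follows from pairing each of the 23 compiler versions with each of the 4 optimization classes. A quick check, using the version lists above; the optimization class names (`opt0` to `opt3`) are placeholders assumed for the example, since the notes do not name them:

```python
from itertools import product

# Version lists copied from the setup above.
VERSIONS = {
    "Clang": ["3.9.1", "4.0.1", "5.0.1", "6.0.0", "7.0.0", "8.0"],
    "GCC": ["4.8.5", "5.5.0", "6.5.0", "7.5.0", "8.4.0", "9.3.0"],
    "MinGW": ["3.4.5", "4.4.1", "4.7.1", "4.9.2", "5.11.0", "8.1.1"],
    "Visual Studio": ["10.0", "12.0", "14.0", "2017", "2019"],
}

# Assumption: placeholder names for the 4 optimization classes.
OPT_CLASSES = ["opt0", "opt1", "opt2", "opt3"]

compilers = [(family, v) for family, vs in VERSIONS.items() for v in vs]
configs = list(product(compilers, OPT_CLASSES))

print(len(compilers), len(configs))  # 23 92
```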

Pipeline

  • Binary Code Preprocessing: binary => CFG
  • Forgetful Phase: CFG => forgetful CFG (fCFG)
  • Chopping Phase: fCFG => set of sites
  • Training and Testing: set of sites => SNN model
  • Multiple SNNs (local experts) arranged in a hierarchy
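
The forgetful and chopping phases can be sketched on a toy graph. This is a rough illustration under stated assumptions, not the paper's algorithm: here "forgetting" keeps only the mnemonic of each block's last instruction, and "chopping" takes, for every node, the subgraph induced by the first `alpha` nodes reached in breadth-first order. The function names `forget` and `chop` and the toy CFG are hypothetical.

```python
from collections import deque


def forget(cfg, instructions):
    """Keep the edges, reduce each block to one coarse label.

    Assumption: the label is the mnemonic of the block's last instruction.
    """
    labels = {node: instructions[node][-1].split()[0] for node in cfg}
    return cfg, labels


def chop(cfg, alpha):
    """Return one site per node: the subgraph induced by at most
    `alpha` nodes reached by BFS from that node (assumed chopping rule)."""
    sites = []
    for root in cfg:
        seen = [root]
        queue = deque([root])
        while queue and len(seen) < alpha:
            node = queue.popleft()
            for succ in cfg[node]:
                if succ not in seen and len(seen) < alpha:
                    seen.append(succ)
                    queue.append(succ)
        nodes = frozenset(seen)
        edges = frozenset((u, v) for u in nodes for v in cfg[u] if v in nodes)
        sites.append((nodes, edges))
    return sites


# Toy CFG as an adjacency list: block id -> successor block ids.
cfg = {0: [1, 2], 1: [3], 2: [3], 3: []}
instructions = {
    0: ["cmp eax, 1", "jne L2"],
    1: ["mov eax, 0", "jmp L3"],
    2: ["mov eax, 1"],
    3: ["ret"],
}

fcfg, labels = forget(cfg, instructions)
sites = chop(fcfg, alpha=3)
print(labels)
print(len(sites), sorted(sites[0][0]))
```

Each site, together with the labels of its nodes, would then be one training or test element for an SNN. A larger `alpha` yields bigger subgraphs per element.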

Result

RQ1. How does the running time of our framework evolve as the site size α increases?

As α increases, the volume of data grows, so the running time per element also increases.

RQ2. How does the accuracy of our framework evolve as the site size α increases?

Accuracy does not necessarily increase as α increases.

RQ3. Does our framework have the capacity to predict the compiler and optimization level of binary codes?

Compiler family prediction: Macro Avg F1 Score = 0.9950

Optimization level prediction: Macro Avg F1 Score = 0.7549

RQ4. Does our framework have the capacity to predict the compiler version of binary codes?

Compiler version prediction: Macro Avg F1 Score = 0.6475

Compiler version prediction (excluding Clang): Macro Avg F1 Score = 0.8167
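
All scores above are macro-averaged F1: the F1 score is computed per class and the per-class scores are averaged with equal weight, so a rarely predicted class counts as much as a common one. A minimal sketch of the computation follows; the labels are made up for illustration and are not the paper's data, and `macro_f1` is a hypothetical helper name.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels, for illustration only.
y_true = ["O0", "O0", "O1", "O1", "O2", "O2"]
y_pred = ["O0", "O0", "O1", "O2", "O2", "O2"]

print(round(macro_f1(y_true, y_pred), 4))  # 0.8222
```

Here the per-class F1 scores are 1.0, 2/3, and 0.8, giving a macro average of 37/45 ≈ 0.8222.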

Limitation

The dataset is composed of small programs (CodeForces solutions).

Implementation
