JUCS - Journal of Universal Computer Science 28(7): 758-775, doi: 10.3897/jucs.76528

Disassemble Byte Sequence Using Graph Attention Network

Jing Qiu^‡, Feng Dong^§, Guanglu Sun^§

‡ Zhejiang A&F University, Hangzhou, China
§ Harbin University of Science and Technology, Harbin, China

Corresponding author: Jing Qiu (qiujing@zafu.edu.cn)

This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY-ND 4.0). This license allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.

Citation: Qiu J, Dong F, Sun G (2022) Disassemble Byte Sequence Using Graph Attention Network. JUCS - Journal of Universal Computer Science 28(7): 758-775. https://doi.org/10.3897/jucs.76528

Abstract

Disassembly is the basis of static analysis of binary code and is used in malicious code detection, vulnerability mining, software optimization, and related tasks. Disassembling arbitrary suspicious code blocks (e.g., suspicious packets intercepted from network traffic) is difficult: traditional disassembly methods require the starting address to be specified manually, so they cannot automatically disassemble an arbitrary code block. In this paper, we propose a disassembly method based on a code extension selection network that combines the traditional linear sweep and recursive traversal methods. First, each byte of a code block is used as a disassembly start address, and all disassembly results (control flow graphs) are combined into a single flow graph. Then a graph attention network is trained to pick the correct subgraph (control flow graph) as the final result. In the experiments, we tested compiler-generated executable files, executable files generated from hand-written assembly code, data files, and byte sequences extracted from code segments. The disassembly accuracy was 93%, which shows that the method can effectively distinguish code from data.
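The first step described above (trying every byte as a start address and merging the results into one flow graph) can be illustrated with a minimal sketch. It is not the authors' implementation: the instruction set `TOY_ISA`, the helper `decode`, and the function `build_superset_graph` are hypothetical names introduced only for this example, and a real system would use an actual x86 decoder. The sketch assumes that decoding one instruction at every offset and linking it to its successors yields the same merged graph as running a recursive traversal from every offset. The graph attention network that selects the correct subgraph is not shown.

```python
# Hypothetical toy instruction set (NOT x86), for illustration only:
# opcode -> (mnemonic, length in bytes, control-flow kind)
TOY_ISA = {
    0x01: ("nop", 1, "fall"),     # falls through to the next instruction
    0x02: ("load", 2, "fall"),    # one operand byte, falls through
    0x03: ("jmp", 2, "jump"),     # operand byte = absolute target offset
    0x04: ("jz", 2, "branch"),    # target offset plus fall-through
    0x05: ("ret", 1, "stop"),     # no successors
}


def decode(code, off):
    """Decode one toy instruction at offset `off`; return None if invalid."""
    if off >= len(code):
        return None
    entry = TOY_ISA.get(code[off])
    if entry is None:
        return None
    mnemonic, length, kind = entry
    if off + length > len(code):
        return None  # instruction would run past the end of the block
    operand = code[off + 1] if length == 2 else None
    return mnemonic, length, kind, operand


def build_superset_graph(code):
    """Treat every byte offset as a candidate start address and merge all
    candidate instructions into one graph.

    Returns (nodes, edges): nodes maps offset -> mnemonic, and edges is a
    set of (source offset, successor offset) pairs.
    """
    nodes = {}
    edges = set()
    for off in range(len(code)):
        ins = decode(code, off)
        if ins is None:
            continue
        mnemonic, length, kind, operand = ins
        nodes[off] = mnemonic
        if kind in ("fall", "branch"):
            edges.add((off, off + length))
        if kind in ("jump", "branch"):
            edges.add((off, operand))
    # Keep only edges whose destination is itself a decodable instruction.
    edges = {(src, dst) for src, dst in edges if dst in nodes}
    return nodes, edges


if __name__ == "__main__":
    block = bytes([0x02, 0x01, 0x04, 0x05, 0x05, 0x03, 0x00])
    nodes, edges = build_superset_graph(block)
    print(nodes)
    print(sorted(edges))
```

In this toy block, offsets 0 and 1 both decode to valid but overlapping instructions, so the merged graph contains conflicting candidates. Choosing among such candidates is the role the abstract assigns to the trained graph attention network.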

Keywords

Graph neural network, disassembly, function identification, reverse engineering, binary code analysis