D23E Research
EVM Decompiler
Turning opaque bytecode into readable Solidity with an LLM-guided pipeline.

Decompiling Smart Contracts with a Large Language Model

Security Tooling · By Kaihua Qin

Most contracts on Ethereum have no verified source code on public explorers. That makes security review harder than it should be: auditors, incident responders, and researchers are often forced to reason over raw bytecode.

This work introduces an LLM-guided decompilation pipeline that aims to generate Solidity that is both readable and semantically faithful. The key idea is to avoid asking a model to jump directly from bytecode to high-level code. Instead, we first recover a structured intermediate representation using static program analysis, and then use that structure to guide generation.

  • Step 1: Bytecode → TAC. We lift EVM bytecode into a structured three-address code (TAC) representation via static program analysis, capturing data flow and control flow in a form that is easier to translate than raw stack-based bytecode.
  • Step 2: TAC → Solidity with a fine-tuned LLM. A Llama-3.2-3B model is fine-tuned on 238,446 TAC-to-Solidity function pairs to recover meaningful variable names, function signatures, and readable control structures.
  • Better output for real security work. In the evaluation reported in the paper, the approach achieves an average semantic similarity of 0.82 to original source and markedly improves readability compared to traditional decompilers.
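
To make Step 1 concrete, here is a minimal sketch of what lifting stack-based bytecode into three-address code can look like. It is an illustration under strong assumptions, not the pipeline's actual lifter: it handles only straight-line code and a handful of opcodes, and the names `lift_to_tac` and `OPS` are hypothetical. A real lifter also has to recover control flow, memory, and function boundaries.

```python
from typing import List

# Opcode -> (mnemonic, number of stack inputs, pushes a result?)
# Only a small, illustrative subset of the EVM instruction set.
OPS = {
    0x01: ("ADD", 2, True),
    0x02: ("MUL", 2, True),
    0x03: ("SUB", 2, True),
    0x35: ("CALLDATALOAD", 1, True),
    0x54: ("SLOAD", 1, True),
    0x55: ("SSTORE", 2, False),
}


def lift_to_tac(bytecode: bytes) -> List[str]:
    """Symbolically execute straight-line bytecode, emitting one TAC line per operation."""
    stack: List[str] = []  # symbolic stack of constants and variable names
    tac: List[str] = []
    fresh = 0              # counter for fresh variable names v0, v1, ...
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if 0x60 <= op <= 0x7F:            # PUSH1..PUSH32: read the immediate
            n = op - 0x5F
            stack.append(hex(int.from_bytes(bytecode[pc:pc + n], "big")))
            pc += n
        elif 0x80 <= op <= 0x8F:          # DUP1..DUP16: copy the k-th item
            stack.append(stack[-(op - 0x7F)])
        elif 0x90 <= op <= 0x9F:          # SWAP1..SWAP16: swap top with item k below
            k = op - 0x8F
            stack[-1], stack[-1 - k] = stack[-1 - k], stack[-1]
        elif op == 0x50:                  # POP
            stack.pop()
        elif op == 0x00:                  # STOP ends the straight-line block
            tac.append("STOP")
            break
        elif op in OPS:
            name, n_in, has_out = OPS[op]
            # The first popped item is the top of the stack (first operand).
            args = [stack.pop() for _ in range(n_in)]
            call = f"{name}({', '.join(args)})"
            if has_out:
                var = f"v{fresh}"
                fresh += 1
                tac.append(f"{var} = {call}")
                stack.append(var)
            else:
                tac.append(call)
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x} in this sketch")
    return tac


# PUSH1 0, SLOAD, PUSH1 4, CALLDATALOAD, ADD, PUSH1 0, SSTORE, STOP
for line in lift_to_tac(bytes.fromhex("6000546004350160005500")):
    print(line)
```

For this input the sketch prints `v0 = SLOAD(0x0)`, `v1 = CALLDATALOAD(0x4)`, `v2 = ADD(v1, v0)`, `SSTORE(0x0, v2)`, and `STOP`. The stack shuffling is gone and the data flow is explicit: a storage slot is incremented by a calldata word. That kind of structure is what a model in Step 2 would translate from.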

Why this matters

Readable decompilation helps in several common workflows:

  • Triage and incident response: quickly understand what an unverified contract does when something goes wrong.
  • Security reviews: recover function boundaries, control flow, and storage interactions that are painful to infer from bytecode alone.
  • Malware analysis: analyze malicious or obfuscated contracts at scale.

Want to try the production system? Visit evmdecompiler.com. For custom reverse engineering or audit support, email [email protected].

Note: decompilation is an approximation. Treat generated code as an aid for understanding and auditing, not as a substitute for verified source.