Maybe we then need an LLM to tell us if two pieces of compiled code are equivale...

saagarjha · on March 18, 2024

An LLM cannot do this. I don’t even mean this in a formal sense, because your problem is addressed by Rice’s Theorem, which places bounds on what any system (LLM or not) can do here; I mean it in the sense that an LLM isn’t even appropriate to use here because the best it can possibly do is provide you with its best guess at the answer. And while this might be a useful property for decompilation in general, that’s not what was being discussed here.

dheera · on March 18, 2024

Rice's theorem does NOT prevent a program from giving correct answers to non-trivial properties of programs (including the halting problem or other undecidable problems) for 99.99% of inputs and "I don't know" for 0.01% of inputs. It only states that you cannot write a program that provides a correct and definitive yes-or-no for 100% of inputs.

For a decompiler, being able to decompile even 90% of programs would be awesome. We're not looking for theoretical perfection.
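
To make the "correct answer or I don't know" idea concrete, here is a minimal sketch of a sound but incomplete halting checker. It assumes a toy model that is not from this thread: a program is a deterministic `step` function over hashable states, and returning `None` means the program halted. The names `decide_halting`, `step`, and `budget` are made up for illustration.

```python
from typing import Callable, Hashable, Optional

State = Optional[Hashable]


def decide_halting(step: Callable[[Hashable], State],
                   state: State,
                   budget: int = 10_000) -> str:
    """Return "halts", "loops", or "unknown"; never a wrong answer.

    Toy model (an assumption): `step` is deterministic, and a state of
    None means the program has halted.
    """
    seen = set()
    for _ in range(budget):
        if state is None:
            return "halts"      # we watched it stop
        if state in seen:
            return "loops"      # deterministic + repeated state = runs forever
        seen.add(state)
        state = step(state)
    # Budget exhausted: answer only if the last step happened to halt.
    return "halts" if state is None else "unknown"


countdown = decide_halting(lambda n: None if n == 0 else n - 1, 5)
cycle = decide_halting(lambda n: (n + 1) % 3, 0)
counter = decide_halting(lambda n: n + 1, 0, budget=1_000)
print(countdown, cycle, counter)
```

The checker gives up on the unbounded counter instead of guessing, which is the behavior Rice's theorem still allows.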

thfuran · on March 17, 2024

Why would an LLM be the tool for that job?

dheera · on March 17, 2024

Without analytical thinking, how else would you become convinced that two functions are identical across a computationally infeasible number of possible inputs?

kolinko · on March 18, 2024

Formal logic / formal proofs. We have good systems for verifying that.

The proper flow is to use an LLM to generate decompilation steps, along with potential proofs, and then use old algorithms from the 1970s to verify that the steps are correct.

Source: I built a decompiler for EVM, arguably the best one on the market, and to some extent that is how it worked (as did others comparable in class).

The issue was always the exploration of possible transformations of code. Once you manage to find the right ones (which LLMs can propose way better than old hard-coded rules and SMT solvers), it's simple to verify that the transformations are correct.
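
A rough sketch of that propose-then-verify split, under loud assumptions: the hard-coded `proposals` list stands in for whatever an LLM would suggest, and exhaustive checking over all 8-bit inputs stands in for a real proof checker or SMT solver (it would not scale to 256-bit EVM words). None of these names come from an actual decompiler.

```python
from typing import Callable, Iterable, Optional, Tuple

Fn = Callable[[int], int]


def equivalent_8bit(f: Fn, g: Fn) -> bool:
    """Check f == g on every 8-bit input; exhaustive, so this is a proof
    for this tiny domain (a stand-in for a real verifier)."""
    return all(f(x) == g(x) for x in range(256))


def first_verified(original: Fn,
                   candidates: Iterable[Tuple[str, Fn]]) -> Optional[str]:
    """Return the name of the first candidate that passes verification.

    Candidates are untrusted: a wrong proposal is simply rejected.
    """
    for name, candidate in candidates:
        if equivalent_8bit(original, candidate):
            return name
    return None


def original(x: int) -> int:
    # "Compiled" form we want to rewrite: multiply by two, truncate to 8 bits.
    return (x * 2) & 0xFF


# Hypothetical proposer output: one wrong guess, one correct rewrite.
proposals = [
    ("x + 2", lambda x: (x + 2) & 0xFF),
    ("x << 1", lambda x: (x << 1) & 0xFF),
]

chosen = first_verified(original, proposals)
print(chosen)
```

The point of the design is that the proposer can be arbitrarily unreliable, because only the verifier decides what gets accepted.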