jtotheh | 1 year ago

This may be a really dumb question, but is that much of an x86_64 CPU's behavior really variable and undefined? Until recently I thought the chipmakers provided full information (recently I found an article about people investigating the undocumented innards of the 286, IIRC). This seems like a pretty shaky foundation for software.

jxors|1 year ago

Not a dumb question at all!

Documentation is definitely not one of x86's strengths. Other architectures do much better. For example, ARM provides formal models of their CPUs, and RISC-V is so simple you could implement all its semantics in a few thousand lines of code.

There are quite a few instructions with undefined behavior, but it is not that much of an issue if you can choose to avoid it -- for example in a compiler. Almost all UB is found in flags or when using invalid instruction prefixes. And although there is some unexpected UB, like `imul`'s zero flag being UB instead of being set according to the result of the multiplication [1], reading the manual and sticking to the parts that are clearly not UB gets you most of the way.
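To make "sticking to the parts that are clearly not UB" a bit more concrete, here is a minimal sketch of the kind of check a compiler or linter could run: track which flags are undefined after each instruction and report any later instruction that reads one. The table is a tiny hand-written subset for illustration, not a real x86 model. It lists ZF, AF and PF as undefined after `imul` (my reading of the vendor manuals) and leaves SF unmodelled. The names `FLAG_EFFECTS` and `find_undefined_flag_reads` are made up for this example.

```python
# Hypothetical, hand-written subset of flag effects; not a complete x86 model.
ALL_FLAGS = {"CF", "PF", "AF", "ZF", "SF", "OF"}

# Per mnemonic: flags it defines, flags it leaves undefined, flags it reads.
# SF is deliberately left unmodelled for imul in this sketch.
FLAG_EFFECTS = {
    "imul": {"defines": {"CF", "OF"}, "undefines": {"ZF", "AF", "PF"}, "reads": set()},
    "add":  {"defines": set(ALL_FLAGS), "undefines": set(), "reads": set()},
    "jz":   {"defines": set(), "undefines": set(), "reads": {"ZF"}},
    "jc":   {"defines": set(), "undefines": set(), "reads": {"CF"}},
}


def find_undefined_flag_reads(mnemonics):
    """Return (index, mnemonic, flags) for each instruction that reads an undefined flag."""
    undefined = set()  # assumption: every flag is defined on entry
    problems = []
    for i, m in enumerate(mnemonics):
        fx = FLAG_EFFECTS[m]
        bad = fx["reads"] & undefined
        if bad:
            problems.append((i, m, sorted(bad)))
        # Apply this instruction's effect on the set of undefined flags.
        undefined -= fx["defines"]
        undefined |= fx["undefines"]
    return problems
```

With this table, `find_undefined_flag_reads(["imul", "jz"])` reports the `jz`, while `["imul", "jc"]` and `["imul", "add", "jz"]` come back clean, because CF is defined by `imul` and `add` redefines ZF before it is read.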

However, it becomes an issue if you need to analyze a binary that uses UB. Then you can't choose which instructions to use, so you need to have a complete model of all UB. That's much more difficult, and for example most decompilers currently fail at this. We have an example of this in Figure 1 of our paper.
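A rough sketch of what that means for an analysis tool, under my own simplifying assumptions: without a model of the undefined behavior, the best a tool can do is treat the flag as unknown and follow both paths of a branch that depends on it. The function names here are hypothetical, and `result` is assumed to be the already-truncated destination value.

```python
from typing import Optional, Set


def zf_after(mnemonic: str, result: int) -> Optional[bool]:
    """ZF as an analyzer might model it: True/False when defined, None when unknown."""
    if mnemonic == "add":
        return result == 0  # ZF is defined by the (truncated) result
    if mnemonic == "imul":
        return None  # documented as undefined; the real value depends on the CPU
    raise ValueError(f"unmodelled instruction: {mnemonic}")


def jz_outcomes(zf: Optional[bool]) -> Set[str]:
    """Possible outcomes of a `jz`, given what is known about ZF."""
    if zf is None:
        return {"taken", "not taken"}  # no model of the UB: both paths are possible
    return {"taken"} if zf else {"not taken"}
```

Here `jz_outcomes(zf_after("imul", 0))` yields both outcomes, whereas the same call with `"add"` yields only `"taken"`. A tool that instead guesses one outcome for the `imul` case can silently produce wrong output, and narrowing it down correctly requires knowing what the hardware actually does.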

[1]: https://explore.liblisa.nl/instruction/F7E8