item 37920218

(no title)

This statement would be technically legal on its own in x86 real mode if the compiler didn't do null pointer checks. However, it would point the divide-by-zero interrupt vector (INT 0) at 0000:0000, i.e. at the interrupt vector table itself. The next division by zero would then run into UB (likely a reset or halt): the CPU would jump there and execute the table as code, decoding each 00 00 pair as ADD byte ptr [BX + SI], AL (or ADD byte ptr [EAX], AL in 32-bit code), and then keep running the remaining interrupt vectors as instructions.


cjensen|2 years ago

Not quite. (char *) 0 is the null pointer. The null pointer is not necessarily all-bits-zero in memory. On some x86 compilers, the null pointer intentionally points to something that will cause a crash when written to.

sacnoradhq|2 years ago

Find me one contemporary example (ANSI C) with a disassembled screenshot.

This is writing sizeof(char) bytes (1 by definition) of zero to address zero. It is not using a NULL macro or other predefined symbol.

In the real world, this would generally write a byte to address 0000:0000, leading to UB because it would fuck up the divide-by-zero interrupt vector.

PS: I used Borland C++ 3.1, Microsoft C++ 3.x and 4.5x, Watcom, and early GNU.

jfbastien|2 years ago

Good thing this was covered in the talk

lmm|2 years ago

Having it in text is much nicer than having it in video.

layer8|2 years ago

Not quite, I think. Since this is a char pointer being used, only the first byte of the interrupt vector would be zeroed. In real mode the vectors are far pointers stored offset first, then segment, so it's the low byte of the offset that would be zeroed. So xxxx:xx00.

But yes, the interrupt table was my first thought when reading the headline.

kevin_thibedeau|2 years ago

Char can be the same size as short or int. You can't assume it is 8 bits: sizeof(char) is always 1, but CHAR_BIT can be larger than 8 on some platforms.

saghm|2 years ago

If I'm understanding you correctly, the memory location with address 0 is actually writable, but its value is used to handle division by zero? It's kind of wild to me that this is even something that's allowed to be done manually, let alone required by a certain mode. Is this something provided for compatibility reasons that you'd have to opt into, or is it just enabled by default?

lmm|2 years ago

Which part is wild? "Magic" memory addresses are a fairly normal way to communicate with hardware; nowadays there are more layers to how you set up mappings in the MMU etc., but in the old days it was normal for everything to just have a fixed address (e.g. I remember back on the Apple ][ the screen's framebuffer was in a particular memory range, or rather two - to avoid tearing you'd draw on one and then flip which one it was using). And particularly for the CPU, it's hard to see how else it could do customizable interrupt handling - I guess you could have some kind of special API with dedicated CPU instructions or something for "programming" in an interrupt table, but that would be more complex and have no particular benefit. "It reads your table of pointers from this address in memory, in this format" is pretty straightforward and easy to use.

As for why it's address 0: well, it has to go somewhere, and every machine has a CPU, so everyone needs an interrupt table even if they don't have much memory. And when memory was precious there was no sense wasting even one byte of it; 0 was a real address on your physical memory chip, so why not use it just like any other?

(The fact that it's "address 0" for "division by 0" is just coincidence as far as I can see; division by 0 just happens to be the first kind of possible CPU interrupt. Perhaps it was the most common one?)

wvenable|2 years ago

Back in the day there were no protections. You could write to any address whether it was used by the CPU for interrupt vectors, part of the OS, hardware addresses, anything.

layer8|2 years ago

Think of it as part of the “API” of the CPU that a program can make use of however it likes. In the early days (for DOS and the like), the distinction between operating system and application was more one of convention and not enforced by hardware mechanisms. The program was supposed to control the hardware, and not the other way around.

HansHamster|2 years ago

The interrupt vector table on x86 sits by default at 0000:0000, and the CPU uses it to handle interrupts and other exceptions by jumping to the address stored in the entry corresponding to the event. Entry 0 is division by 0, but there are also entries for illegal instructions, hardware interrupts and so on.

The address can be changed with the LIDT instruction, and operating systems nowadays will just put it wherever, but for backward compatibility it is still expected to be at 0000:0000 (not sure how this is handled nowadays in UEFI, but it should still be possible to set it up that way).

marcosdumay|2 years ago

The kernel can write almost anywhere. (Well, actually, nothing can write to most addresses on a 64-bit machine, but if an address is usable for something, the kernel can use it directly.)

And yes, some addresses are special. (AFAIK, on all current mainstream architectures.) This is the expected way to set those signal handlers, output (and input) data, configure devices, etc.

That said, there are some gotchas with using specific addresses in C. AFAIK none apply to x86, but it's something you usually do in assembly.