top | item 45280649

renehsz | 5 months ago

Unfortunately, Memory64 comes with a significant performance penalty because the wasm runtime has to check bounds (which wasn't necessary on 32-bit as the runtime would simply allocate the full 4GB of address space every time).

But if you really need more than 4GB of memory, then sure, go ahead and use it.
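To make the penalty concrete: with Memory64 the engine has to emit something like the following check before every load or store. This is only an illustrative sketch of the idea (ignoring WASM's static offset immediate, alignment, and JIT specifics); the names are made up.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the check a Memory64 engine must emit before every access.
 * mem_len is the current byte size of the linear memory. */
static bool load_u32(const uint8_t *mem, uint64_t mem_len,
                     uint64_t addr, uint32_t *out) {
    /* Compare addr against mem_len - 4 rather than addr + 4 against
     * mem_len, so the check itself cannot overflow a 64-bit integer. */
    if (mem_len < 4 || addr > mem_len - 4)
        return false;               /* a real engine would trap here */
    memcpy(out, mem + addr, 4);     /* the actual load */
    return true;
}
```

With 32-bit addresses none of this branching is needed, because the hardware page protections do the check for free (as discussed further down the thread).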

Findecanor|5 months ago

Actually, runtimes often allocate 8GB of address space because WASM has a [base32 + index32] address mode where the effective address could overflow into the 33rd bit.
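The 8GB figure falls out of the worst case of that address mode, which a quick (illustrative) calculation confirms:

```c
#include <stdbool.h>
#include <stdint.h>

/* Worst case of the [base32 + index32] address mode: a 32-bit base plus
 * a 32-bit index, both at their maximum. */
static uint64_t max_effective_addr(void) {
    uint64_t base  = UINT32_MAX;    /* 0xFFFFFFFF */
    uint64_t index = UINT32_MAX;    /* 0xFFFFFFFF */
    return base + index;            /* 0x1FFFFFFFE: spills into bit 33 */
}

/* An 8 GiB reservation (0x200000000 bytes) covers every such address. */
static bool fits_in_8gib(uint64_t addr) {
    return addr < (8ULL << 30);
}
```

So a 4GiB reservation is not enough once the index operand is added in, but 8GiB (plus a little slack for the access width) always is.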

On x86-64, the start of the linear memory is typically put into one of the two remaining segment registers: GS or FS. Then the code can simply use an address mode such as "GS:[RAX + RCX]" without any additional instructions for addition or bounds-checking.

jsheard|5 months ago

The comedy option would be to use the new multi-memory feature to juggle a bunch of 32bit memories instead of a 64bit one, at the cost of your sanity.

baq|5 months ago

didn't we call it 'segmented memory' back in DOS days...?

andrewl-hn|5 months ago

Somewhat related. At some point around 15 years ago I needed to work with large images in Java, and at least at the time the language used 32-bit integers for array sizes and indices. My image data was about 30 gigs in size, and despite having enough RAM and running a 64-bit OS and JVM, I couldn't fit the image data into a single array.

This multi-memory setup reminds me of the array juggling I had to do back then. While intellectually challenging, it was not fun at all.

the_duke|5 months ago

The problem with multi-memory (and why it hasn't seen much usage, despite having been supported in many runtimes for years) is that basically no language supports distinct memory spaces. You have to rewrite everything to use WASM intrinsics to work on a specific memory.

evmar|5 months ago

It looks like memories have to be declared up front, and the memory.copy instruction takes the memories to copy between as immediate indices. So I guess you can't use it to allocate dynamic buffers. But maybe you could decide memory 0 = heap and memory 1 = pixel data, or something like that?

afiori|5 months ago

Honestly you could allocate a new memory for every page :-)

TrueDuality|5 months ago

The irony for me is that it's already slow because of the lack of native 64-bit math. I don't care about the memory space available nearly as much.

sehugg|5 months ago

Eh? I'm pretty sure it's had 64-bit math for awhile -- i64.add, etc.

fulafel|5 months ago

Bounds checking in other language runtimes is often reported to result in pretty low overhead. Will be interesting to see some details about how this turns out.

zarzavat|5 months ago

I still don't understand why it's slower to mask to 33 or 34 bit rather than 32. It's all running on 64-bit in the end isn't it? What's so special about 32?

nagisa|5 months ago

That's because with 32-bit addresses the runtime did not need to do any masking at all. It could allocate a 4GiB area of virtual memory, set up page permissions as appropriate and all memory accesses would be hardware checked without any additional work. Well that, and a special SIGSEGV/SIGBUS handler to generate a trap to the embedder.

With 64-bit addresses, and the requirements for how invalid memory accesses must behave, this is no longer possible. AND-masking cannot produce the necessary traps for invalid accesses, so every access now needs a conditional check beforehand to validate that it is in bounds. The addresses cannot be trivially offset either, as they can wrap around (and/or accidentally hit some other mapping).
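A small sketch of why masking is not a substitute for the check: the mask silently wraps an out-of-bounds address back into bounds, which is exactly the behavior WASM forbids. (The engine here is hypothetical; a 4 GiB memory is assumed, so the mask is 32 bits.)

```c
#include <stdbool.h>
#include <stdint.h>

/* A hypothetical engine that masks 64-bit addresses down to the memory
 * size (4 GiB here, so a 32-bit mask) instead of bounds-checking. */
static uint64_t masked_addr(uint64_t addr) {
    return addr & 0xFFFFFFFFULL;    /* wraps instead of trapping */
}

/* What WASM actually requires: the out-of-bounds access is detected. */
static bool is_in_bounds(uint64_t addr, uint64_t mem_len) {
    return addr < mem_len;
}
```

An invalid address like 0x1_0000_0008 masks down to the perfectly valid offset 8, so the buggy access succeeds silently rather than trapping.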

azakai|5 months ago

The special part is the "signal handler trick" that is easy to use for 32-bit pointers. You reserve 4GB of memory - all that 32 bits can address - and mark everything above used memory as trapping. Then you can just do normal reads and writes, and the CPU hardware checks out of bounds.

With 64-bit pointers, you can't really reserve all the possible space a pointer might refer to. So you end up doing manual bounds checks.

phire|5 months ago

Because CPUs still have instructions that automatically truncate the result of all math operations to 32 bits (and sometimes 8-bit and 16-bit too, though not universally).

To operate on any other size, you need to insert extra instructions to mask addresses to the desired size before they are used.
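Concretely: 32-bit operations wrap for free because the register width matches, while an odd size like a 33-bit address space needs an explicit AND after the arithmetic. A minimal illustration (the 33-bit case is hypothetical, standing in for a >4 GiB Memory64 heap):

```c
#include <stdint.h>

/* 32-bit add: the hardware truncates the result for free (e.g. a
 * 32-bit ADD on x86-64 leaves a 32-bit result, upper bits zeroed). */
static uint32_t add32(uint32_t a, uint32_t b) {
    return a + b;                       /* wraps modulo 2^32, no extra work */
}

/* Hypothetical 33-bit address space: no register size matches, so an
 * explicit masking instruction is needed after the add. */
static uint64_t add33(uint64_t a, uint64_t b) {
    return (a + b) & 0x1FFFFFFFFULL;    /* the extra AND phire describes */
}
```

That extra AND on every address computation is the per-size cost, independent of the bounds-checking question.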

dist1ll|5 months ago

WASM traps on out-of-bounds accesses (including overflow). Masking addresses would hide that.