Unfortunately, Memory64 comes with a significant performance penalty because the wasm runtime has to check bounds (which wasn't necessary on 32-bit as the runtime would simply allocate the full 4GB of address space every time).
But if you really need more than 4GB of memory, then sure, go ahead and use it.
Actually, runtimes often allocate 8GB of address space because WASM has a [base32 + index32] address mode where the effective address could overflow into the 33rd bit.
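To make the 33rd-bit point concrete, here is a minimal arithmetic sketch. It assumes the effective address is the unwrapped sum of a 32-bit index and a 32-bit offset, as described above; the function names are made up for illustration and do not come from any runtime.

```python
# Worst-case start address of a wasm32 access: a 32-bit index plus a
# 32-bit offset, added without wrapping (names here are hypothetical).
U32_MAX = 2**32 - 1
GIB = 2**30


def max_effective_address() -> int:
    """Largest start address reachable with two 32-bit operands."""
    return U32_MAX + U32_MAX


def bits_needed(value: int) -> int:
    """Number of bits required to represent value."""
    return value.bit_length()
```

The largest sum is 2^33 - 2, which needs 33 bits and lands just under 8 GiB, so an 8 GiB reservation covers every possible start address.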
On x86-64, the start of the linear memory is typically put into one of the two remaining segment registers: GS or FS. Then the code can simply use an address mode such as "GS:[RAX + RCX]" without any additional instructions for addition or bounds-checking.
Somewhat related. At some point around 15 years ago I needed to work with large images in Java, and at least at the time the language used 32-bit integers for array sizes and indices. My image data was about 30 gigs in size, and despite having enough RAM and running a 64-bit OS and JVM I couldn't fit the image data into a single array.
This multi-memory setup reminds me of the array juggling I had to do back then. While intellectually challenging, it was not fun at all.
The problem with multi-memory (and why it hasn't seen much usage, despite having been supported in many runtimes for years) is that basically no language supports distinct memory spaces. You have to rewrite everything to use WASM intrinsics to work on a specific memory.
It looks like memories have to be declared up front, and the memory.copy instruction takes the memories to copy between as numeric literals (immediate memory indices). So I guess you can't use it to allocate dynamic buffers. But maybe you could decide memory 0 = heap and memory 1 = pixel data or something like that?
I still don't understand why it's slower to mask to 33 or 34 bit rather than 32. It's all running on 64-bit in the end isn't it? What's so special about 32?
That's because with 32-bit addresses the runtime did not need to do any masking at all. It could allocate a 4GiB area of virtual memory, set up page permissions as appropriate and all memory accesses would be hardware checked without any additional work. Well that, and a special SIGSEGV/SIGBUS handler to generate a trap to the embedder.
With 64-bit addresses, and the requirements for how invalid memory accesses should work, this is no longer possible. AND-masking does not really allow for producing the necessary traps for invalid accesses, so every access now needs a conditional check beforehand to validate that it is in bounds. The addresses cannot be trivially offset either, as they can wrap around (and/or accidentally hit some other mapping).
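A rough sketch of the difference, simulated in Python rather than taken from any real runtime: `Trap`, `masked_address`, `wrapping_add_u64` and `checked_address` are all hypothetical names, and the linear memory is assumed to be a power-of-two size for the masking case.

```python
class Trap(Exception):
    """Stands in for a wasm trap on an out-of-bounds access."""


U64_MASK = 2**64 - 1


def masked_address(addr: int, size_pow2: int) -> int:
    # AND-masking keeps the address in range, but an out-of-bounds
    # access silently aliases valid memory instead of trapping.
    return addr & (size_pow2 - 1)


def wrapping_add_u64(a: int, b: int) -> int:
    # What a plain 64-bit register add does: the sum wraps around.
    return (a + b) & U64_MASK


def checked_address(addr: int, offset: int, width: int, mem_size: int) -> int:
    # Explicit bounds check. Python ints do not overflow, so the end of
    # the access is computed exactly before comparing with the size.
    end = addr + offset + width
    if end > mem_size:
        raise Trap(f"access ending at {end} exceeds memory size {mem_size}")
    return addr + offset
```

For example, `masked_address(2**20 + 4, 2**20)` quietly returns 4 and `wrapping_add_u64(2**64 - 8, 16)` wraps to 8, whereas `checked_address(2**64 - 8, 16, 4, 2**20)` raises `Trap`, which is the behavior the spec-mandated trap requires.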
The special part is the "signal handler trick" that is easy to use for 32-bit pointers. You reserve 4GB of memory - all that 32 bits can address - and mark everything above used memory as trapping. Then you can just do normal reads and writes, and the CPU hardware checks out of bounds.
With 64-bit pointers, you can't really reserve all the possible space a pointer might refer to. So you end up doing manual bounds checks.
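Some back-of-the-envelope numbers for why the reservation trick stops working. This sketch assumes 47 bits of user-space virtual address space (typical for x86-64 Linux with 4-level paging; other platforms differ) and the 8GB (2^33 byte) reservation mentioned earlier in the thread. The function names are illustrative only.

```python
# Assumption: 47 bits of user-space virtual addresses (x86-64 Linux,
# 4-level paging). Other OS / CPU combinations have different limits.
USER_VA_BITS = 47
RESERVATION_BITS = 33  # 8 GiB per 32-bit linear memory
INDEX_BITS_64 = 64     # index width of a Memory64 linear memory


def reservations_that_fit(va_bits: int = USER_VA_BITS,
                          reservation_bits: int = RESERVATION_BITS) -> int:
    """How many full reservations fit in the address space."""
    return 2 ** (va_bits - reservation_bits)


def shortfall_factor(index_bits: int = INDEX_BITS_64,
                     va_bits: int = USER_VA_BITS) -> int:
    """How many times larger the index space is than the address space."""
    return 2 ** (index_bits - va_bits)
```

Under those assumptions, 16384 separate 8 GiB reservations fit side by side, but a single 64-bit index space is 131072 times larger than the whole user address space, so there is nothing left for the hardware to guard.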
Because CPUs still have instructions that automatically truncate the result of all math operations to 32 bits (and sometimes 8-bit and 16-bit too, though not universally).
To operate on any other size, you need to insert extra instructions to mask addresses to the desired size before they are used.
Findecanor|5 months ago
jsheard|5 months ago
baq|5 months ago
andrewl-hn|5 months ago
the_duke|5 months ago
evmar|5 months ago
afiori|5 months ago
TrueDuality|5 months ago
sehugg|5 months ago
fulafel|5 months ago
zarzavat|5 months ago
nagisa|5 months ago
azakai|5 months ago
phire|5 months ago
dist1ll|5 months ago