(no title)
polomi | 3 years ago
> Although this may seem like a significant drawback, it's actually not that alarming, at least not to me. While it's true that you must insert various "move mask to v0" instructions that wouldn't otherwise be there, it's important to remember that these will not really be actual computation instructions. Moves from one vector register to another will always be simple register renames handled by the front end of any high-performance chip, and I would consider it highly unlikely that you would change masks so frequently as to overburden the front end.
This is not the point the author was making.
maxwell86|3 years ago
The hardware can back up the 1024-bit register with a pool of 32-bit registers, and if you only wrote to the first 32-bits, and all others are zero, it can use a single 32-bit register to back it up, making this "as good" as the single mask register solution, which the author thinks is good.
ncmncm|3 years ago