item 35400066

shock-value | 2 years ago

It appears that this was just a misreading of how memory usage was being reported and there was actually no improvement here. At least nothing so sensational as being able to run a larger-than-RAM model without swapping from disk on every iteration.


jart | 2 years ago

Please read the original link to the pull request, where I stated my change offered a 2x improvement in memory usage. You are actually able to load models 2x larger without compromising system stability, because pages are no longer being copied. Previously you needed 40GB of RAM to load a 20GB model, because otherwise your file cache would be destroyed and the weights would have to be reread from disk the next time. Now you only need 20GB to load a 20GB model.

The peculiarity here is that tools like htop were reporting the improvement as 8x, which is interesting, because RAM use is only 2x better due to my change. The rusage.com page fault reporting was also interesting. This is not due to sparseness; it's because htop was subtracting MAP_SHARED memory. On my computer, the htop docs say that purple is used to display shared memory and yellow to display kernel file caches. But it turned out it just uses yellow for both, even though it shouldn't, because mincore() reported that the shared memory had been loaded into the resident set size.

shock-value | 2 years ago

It's obviously a productive change, and kudos for taking it on, but much of the enthusiasm here was driven by the entirely unanticipated prospect of running a model at full speed using less memory than the model's own footprint, and by the notion that inference with a dense model somehow behaved sparsely at runtime. Best to be a bit more grounded here, particularly with claims that defy common understanding.