(no title)
luiscape | 1 month ago
This is the only API available for snapshotting NVIDIA GPU memory, afaik.
As for needing to combine it with a host memory snapshot step, this is required because CUDA sessions need to be mapped to a host process, so you need to snapshot both things in order for the program to be restored correctly.
CRIU is another project that uses the same technique (CUDA snapshot + host memory snapshot). Different than CRIU, our snapshots work at the function level so we’re able to take snapshots after functions have been initialized (including GPU memory), making Modal cold boots fast. One would have to implement this entire process using CRIU.
No comments yet.