Properly measuring "GPU load" is something I've been wondering about, as an architect who has had to deploy ML/DL models but is still relatively new at it. With CPU workloads you can generally tell from %CPU, %Mem, and I/O how much load your system is under. But with a GPU I'm not sure how to tell, other than by just measuring your model's execution times. That makes it hard to judge whether upgrading to a stronger GPU would help, and by how much. Are there established ways of doing this?
sailingparrot|4 months ago
But for model-wide performance, you basically have to come up with your own calculation: estimate the FLOPs your model requires, and from that figure out how close you are to maxing out the GPU's capabilities (MFU/HFU, i.e. model/hardware FLOPs utilization).
Here is a more in-depth example on how you might do this: https://github.com/stas00/ml-engineering/tree/master/trainin...
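As a minimal sketch of that calculation: for transformer training, a common rule of thumb is ~6 FLOPs per parameter per token (forward + backward), so MFU is achieved FLOPs/s divided by the hardware's peak. All the numbers below are illustrative, and the `6 * N` approximation ignores attention and other terms:

```python
def mfu(n_params, tokens_per_sec, peak_flops_per_sec):
    """Model FLOPs Utilization: achieved FLOPs/s over hardware peak.

    Uses the common ~6 * N FLOPs-per-token approximation for
    transformer training (forward + backward pass).
    """
    achieved_flops_per_sec = 6 * n_params * tokens_per_sec
    return achieved_flops_per_sec / peak_flops_per_sec

# Hypothetical numbers: a 7B-parameter model training at 2,000 tokens/s
# on a GPU with a 312 TFLOP/s BF16 peak (A100 spec-sheet figure).
print(f"MFU: {mfu(7e9, 2_000, 312e12):.1%}")  # prints "MFU: 26.9%"
```

Anything much above ~50% MFU is generally considered good for large-scale training; if you're far below that, a faster GPU may not help as much as fixing your input pipeline or kernel efficiency.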
jplusequalt|4 months ago
For more information: https://docs.nvidia.com/cuda/cuda-c-programming-guide/#multi...
villgax|4 months ago