pqn
|
3 years ago
Let's just take the topic of measuring GPU usage. This alone is quite tricky -- tools like nvidia-smi will report full GPU utilization even if only a fraction of the SMs are busy, since the utilization metric only tracks whether any kernel was running at all during the sample period. The workload can also change behavior over time, for instance if inputs to transformers grow longer. And measurement gets even more complicated once you factor in optimizations like dynamic batching. If you peek into some MLOps communities you can get a flavor of these nuances, but I'm not sure there are good exhaustive guides around right now.