user: vkaufmann
22 karma | created 19 days ago
recent submissions
Custom FP4 CUDA Kernel – 129 Tflops on DGX Spark with Pre-Quantized Weight Cache
(forums.developer.nvidia.com)
2 pts|5 days ago|1 comment
10 days ago|discuss
4 pts|10 days ago|1 comment
19 days ago|discuss
43 pts|19 days ago|31 comments
3 pts|19 days ago|discuss