item 47108126 | make3 | 7 days ago

It's weird to me to train such huge models and then destroy them with 3-bit quantization of weights that were presumably trained in 16 bits (bfloat16). Why not just train smaller models instead?
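For context, 3-bit quantization maps each weight to one of 2^3 = 8 levels, shrinking storage from 16 bits per weight to 3 (plus shared scale factors). A minimal round-to-nearest sketch of the idea — the function names and the single per-tensor scale are illustrative simplifications, not any particular library's scheme (real methods typically use per-group scales and smarter rounding):

```python
import numpy as np

def quantize_3bit(w: np.ndarray):
    """Symmetric round-to-nearest 3-bit quantization of a weight tensor.

    Maps each float weight to an integer level in [-4, 3] (8 levels),
    plus one float scale per tensor. Illustrative sketch only.
    """
    scale = np.abs(w).max() / 4.0  # map the largest magnitude onto the int range
    q = np.clip(np.round(w / scale), -4, 3).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Reconstruct approximate float weights from the 3-bit codes.
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.31, 0.02, -1.2], dtype=np.float32)
q, s = quantize_3bit(w)
w_hat = dequantize(q, s)
# Storage drops ~5x versus bfloat16, at the cost of a per-weight
# reconstruction error bounded by scale / 2.
```

The trade-off the comment questions is exactly this: the 8 available levels introduce reconstruction error everywhere, whereas a smaller model trained natively in 16 bits spends its bits on weights that were optimized at that precision.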
No comments yet.