make3 | 7 days ago

It's weird to me to train such huge models only to degrade them with 3-bit quantization of what are presumably 16-bit (bfloat16) weights. Why not just train smaller models instead?
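
For context, here's a minimal numpy sketch of naive round-to-nearest 3-bit quantization (the helper names are mine, and real schemes like GPTQ or AWQ use per-group scales and error-aware rounding), just to make the storage arithmetic concrete: 3 bits in place of 16 per weight is roughly a 5.3x size reduction.

    import numpy as np

    def quantize_3bit(weights: np.ndarray) -> tuple[np.ndarray, float]:
        """Map float weights onto 8 signed integer levels (-4..3) with one scale."""
        scale = np.abs(weights).max() / 4  # 3 bits -> 2**3 = 8 levels
        q = np.clip(np.round(weights / scale), -4, 3).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        """Recover approximate float weights from the quantized levels."""
        return q.astype(np.float32) * scale

    w = np.random.randn(1024).astype(np.float32)
    q, s = quantize_3bit(w)
    w_hat = dequantize(q, s)
    print("max abs error:", np.abs(w - w_hat).max())
    # Storage per weight drops from 16 bits to 3 bits: ~5.3x smaller.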
