top | item 47205772

wizzwizz4 | 15 hours ago

That's an implementation detail. The behaviour of trained transformer models remains similar even if you quantise them to 4-bit floats, or make every floating point operation noisy. This model only works if you use double-precision floating point.
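The robustness claim can be illustrated with a toy sketch (not a real transformer): quantize a weight matrix to a 4-bit grid with round-to-nearest, run the same input through the full-precision and quantized versions of a single matmul, and compare. The quantization scheme and layer shapes here are illustrative assumptions, not anyone's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_4bit(w):
    """Round-to-nearest quantization onto 16 evenly spaced levels
    spanning the tensor's range (a deliberately crude per-tensor scheme)."""
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / 15          # 4 bits -> 16 levels
    q = np.round((w - lo) / scale)  # integer codes in 0..15
    return q * scale + lo           # dequantize back to floats

# Toy stand-in for one transformer matmul: y = W @ x.
W = rng.normal(size=(64, 64))
x = rng.normal(size=64)

y_full = W @ x
y_quant = quantize_4bit(W) @ x

rel_err = np.linalg.norm(y_quant - y_full) / np.linalg.norm(y_full)
print(f"relative output error after 4-bit quantization: {rel_err:.3f}")
```

With random Gaussian weights the output error stays modest rather than catastrophic, which is the shape of the robustness being claimed; a model whose behaviour collapsed under this perturbation would be the exception the comment describes.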
