item 47205772 (no title)

wizzwizz4 | 15 hours ago

That's an implementation detail. The behaviour of trained transformer models remains similar even if you quantise them to 4-bit floats, or make every floating-point operation noisy. This model only works if you use double-precision floating point.