item 47202794 · symbol manipulation in transformers is fp arithmetic? discuss
dnautics | 22 hours ago

wizzwizz4 | 12 hours ago
That's an implementation detail. The behaviour of trained transformer models remains similar even if you quantise them to 4-bit floats, or make every floating-point operation noisy. This model only works if you use double-precision floating point.
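The noise-robustness wizzwizz4 describes can be sketched numerically: inject small relative noise into the intermediate values of a softmax-attention computation and check that the output barely moves. This is a minimal illustration written for this note, not code from the thread; the noise model (multiplicative Gaussian perturbation at roughly low-precision scale) and all function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy(x, rel_noise):
    # Perturb every value with small relative noise, mimicking
    # low-precision or noisy floating-point arithmetic.
    return x * (1.0 + rel_noise * rng.standard_normal(x.shape))

def attention(q, k, v, noise=0.0):
    # Standard scaled dot-product attention, optionally with
    # noise injected after each intermediate step.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if noise:
        scores = noisy(scores, noise)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    out = w @ v
    return noisy(out, noise) if noise else out

q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
exact = attention(q, k, v)
perturbed = attention(q, k, v, noise=1e-3)

# Maximum absolute drift stays tiny relative to the output scale,
# even though every intermediate value was perturbed.
drift = np.abs(exact - perturbed).max()
print(drift)
```

The softmax normalisation is doing much of the work here: relative errors in the logits translate into comparably small shifts in the attention weights, so the weighted sum of values changes little. That is one intuition for why trained transformers tolerate aggressive quantisation.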