top | item 47202794


dnautics | 22 hours ago

Symbol manipulation in transformers is FP arithmetic?


wizzwizz4 | 12 hours ago

That's an implementation detail. The behaviour of trained transformer models remains similar even if you quantise them to 4-bit floats, or make every floating point operation noisy. This model, by contrast, only works if you use double-precision floating point.
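The robustness claim above can be illustrated with a toy sketch (not any specific transformer implementation): symmetrically quantise the weights of a random linear-plus-softmax layer to 4-bit precision and compare outputs. The layer sizes, scales, and quantisation scheme here are illustrative assumptions, not taken from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_4bit(w):
    # Symmetric 4-bit quantisation: snap each weight to one of 16
    # integer levels, then rescale back to the original range.
    scale = np.abs(w).max() / 7  # int4 values span roughly [-8, 7]
    return np.clip(np.round(w / scale), -8, 7) * scale

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# A toy "layer": random weights feeding a softmax, standing in for
# one attention/MLP step of a transformer.
w = rng.normal(scale=0.1, size=(64, 64))
x = rng.normal(size=64)

y_full = softmax(w @ x)                  # full-precision output
y_q = softmax(quantize_4bit(w) @ x)      # 4-bit-weight output

# The two output distributions stay close despite the coarse weights.
print(np.abs(y_full - y_q).max())
```

A scheme that genuinely depended on double-precision arithmetic would show large output divergence under this kind of perturbation, rather than the small drift seen here.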