top | item 42396809

go_prodev | 1 year ago

GELU really is like magic:

UNARY(GELU, b / 2 * (1 + tanh(.7978845 * (b + .044715 * b * b * b))))

jey|1 year ago

This is just a practical approximation to the actual mathematical definition of GELU, which is `GELU(x) := x * Φ(x)`, where Φ(x) is the CDF of the standard Gaussian (normal) distribution.

fluoridation|1 year ago

Isn't that just erf()?
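
Close: Φ is erf up to a rescaling and shift, Φ(x) = (1 + erf(x / √2)) / 2, so the exact GELU can be written with erf and compared against the tanh formula above. A minimal sketch using only the Python standard library; the function names and the sampling range are made up for illustration, and the constants are copied from the `UNARY(GELU, ...)` line (.7978845 is approximately sqrt(2/π)).

```python
import math

def gelu_exact(x: float) -> float:
    """GELU(x) = x * Phi(x), with Phi written via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Tanh approximation, same constants as the UNARY(GELU, ...) line."""
    return x / 2 * (1 + math.tanh(0.7978845 * (x + 0.044715 * x * x * x)))

def max_abs_error(lo: float = -6.0, hi: float = 6.0, steps: int = 1201) -> float:
    """Largest |exact - approx| on an evenly spaced grid (range is arbitrary)."""
    worst = 0.0
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        worst = max(worst, abs(gelu_exact(x) - gelu_tanh(x)))
    return worst

if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
    print("max abs error on [-6, 6]:", max_abs_error())
```

Both functions are ordinary real-valued formulas; nothing here depends on how the float is stored.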

dogboat|1 year ago

Fast inverse square root lookalike.

shoo|1 year ago

You can hand that GELU definition to a mathematician and they can interpret it as a function of a real number b. The definition does not depend upon b being a floating-point number with a particular bit representation.

In contrast, the fast inverse square root really exploits the bit representation of a floating-point input to cheaply compute an initial guess.
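
For comparison, here is a rough Python sketch of the fast inverse square root idea. It assumes an IEEE 754 32-bit float and uses the widely quoted constant 0x5F3759DF; the function name is made up, and `struct` stands in for the pointer cast a C version would use. The point is only that the initial guess comes from integer arithmetic on the float's bits, which has no counterpart in the GELU formula.

```python
import struct

def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x via the float32 bit trick."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Integer arithmetic on the bits yields a rough first guess.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer bits back as a float32.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step refines the guess.
    return y * (1.5 - 0.5 * x * y * y)

if __name__ == "__main__":
    for x in (0.25, 1.0, 2.0, 100.0):
        print(x, fast_inv_sqrt(x), x ** -0.5)
```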