This is just a practical approximation to the actual mathematical definition of GELU, which is `GELU(x) := x * Φ(x)` where Φ(x) is the CDF of the Gaussian distribution.
You can hand that GELU definition to a mathematician and they can interpret it as a function of a real number b. The definition does not depend upon b being a floating-point number with a particular bit representation.
In contrast, the fast inverse square root really exploits the bit representation of a floating point input to cheaply compute an initial guess.
jey|1 year ago
fluoridation|1 year ago
dogboat|1 year ago
shoo|1 year ago
In contrast, the fast inverse square root really exploits the bit representation of a floating point input to cheaply compute an initial guess.