top | item 39875276

Greenpants | 1 year ago

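You may be interested in the "binary step" activation function, which does what you're suggesting. In general, though, a network's ability to model complex behaviour really takes a hit when you use it as a neuron's activation function: its gradient is zero everywhere except at the threshold, so ordinary backpropagation gets no learning signal through it. (I'm also not sure which papers, if any, report metrics on it being used in transformer models.)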
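For anyone who wants to poke at it, here is a minimal sketch of a binary step neuron in plain Python. The function names, the threshold of 0, the choice to output 1 exactly at the threshold, and the AND-gate weights are all my own assumptions for illustration; they are not taken from any particular library or paper.

```python
from typing import Sequence


def binary_step(x: float, threshold: float = 0.0) -> float:
    """Binary step: 1.0 at or above the threshold, 0.0 below it.

    Outputting 1.0 exactly at the threshold is an assumed convention.
    """
    return 1.0 if x >= threshold else 0.0


def binary_step_grad(x: float, threshold: float = 0.0) -> float:
    """Derivative of the step function.

    It is 0 everywhere except at the threshold, where it is undefined;
    returning 0.0 there too is an assumed convention. This flat gradient
    is why plain backpropagation cannot train through this activation.
    """
    return 0.0


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """A single neuron: weighted sum plus bias, passed through binary_step."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return binary_step(pre_activation)


# Hypothetical hand-picked weights that make the neuron act as an AND gate.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

if __name__ == "__main__":
    for a in (0.0, 1.0):
        for b in (0.0, 1.0):
            print(a, b, neuron([a, b], AND_WEIGHTS, AND_BIAS))
```

With hand-set weights like these the neuron works fine as a hard classifier. The trouble only starts when you try to learn the weights by gradient descent, since `binary_step_grad` never gives you anything to follow.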
