Hawkenfall | 6 years ago
A more in-depth paper about this found the Swish activation often outperformed other functions: https://arxiv.org/abs/1710.05941

osipov | 6 years ago
Most of the recent research is moving to GELU (Gaussian Error Linear Unit) activation functions: https://arxiv.org/pdf/1606.08415.pdf

excessive | 6 years ago
That's interesting. I didn't read the paper closely, but skipping to the figures, it looks like ReLU smoothed out so that the derivative is continuous. Intuitively, that seems useful.

rickdeveloper | 6 years ago
I wasn't aware of that one. Definitely interesting, thanks for sharing!
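
For concreteness, here is a minimal NumPy sketch of the three activations discussed above, written from the definitions in the linked papers (GELU via the tanh approximation given in that paper). The function names and the numerical slope check at the end are illustrative, not from either paper:

    import numpy as np

    def relu(x):
        # max(0, x): continuous, but the slope jumps from 0 to 1 at x = 0
        return np.maximum(0.0, x)

    def swish(x, beta=1.0):
        # Swish (arXiv:1710.05941): x * sigmoid(beta * x), smooth everywhere
        return x / (1.0 + np.exp(-beta * x))

    def gelu(x):
        # GELU (arXiv:1606.08415): x * Phi(x), via the paper's tanh approximation
        return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

    # One-sided numerical slopes at 0 illustrate the "smoothed ReLU" point above:
    # ReLU's slope jumps from 0 to 1 across the origin, while Swish and GELU
    # change gradually (roughly 0.5 on both sides).
    eps = 1e-3
    for f in (relu, swish, gelu):
        left = (f(0.0) - f(-eps)) / eps
        right = (f(eps) - f(0.0)) / eps
        print(f"{f.__name__}: slope left {left:.3f}, slope right {right:.3f}")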