qmatch | 11 months ago
I still need to read the details, but removing the norm could be a big deal. It’s always a pain to make sure your network is normalized properly when trying new architectures. The tanh will likely have other implications, since the norm is sometimes solving a conditioning problem, but IMO more alternatives are welcome.