ipunchghosts | 9 months ago
I find it rather interesting that the structured representations go from sparse to full and back to sparse as a function of layer depth. I have noticed that making the weight decay penalty an exponential function of layer depth gives better results than using a single global weight decay.
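For anyone who wants to try a depth-dependent weight decay, here is a minimal sketch of one way it could look. Everything in it is an assumption for illustration: the function names, the `base_decay` and `rate` values, and the form `base_decay * exp(rate * depth)`. The comment above does not say whether the penalty should grow or shrink with depth, so the sign of `rate` is left to the caller. The output is a list of dicts with `params` and `weight_decay` keys, which is the per-parameter-group shape that PyTorch optimizers accept, but the code itself only uses the standard library.

```python
import math


def depth_weight_decay(num_layers, base_decay=1e-4, rate=0.5):
    """Return one weight decay coefficient per layer.

    Layer `depth` (0-indexed) gets base_decay * exp(rate * depth).
    A positive rate penalizes deeper layers more; a negative rate
    penalizes them less. Both values are hypothetical defaults.
    """
    return [base_decay * math.exp(rate * depth) for depth in range(num_layers)]


def build_param_groups(layer_params, base_decay=1e-4, rate=0.5):
    """Pair each layer's parameters with its depth-dependent weight decay.

    `layer_params` is a list with one entry per layer, ordered from the
    input side to the output side. Each entry is whatever collection of
    parameters that layer owns.
    """
    decays = depth_weight_decay(len(layer_params), base_decay, rate)
    return [
        {"params": params, "weight_decay": decay}
        for params, decay in zip(layer_params, decays)
    ]


if __name__ == "__main__":
    # Placeholder parameter names stand in for real tensors.
    layers = [["w0", "b0"], ["w1", "b1"], ["w2", "b2"]]
    for depth, group in enumerate(build_param_groups(layers)):
        print(depth, group["params"], group["weight_decay"])
```

With a real model, the placeholder lists would be replaced by each layer's parameters and the resulting groups passed to the optimizer in place of a single global `weight_decay`.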