ccccppppp | 6 years ago
> Attention is basically a memory module, so if you don't need that it's just a waste of compute resources.
But aren't CNNs also a kind of memory module (i.e., they memorize what leopard skin looks like)? I guess attention is a more sophisticated kind of memory, "more dynamic" so to speak.
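(To make the "more dynamic" point concrete, here's a minimal NumPy sketch of attention as a soft key-value memory read; the names `attention_lookup`, `keys`, `values` are just illustrative, not from any particular library. The retrieval weights are recomputed for every query, whereas a trained conv filter is a fixed pattern baked into the weights and applied identically to every input.)

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_lookup(query, keys, values):
    """Soft key-value memory read: the weights depend on the current
    input (the query), so what gets retrieved changes per token."""
    scores = query @ keys.T / np.sqrt(keys.shape[-1])  # similarity of query to each stored key
    weights = softmax(scores)                          # convex combination over memory slots
    return weights @ values                            # weighted read-out of the values

# A learned CNN filter, by contrast, is a static pattern (e.g. a texture
# detector) once training is done; only attention re-weights its "memory"
# at inference time based on the input.
rng = np.random.default_rng(0)
d = 8
query  = rng.normal(size=(1, d))   # current token's representation
keys   = rng.normal(size=(5, d))   # 5 memory slots (e.g. other tokens)
values = rng.normal(size=(5, d))
print(attention_lookup(query, keys, values).shape)    # (1, 8)
```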
Anyway, I'm glad to hear that a transformer architecture isn't totally stupid for my task. I'll look up the literature; there seems to be a bit on this matter.
hadsed | 6 years ago