IceMetalPunk | 3 years ago
Attention mechanisms are something I know to be extremely important to the rapid advancement of modern AI (post-2017), but they're also something I still don't fully understand at the implementation level. So can someone tell me whether I'm right to think of this paper as adding a sort of "focus" to AI attention? That is, existing attention mechanisms look at everything and decide how important each thing is to understanding the current token, while this version only looks at everything near the token plus a much smaller number of things sampled from farther away? Kind of like the difference, by analogy with humans, between the area around the object you're looking at and your important-but-less-defined peripheral vision?
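For intuition, here's a minimal sketch of the pattern you're describing: a boolean attention mask where each token attends to a local window plus a few randomly sampled distant positions. This is a toy illustration of the general local-plus-sampled idea, not the actual implementation from any specific paper; the window size and sample count are arbitrary choices here.

```python
import numpy as np

def sparse_attention_mask(seq_len, window=2, n_random=2, seed=0):
    """Boolean mask where mask[i, j] = True means query i attends to key j.
    Each query sees a local window (the "fovea") plus a few randomly
    sampled distant positions (the "peripheral vision").
    Toy sketch only; parameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True  # local neighborhood: always attended
        far = np.concatenate([np.arange(0, lo), np.arange(hi, seq_len)])
        if len(far) > 0:
            picks = rng.choice(far, size=min(n_random, len(far)), replace=False)
            mask[i, picks] = True  # a few sampled faraway tokens
    return mask

mask = sparse_attention_mask(16)
print(mask.sum(), "attended pairs, vs", 16 * 16, "for full attention")
```

The payoff is that the number of attended pairs grows roughly linearly with sequence length (window plus a constant number of samples per token) instead of quadratically, which is what makes these schemes cheaper than full attention on long sequences.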