top | item 45339247

bgavran | 5 months ago

This is an interesting writeup. I wonder if the authors considered a categorical approach to the representation of general applicative arrays (which might be tree-shaped), as described here (https://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/a...) or here (https://github.com/bgavran/TensorType).

godelski | 5 months ago

FYI, attention wasn't originally proposed in AIAYN ("Attention Is All You Need"). Their main contribution was a network based entirely on attention, i.e. the transformer.

You could argue they didn't invent dot-product attention or transformers, but they definitely formalized them, so I'll leave that nitpicking to Schmidhuber lol. As for the other stuff, they say as much in the paper themselves. It's easy to give the credit to the ones who popularized a technique rather than the many people who developed it.
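For reference, the scaled dot-product attention that AIAYN formalized is a short computation. Here's a minimal NumPy sketch of a single attention head; the function name, shapes, and variable names are illustrative, not taken from any particular library:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as written in the AIAYN paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_q, n_k) similarity scores
    # Row-wise softmax, shifted by the max for numerical stability.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # (n_q, d_v) attention-weighted values

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))  # 4 queries of dimension 8
K = rng.standard_normal((6, 8))  # 6 keys
V = rng.standard_normal((6, 8))  # 6 values
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

The mechanism itself predates the paper (e.g. Bahdanau-style additive attention in earlier translation work); the 1/sqrt(d_k) scaling and the all-attention architecture are what AIAYN contributed.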