
imagainstit | 2 years ago

It's been very incremental and spread out.

Another commenter pointed to "Attention Is All You Need" as a particular breakthrough. Even that paper mostly merged then-current trends in attention, normalization, and seq2seq, and it doesn't yet include the interesting empirical results that were later found by scaling that architecture up to other problems.

The papers with huge numbers of authors tend to report empirical results with a large element of systems/engineering work; much large-scale LLM research is like this.
