(no title)
joebiden2 | 2 years ago
To validate the author's idea, you would have to train an LLM from scratch. If the author is right, you would get results similar to the current generation of LLMs, but with (a lot) less space required for the intermediate layers.
The cost of doing that is still measured in kilo- to mega-dollars, so what is wrong with putting the idea out in the open for others to criticize or adopt?
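For scale: a toy version of that ablation is cheap to run before anyone spends real money. The sketch below (PyTorch; every size, hyperparameter, and name here is an illustrative assumption, not from the article) trains a baseline tiny Transformer LM next to a variant with a much narrower intermediate feed-forward layer and compares parameter counts:

```python
# Toy ablation sketch: baseline Transformer LM vs. one with a slim
# intermediate (feed-forward) layer. All sizes are assumptions for
# illustration; random tokens stand in for real training data.
import torch
import torch.nn as nn

VOCAB = 1000

def make_lm(d_model=128, n_heads=4, n_layers=4, d_ff=512):
    """Tiny decoder-style LM; d_ff sets the intermediate-layer width."""
    layer = nn.TransformerEncoderLayer(
        d_model=d_model, nhead=n_heads, dim_feedforward=d_ff,
        batch_first=True, norm_first=True,
    )
    return nn.ModuleDict({
        "embed": nn.Embedding(VOCAB, d_model),
        "blocks": nn.TransformerEncoder(layer, num_layers=n_layers),
        "head": nn.Linear(d_model, VOCAB),
    })

def run(model, steps=50, seq_len=64, batch=16):
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loss_fn = nn.CrossEntropyLoss()
    # Causal mask so each position only attends to earlier positions.
    mask = nn.Transformer.generate_square_subsequent_mask(seq_len - 1)
    for _ in range(steps):
        tokens = torch.randint(0, VOCAB, (batch, seq_len))  # toy data
        x = model["embed"](tokens[:, :-1])
        h = model["blocks"](x, mask=mask)
        logits = model["head"](h)
        # Next-token prediction loss against the shifted sequence.
        loss = loss_fn(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

for name, d_ff in [("baseline", 512), ("slim intermediate", 128)]:
    model = make_lm(d_ff=d_ff)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.2f}M params, final loss {run(model):.3f}")
```

Random tokens obviously say nothing about output quality; the point is just that the space savings and the training loop can be checked at this scale in minutes.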
Legend2440 | 2 years ago
And yes, I do disregard his research effort. There are hundreds of well-justified, well-researched "clever tricks" for improving Transformers, and almost none of them work in practice. I'll believe it when I see the results.