item 36853165

joebiden2 | 2 years ago

You seem to really disregard the author's position. They appear to have invested substantial effort in this specific area of research.

To validate the author's idea, you would need to train an LLM from scratch. If the author is right, you would get results similar to the current generation of LLMs, but with (a lot) less space required for the intermediate layers.

The cost of doing that is still measured in kilo- to mega-dollars, so why is it wrong to put the idea out in the open for others to criticize or adopt?


Legend2440 | 2 years ago

You don't need to train a ChatGPT-sized LLM; a toy nanoGPT would have been enough. You can train one on a consumer GPU in an afternoon.

And yes, I do disregard his research effort. There are hundreds of well-justified and well-researched "clever tricks" for improving Transformers, and almost none of them work. I'll believe it when I see the results.

mikeravkine | 2 years ago

Outliers only begin to appear at around 3B parameters (per the original LLM.int8() paper), so proving you've managed to suppress them is unfortunately not consumer-GPU-in-an-afternoon territory.
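For reference, the LLM.int8() paper flags a feature dimension as an outlier when its activation magnitude crosses roughly 6.0. A minimal sketch of counting such dimensions (NumPy stand-in for real activation tensors; the threshold follows the paper's convention, the function name is made up here):

```python
import numpy as np

def count_outlier_features(hidden, threshold=6.0):
    """Count feature dimensions containing at least one activation whose
    magnitude reaches `threshold` (~6.0 is the cutoff LLM.int8() uses).

    hidden: array of shape (tokens, features), e.g. one layer's outputs
    stacked over a batch of sequences.
    """
    return int((np.abs(hidden) >= threshold).any(axis=0).sum())

# tiny demo: 100 tokens, 16 features, with two dimensions spiked
acts = np.random.default_rng(0).normal(size=(100, 16))
acts[:, 3] += 10.0   # systematic outlier dimension
acts[0, 7] = -8.0    # single outlier activation
print(count_outlier_features(acts))  # → 2
```

On real checkpoints you would run this per layer over a calibration batch, which is exactly where the "needs ≥3B parameters" problem bites: below that scale there may be nothing to count.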

Yenrabbit | 2 years ago

I tried to test this with nanoGPT in an afternoon, since the code change is pretty minimal. It's hard to get conclusive results at that scale, though: to say anything with confidence you'd need to run multiple tests, figure out whether the 'outliers' mentioned only appear above a certain scale, and find good tests for quantization performance that work on models small enough to iterate on quickly ... It's doable, but still enough work that putting the idea out there and hoping others with more time+compute will try it seems a valid strategy to me :)

More generally, though, I definitely agree that the trend among 'improvements' to transformers has been things that don't turn out to work in practice.
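For context on how minimal such a change can be: one idea discussed in this space is a softmax variant with an extra 1 in the denominator, which lets an attention head emit near-zero total weight instead of being forced to distribute probability mass. This is a hypothetical NumPy illustration of that kind of one-line change, not code from the article under discussion:

```python
import numpy as np

def softmax_plus_one(x, axis=-1):
    # "softmax1": exp(x_i) / (1 + sum_j exp(x_j)).
    # Shift by the max for numerical stability; the exp(-x_max) term
    # is the implicit "+1" carried through the shift.
    x_max = x.max(axis=axis, keepdims=True)
    e = np.exp(x - x_max)
    return e / (e.sum(axis=axis, keepdims=True) + np.exp(-x_max))

# With strongly negative logits the weights can sum to ~0,
# which an ordinary softmax cannot do:
print(softmax_plus_one(np.array([-40.0, -40.0])).sum())  # ~0
```

Swapping something like this into nanoGPT's attention really is a couple of lines, which is why the "train it and see" experiment is tempting despite the scale problem above.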

knewter | 2 years ago

Apparently Google has used it in flaxformers since 2021.

renewiltord | 2 years ago

Do you know of handy testing steps? I suppose I could ask ChatGPT, but if someone has a validated "here, this is how you do it", I have a 3090 I can run it on. I'm not keen to debug anything here, though.
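Not a validated recipe, but one zero-debugging smoke test to start from: measure absmax int8 round-trip error on a tensor with and without an injected outlier. Because symmetric absmax quantization uses one scale for the whole tensor, a single outlier inflates everyone's rounding error; a method that really suppresses outliers should pull the "with outlier" error back toward the clean baseline. (A sketch with made-up names, not steps from the thread.)

```python
import numpy as np

def int8_roundtrip_error(w):
    # symmetric absmax quantization: one scale for the whole tensor,
    # so a single large outlier inflates the rounding error everywhere
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127)
    return float(np.abs(w - q * scale).mean())

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
w_out = w.copy()
w_out[0] = 60.0                      # inject one outlier activation
print(int8_roundtrip_error(w))       # small baseline error
print(int8_roundtrip_error(w_out))   # much larger mean error
```

The real experiment would run this per layer on actual activations from a trained model, but the toy version shows what you're looking for without any setup.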