GPT-3.5 is much worse at "complex" cognitive tasks than Davinci (175B), which seems to indicate that it's a smaller model. It's also much faster than Davinci and costs the same as Curie via the API.
It's clearly a smaller model, but I'm very skeptical that it is 13B. It is much more lucid than any 13B model out in the wild. I find it much more likely that they used additional tricks to scale down hardware requirements and thereby bring the price down so much. int4 quantization, perhaps? That alone would mean 4x less hardware utilization for the same query if they were using float16 for the older models, which they probably were.
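The 4x figure above falls straight out of the storage math: float16 spends 2 bytes per weight, while int4 packs two weights into each byte. A minimal sketch of naive symmetric int4 quantization (illustrative only, not whatever scheme OpenAI might actually use; real int4 schemes typically quantize per-group, not per-tensor):

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Map float weights to signed 4-bit ints in [-8, 7] with one scale,
    packing two 4-bit values into each byte."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    u = (q & 0x0F).astype(np.uint8)          # low 4 bits, unsigned view
    packed = u[0::2] | (u[1::2] << 4)        # two weights per byte
    return packed, scale

def dequantize_int4(packed: np.ndarray, scale: float) -> np.ndarray:
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    # sign-extend the 4-bit values back to [-8, 7]
    lo = np.where(lo > 7, lo - 16, lo)
    hi = np.where(hi > 7, hi - 16, hi)
    q = np.empty(packed.size * 2, dtype=np.int8)
    q[0::2], q[1::2] = lo, hi
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float16)
packed, scale = quantize_int4(w.astype(np.float32))
print(w.nbytes, packed.nbytes)  # 2048 512 -> exactly 4x smaller
err = np.abs(dequantize_int4(packed, scale) - w.astype(np.float32)).max()
print(err <= scale / 2 + 1e-6)
```

Less memory per weight also means fewer GPUs to hold the model and more queries batched per GPU, which is where the serving-cost savings would come from.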
I'm sure they're tweaking lots of things under the hood, especially now that they have 100M+ users. It could be bigger (30B? maybe 65B), since coming down from 175B leaves quite a lot of room, but the cognitive drop from Davinci gives away that it's much smaller.
People fine-tuning LLaMa models on arguably not that much data, and not the highest-quality data, are already seeing pretty good improvements over the base LLaMa, even at "small" sizes (7B/13B). I assume OpenAI has access to much higher quality data to fine-tune with, and in much higher quantity too.
int_19h|2 years ago
iliane5|2 years ago