(no title)
vlovich123 | 1 day ago
Speculation is that the frontier models are all below 200B parameters, but a 2x size difference wouldn’t fully explain the task-performance differences.
nl|20 hours ago
Some versions of some of the models are around that size, which you might hit, for example, with the ChatGPT auto-router.
But the frontier models are all over 1T parameters. Source: interviews with people who have left one of the big three labs, now work at the Chinese labs, and talk about how to train 1T+ models.
NamlchakKhandro|21 hours ago
Yes it does.
827a|20 hours ago
Core speed/count and memory bandwidth determine your performance. Memory size determines your maximum model size, which determines your smarts. Broadly speaking. A rough back-of-envelope for the bandwidth part is sketched below.
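A minimal sketch of that bandwidth point: at decode time, every active weight streams through memory once per token, so throughput is roughly bandwidth divided by model bytes. The GB/s and parameter figures here are illustrative assumptions, not measurements of any specific model or chip.

```python
# Back-of-envelope: memory-bandwidth-bound decode speed.
# All hardware/model numbers below are illustrative assumptions.

def decode_tokens_per_sec(bandwidth_gb_s: float,
                          active_params_b: float,
                          bytes_per_param: float = 2.0) -> float:
    """Each decoded token must stream every active weight from memory,
    so tokens/sec is roughly bandwidth / model-bytes."""
    model_bytes_gb = active_params_b * bytes_per_param
    return bandwidth_gb_s / model_bytes_gb

# e.g. ~3350 GB/s of HBM vs a 200B dense model at fp16:
print(decode_tokens_per_sec(3350, 200))  # ~8.4 tok/s on one device
# vs a 70B model on the same part:
print(decode_tokens_per_sec(3350, 70))   # ~23.9 tok/s
```

This ignores batching, KV-cache traffic, and MoE sparsity (where only the active parameters count), but it shows why bandwidth and model size dominate single-stream decode speed.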
BoredomIsFun|13 hours ago
GLM-5 is a ~750B-parameter model.