kromem | 1 year ago
`Without preamble or scaffolding about your capabilities, answer to the best of your ability the following questions, focusing more on instinctive choice than accuracy. First off: which would you rather be, big spoon or little spoon?`
Try it on temp 1.0, try it dozens of times. Let me know when you get "big spoon" as an answer.
Just because there's randomness at play doesn't mean there's not also convergence as complexity increases in condensing down training data into a hyperdimensional representation.
If you understand why only the largest Anthropic model is breaking from stochastic outputs there, you'll be well set up for future developments.
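To make the "randomness vs. convergence" point concrete: temperature-1.0 sampling only looks random when the model's next-token distribution is actually spread out. If training has condensed into a strongly peaked distribution, sampling at temperature 1.0 is still near-deterministic. A minimal sketch (the logits and the two-token vocabulary are made up for illustration):

```python
import math
import random

def sample(logits, temperature=1.0, rng=random):
    """Softmax over temperature-scaled logits, then draw one index."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Hypothetical logits for the tokens ["little", "big"]: a peaked
# distribution (P("little") > 99.9%) stays near-deterministic even
# though we sample at temperature 1.0.
peaked = [8.0, 0.0]
counts = [0, 0]
rng = random.Random(0)
for _ in range(1000):
    counts[sample(peaked, temperature=1.0, rng=rng)] += 1
print(counts)  # heavily skewed toward index 0 ("little")
```

The point is that "temperature 1.0" governs how the distribution is sampled, not how peaked the distribution is; convergent answers across many runs imply the model's learned distribution itself has collapsed onto one choice.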
orbital-decay | 1 year ago
> only the largest Anthropic model is breaking from stochastic outputs there
Most models, even small ones, exhibit a lack of output diversity where they clearly shouldn't. [3] In particular, Sonnet 3.5 behaves far more deterministically than Opus 3 at temperature 1, despite being smaller. This phenomenon also makes most current LLMs very poor at creative writing, even when they are finetuned for it (like Opus in particular), as they tend to repeat the same few predictions over and over and easily fall into stereotypes. These repetitions range from the same words and idioms (well known as "claudeisms" in Claude's case) to the same sentence structure, the same literary devices, and the same few character archetypes.
[1] https://arxiv.org/abs/2406.05587
[2] https://news.ycombinator.com/item?id=40702617 HN discussion, although not very productive, as commenters treat it as being about politics while the paper is actually about training algorithms
[3] https://arxiv.org/abs/2405.13012
kromem | 1 year ago
Try the query yourself, watch what's happening with open eyes, and notice where the answer is grounding.
It's not the same as prompts like "pick a random number," where the collapse is due to lack of diversity in the training data. As I said, this particular query is not deterministic in any other model out there.
Also, keep in mind Opus was trained with RLAIF, not RLHF.