Also strawberry spelling isn't any real test for current LLMs as they have no concept of letters, they work on tokens which may be several characters including punctuation and numerals. To have any hope of getting that question right tokens would have to have the granularity of individual letters, massively ballooning model size and training time, or the LLM needs to be able to call out to an external tool that will return the result (and needs sufficient examples in the training data to prime that trigger to fire).
thatjoeoverthr|5 days ago