casey2 | 1 day ago

It hasn't even passed the original Turing test, depending on the question. There is an unlimited number of questions that cause LLMs to give inhuman-looking answers.

As for writing in general, the slop score is still higher than the human baseline for all models[1]. So all a human tester has to do is have both the model and the human write a lot and grade the results, since the interrogator is allowed to submit an arbitrarily long list of questions.

[1] https://eqbench.com/slop-score.html
