casey2 | 1 day ago
As for writing in general, slop score is still higher than the human baseline for all models[1]. So all a human tester has to do is make the respondent write a lot and grade the result, since the interrogator is allowed to submit an arbitrarily long list of questions.