For many of us, a better Turing test is one that's contextual to a topic we CARE about. Lots of LLMs sound better than a randomly sampled human on a topic I don't know much about (e.g. opinions on new movies). They're decent on engineering topics I only vaguely know about, but still below the bar (though getting better!) on topics I really care about.