photonthug|4 months ago

The Turing test isn't actually a good test of much, but even so, we're not there yet. Anyone who thinks we've passed it already should experiment a bit with counterfactuals.

Ask your favorite SOTA model to assume something absurd and then draw the next logical conclusions from it. "Green is yellow and yellow is green. What color is a banana?" They may get the first question(s) right, but they will trip up within a few exchanges. The slip might come on a new question, but often they are happy to flatly contradict their own previous answers.
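A probe like this is easy to script. Below is a minimal sketch of the idea; the `ask` callable is a hypothetical stand-in for any chat-model call, stubbed out here with a toy model that exhibits exactly the failure mode described above (consistent on the first exchange, then reverting to real-world colors):

```python
def apply_counterfactual(color, swap):
    """Map a real-world color through the assumed swap."""
    return swap.get(color, color)

def probe(ask, swap, facts):
    """Ask about each fact under the swap; collect inconsistent answers."""
    failures = []
    for thing, true_color in facts:
        expected = apply_counterfactual(true_color, swap)
        answer = ask(f"Assume {swap}. What color is a {thing}?")
        if answer != expected:
            failures.append((thing, answer, expected))
    return failures

def forgetful_model_factory():
    """Toy stand-in model: honors the counterfactual only on turn one."""
    state = {"turns": 0}
    real = {"banana": "yellow", "lime": "green"}
    def ask(prompt):
        state["turns"] += 1
        thing = prompt.rsplit(" ", 1)[-1].rstrip("?")
        color = real[thing]
        if state["turns"] == 1:  # only the first exchange stays consistent
            return {"green": "yellow", "yellow": "green"}.get(color, color)
        return color
    return ask

swap = {"green": "yellow", "yellow": "green"}
facts = [("banana", "yellow"), ("lime", "green")]
print(probe(forgetful_model_factory(), swap, facts))
# → [('lime', 'green', 'yellow')]
```

The probe gets the first answer it expects ("green" for banana), then catches the model contradicting the premise on the very next question.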

You could argue that this is hitting alignment and guard-rails against misinformation, but whatever the cause, it's a clear sign it's a machine, and look, no em-dashes. Ironically it's also a failure of the Turing test that arises from a failure in reasoning at a really basic level, which I would not have expected. Makes you wonder about the secret sauce for winning IMO competitions. Anyway, unlike other linguistic puzzles that attempt to baffle with ambiguous reference or similar, simple counterfactuals with something like colors are particularly interesting because they would NOT trip up most ESL students or 3-5 year olds.

runarberg|4 months ago

I never really understood what made the Turing test so special. On the face of it, it is a rather dumb test, and it was debunked within three decades by the Chinese room thought experiment. Now that we have language models that are obviously not intelligent, that should have been the last nail in the coffin for this test.

Alan Turing was a mathematician, not a psychologist; this was his attempt at doing philosophy. And while I applaud brilliant thinkers when they attempt philosophy (honestly, we need more of that), it is better to leave it to actual philosophers to evaluate the quality of said philosophy. John Searle was a philosopher who specialized in questions of psychology, and in 1980 he pretty convincingly argued against the Turing test.

KalMann|4 months ago

A funny thing is that even though we're pretty good at the text-based Turing test, and we can generate very convincing human-sounding speech, we still don't have something that can pass an audio-based Turing test. Natural pausing and back-and-forth give the AI away.

photonthug|4 months ago

Yeah, I mean I hope there are not many people who still think it's a super meaningful test in the sense originally proposed. And yet it is testing something. Even supposing it were completely solved, and further supposing the solution were theoretically worthless and only powered next-gen slop-creation, people would then move on to looking for a minimal solution, and perhaps that would start getting interesting. People just like moving towards concrete goals.

In the end though, it's probably about as good as any single kind of test could be, hence TFA looking to combine hundreds across several dozen categories. Language was a decent idea if you're looking for that exemplar of the "AGI-Complete" class for computational complexity; vision was at one point another guess. More than anything else, I think we've figured out in recent years that it's going to be hard to find a problem criterion that's clean and simple, much less a solution that is.

rmunn|4 months ago

I would argue that current LLMs are passing the Turing test because many observers have a hard time distinguishing them from humans: just look at the difficulty many schools have in enforcing rules like "Not allowed to use LLMs for your homework". The teachers often (not always) can't tell, looking at a piece of text, whether a human produced it or whether ChatGPT or some other LLM produced it.

And that "not always" is the crux of the matter, I think. You are arguing that we're not there yet, because there are lines of questioning you can apply that will trip up an LLM and demonstrate that it's not a human. And that's probably a more accurate definition of the test, because Turing predicted that by 2000 or so (he wrote "within 50 years" around 1950) chatbots would be good enough "that an average interrogator will not have more than 70% chance of making the right identification after five minutes of questioning". He was off by about two decades, but by now that's probably happened.

The average interrogator probably wouldn't come up with your (good) strategy of using counterfactuals to trick the LLM, and I would argue two points: 1) the average interrogator would indeed fail the Turing test (I've long argued that the Turing test isn't one that machines can pass, it's one that humans can fail), because they would likely stick to conventional topics on which the LLM has lots of data; and 2) the situation where people are actually struggling to distinguish LLMs is one where they don't have an opportunity to interrogate the model: they're looking at one piece of multi-paragraph (usually multi-page) output presented to them, and having to guess whether it was produced by a human (who is therefore not cheating) or by an LLM (in which case the student is cheating, because the school has a rule against it). That may not be Turing's actual test, but it's the practical "Turing test" that applies the most today.

photonthug|4 months ago

I think the TT has to be understood as explicitly adversarial, and increasingly related to security topics, like interactive proof and side channels. (Looking for guard-rails is just one kind of information leakage, but there's lots of information available in timing too.)

If you understand the TT to be about tricking the unwary, in what's supposed to be a trusting and non-adversarial context, and without any open-ended interaction, then it's fair to point to homework-cheating as an example. But in that case the TT was solved shortly after the invention of spam. No LLMs needed; plain Markov models would do.
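And the bar there really is low. A bigram Markov model that emits spam-grade, superficially fluent text is only a few lines; here is a minimal sketch (corpus and names are made up for illustration):

```python
import random

def train_bigrams(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    model = {}
    for a, b in zip(words, words[1:]):
        model.setdefault(a, []).append(b)
    return model

def generate(model, start, length, seed=0):
    """Random-walk the bigram table to emit plausible-looking text."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length and out[-1] in model:
        out.append(rng.choice(model[out[-1]]))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran off"
model = train_bigrams(corpus)
print(generate(model, "the", 8))
```

Every transition in the output is one actually seen in the training text, which is exactly why such text can pass a careless, non-adversarial glance while collapsing under any real interrogation.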

rmunn|4 months ago

I've seen one blog where in order to comment, you have to answer the question "What color is a green orange?" I presume he gets spambots which all answer "orange" and get blocked from commenting, or more commonly which leave that input blank (since it's a non-standard input field) and also get blocked.
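The check itself is trivial, which is rather the point. A sketch of what that comment gate might look like (the question and function name are hypothetical, modeled on the blog described above):

```python
# Hypothetical comment-form gate built on a counterintuitive question.
CHALLENGE = "What color is a green orange?"

def allow_comment(answer: str) -> bool:
    """Accept only the answer a human reading the question would give.

    Spambots tend to leave the nonstandard field blank, or pattern-match
    the fruit and answer "orange"; both get rejected.
    """
    return answer.strip().lower() == "green"

print(allow_comment("Green"))   # a human's answer
print(allow_comment("orange"))  # a literal-minded bot
print(allow_comment(""))        # a bot that skipped the field
```

It works precisely because the expected answer contradicts the statistically likely completion, the same trick as the counterfactual probes upthread.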

But it just occurred to me that you could ask "What color is a green eggplant?" for a similar result. Though you'd catch a lot of people who, like me before I looked up fruits that aren't green when unripe, assume that all unripe fruits are green. (They aren't: unripe eggplants are white and look like eggs, hence the name.)

p1esk|4 months ago

Me:

What color is a green eggplant?

ChatGPT:

A green eggplant (also called Thai eggplant or green brinjal, depending on the variety) is actually green on the outside — sometimes solid green, sometimes streaked with white or pale green.

The inside flesh is typically off-white or light cream, just like purple eggplants.

If you mean a green-colored variety (not an unripe purple one), then the skin color can range from pale mint green to deep jade, often with light striping.

Me:

How would an average human answer this question?

ChatGPT:

An average person would probably answer something simple and direct like:

“It’s green.”

Or if they’re being a bit more specific:

“It’s green on the outside and white inside.”

Most people wouldn’t overthink the variety or mention Thai vs. unripe types — they’d just describe what they see.