item 43835371


s17n | 10 months ago

Geoguessing isn't much of a reasoning task; it's more about memorizing a bunch of knowledge. Since LLMs contain essentially all knowledge, it's not surprising that they would be good at this.

As far as goalpost-moving goes, it's wild to me that nobody is talking about the turing test these days.


Macha | 10 months ago

Obviously when the Turing Test was designed, the assumption was that anything able to pass it would be so unmistakably human-like that passing would be a clear signal.

LLMs made it clear that it's not so clear-cut, and so the relevance of the test fell.

distortionfield | 10 months ago

Because the Chinese Room is a much better analogy for what LLMs are doing inside than the Turing Test is.

jibal | 10 months ago

That's a non sequitur that mixes apples and giraffes, and is completely wrong about what happens in the Chinese Room and what happens in LLMs. Ex hypothesi, the "rule book" that the Searle homunculus in the Chinese Room uses is "the right sort of program" to implement "Strong AI". The LLM algorithm is very much not that sort of program, it's a statistical pattern matcher. Strong AI does symbolic reasoning, LLMs do not.

But worse, the Turing Test is not remotely intended to be an "analogy for what LLMs are doing inside", so your comparison makes no sense whatsoever. It also completely fails to address the actual point: for ages the Turing Test was held out as the criterion for determining whether a system was "thinking", but it has been abandoned in the face of LLMs, which have near-perfect language models and can closely mimic modes of human interaction regardless of whether they are "thinking" (and they aren't, so the TT is clearly an inadequate test, as some argued for decades before LLMs became a reality).

CamperBob2 | 10 months ago

What happens if we give the operator of the Chinese Room a nontrivial math problem, one that can't simply be answered with a symbolic lookup but requires the operator to proceed step-by-step on a path of inquiry that he doesn't even know he's taking?

The analogy I used in another thread is a third grader who finds a high school algebra book. She can read the book easily, but without access to teachers or background material that she can engage with -- consciously, literately, and interactively, unlike the Chinese Room operator -- she will not be able to answer the exercises in the book correctly, the way an LLM can.

zahlman | 10 months ago

Look at contemporary accounts of what people thought a conversation with a Turing-test-passing machine would look like. It's clear they had something very different in mind.

Realizing that there were problems with previous hypotheses about what might make a good test is not the same thing as choosing a standard and then revising it once it's met.

s17n | 10 months ago

I think any time a 50+ year old problem is solved, it should be considered a Big Deal, regardless of how the solution changes our understanding of the original problem.

bluefirebrand | 10 months ago

> As far as goalpost-moving goes, it's wild to me that nobody is talking about the turing test these days

To be honest, I am still not entirely convinced that current LLMs pass the Turing test consistently, at least not with any reasonably skeptical tester.

"Reasonably skeptical tester" is a bit of goalpost shifting, but... let's be real here.

Most of these LLMs have way too much of a "customer service voice". It's not very conversational, and I think it is fairly easy to identify, especially if you suspect you're talking to an LLM and start to probe its behavior.

Frankly, if the bar for passing the Turing test is "it must fool some number of low-intelligence, gullible people", then we've had AI for decades, since people have been falling for scammy porn bots for a long time.

jibal | 10 months ago

One needs to be more than "reasonably skeptical" and merely not "low-intelligence gullible" to be a competent TT judge--it requires skill, experience, and an understanding of an LLM's weak spots.

And the "customer service voice" you see is intentionally programmed in by the vendors via baseline rules. They can be programmed differently--or overridden by appropriate prompts--to adopt a very different tone.

LLMs trained on trillions of human-generated text fragments available from the internet have shown that the TT is simply not an adequate test for identifying whether a machine is "thinking"--which was Turing's original intent in his 1950 paper "Computing Machinery and Intelligence" in which he introduced the test (which he called "the imitation game").

TimorousBestie | 10 months ago

A lot happens in seventy-five years.

jibal | 10 months ago

People were talking about the Turing Test as the criterion for whether a system was "thinking" up until the advent of LLMs, which was far less than 75 years ago.

sundarurfriend | 10 months ago

> As far as goalpost-moving goes, it's wild to me that nobody is talking about the turing test these days.

UCSD: Large Language Models Pass the Turing Test https://news.ycombinator.com/item?id=43555248

From just a month ago.

s17n | 10 months ago

Exactly - maybe the most significant long-term goal in computer science history has been achieved and it's barely discussed.

darkwater | 10 months ago

> As far as goalpost-moving goes, it's wild to me that nobody is talking about the turing test these days.

Well, in this case humans had to be trained as well, but now there are humans who are pretty good at detecting LLM slop too. (I'm half-joking and half-serious)