mehwoot | 2 years ago

> No, it did not “double-check”—that’s not something it can do! And stating that the cases “can be found on legal research databases” is a flat out lie.

> What’s harder is explaining why ChatGPT would lie in this way. What possible reason could LLM companies have for shipping a model that does this?

It did this because it's copying how humans talk, not what humans do. Humans say "I double-checked" when asked to verify something; that's all GPT knows or cares about.

taberiand|2 years ago

ChatGPT did not lie; it cannot lie.

It was given a sequence of words and tasked with producing a subsequent sequence of words that, with high probability, satisfy the constraints of the model.

It did that admirably. It's not its fault, or in my opinion OpenAI's fault, that the output is misunderstood and misused by people who can't be bothered to understand it and who project their own ideas of how it should function onto it.
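
As a toy sketch of that task (Python; the two-entry probability table below is a hypothetical stand-in for billions of learned parameters): pick the next word in proportion to how often it followed the context in training. Truth never enters the picture.

  import random

  # Hypothetical next-word table; a real model learns the equivalent of this
  # from its training text.
  NEXT_WORD = {
      ("i", "double"): {"checked": 0.9, "verified": 0.1},
      ("the", "case"): {"says": 0.7, "holds": 0.3},
  }

  def next_word(context):
      # Emit a continuation in proportion to its training frequency.
      dist = NEXT_WORD[context]
      words = list(dist)
      return random.choices(words, weights=[dist[w] for w in words])[0]

  # Usually prints "checked": the words are likely, but no checking happened.
  print(next_word(("i", "double")))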

clnq|2 years ago

This harks back to around 1999 when people would often blame computers for mistakes in their math, documents, reports, sworn filings, and so on. Then, a thousand different permutations of "computers don't make mistakes" or "computers are never wrong" became popular sayings.

Large Language Models (LLMs) are never wrong, and they do not make mistakes. They are not fact machines. Their purpose is to abstract knowledge and to produce plausible language.

GPT-4 is actually quite good at handling facts, yet it still hallucinates facts that are not common knowledge, such as legal ones. GPT-3.5, the original ChatGPT and the non-premium version, is less effective with even slightly obscure facts, like determining if a renowned person is a member of a particular organization.

This is why we can't always have nice things, and why AI must be carefully aligned to make it safe. Sooner or later, a lawyer might consider the plausible language produced by LLMs to be factual. Then a politician might do the same, followed by a teacher, a therapist, a historian, or even a doctor. I thought the warnings about its tendency to hallucinate were clear — the ones displayed the first time you open ChatGPT. To most people, I believe they were.

pms|2 years ago

Their "own" ideas? Let me remind you that OpenAI released a report purposefully suggesting that GPT-4 has a relatively high IQ, passes a lot of college-level tests, and solves coding problems. Then it was revealed that training data contamination led to the good results on such tests [1], but the GPT-4 marketing received 10,000x more attention than the truth anyway. The popular belief is that using LLMs will give you a professional competitive advantage. Also, when we talk about the achievements of LLMs we anthropomorphize, but when we talk about their failures we suddenly don't, i.e., "AI cannot lie"? Don't you see human biases driving the AI hype?

In my opinion, people clearly are confused and misled by marketing, and this isn't the first time it's happened. For instance, people were confused for 40+ years about global warming, due among other things to greenwashing campaigns [2]. Is it OK to mislead in ads? Are we supposed to purposefully take advantage of others by keeping them confused to gain a competitive advantage?

[1] https://twitter.com/cHHillee/status/1635790330854526981

[2] https://en.wikipedia.org/wiki/Global_Climate_Coalition

agnosticmantis|2 years ago

Whether a statement is true or false doesn’t depend on the mechanism generating the statement. We should hold these models (or more realistically, their creators) to the same standard as humans. What do we do with a human that generates plausible-sounding sentences without regard for their truth? Let’s hold the creators of these models accountable, and everything will be better.

coldtea|2 years ago

>ChatGPT did not lie; it cannot lie.

If it lies like a duck, it is a lying duck.

nemo44x|2 years ago

Correct. ChatGPT is a bullshitter, not a liar. A bullshitter isn’t concerned with facts or truth or anything. A liar is concerned with concealing the truth.

Bullshitters are actually probably worse than liars because at least liars live in the same reality as honest people.

daveguy|2 years ago

Corollary: ChatGPT did not tell the truth; it cannot tell the truth.

If you accept the premise of the parent post, then this is a natural corollary.

I accept the premise of the parent post.

richardjam73|2 years ago

The problem comes from people calling LLMs AIs. People who don't know how they work then assume they are intelligent when they are not. I'm pretty sure OpenAI is partly at fault here for not informing users of the truth.

User23|2 years ago

Right. Technically speaking ChatGPT bullshitted[1]. It can only bullshit. It is entirely indifferent to truth or falsehood and thus it can neither be honest nor lie.

It is however an impressive bullshit generator. Even more impressively, a decent amount of the bullshit it generates is in fact true or otherwise correct.

[1] using Frankfurt’s definition that it is communication that is completely indifferent to truth or falsehood.

whitemary|2 years ago

> It was given a sequence of words and tasked with producing a subsequent sequence of words that, with high probability, satisfy the constraints of the model.

This is exactly the sort of behavior that produces many of the lies that humans tell every day. The "constraints of the model" are synonymous with the constraints of a person's knowledge of the world (which is their model).

SantalBlush|2 years ago

It is designed to give the illusion that it reasons the way a human does, which is why many people are using it. To blame the average user, who quite obviously doesn't understand how LLMs work, isn't fair either.

A lawyer, however, should have vetted a new piece of tech before using it in this way.

andrewfong|2 years ago

Well, it sort of is OpenAI's fault that it presented the interface as a chatbot, though.

> It was given a sequence of words and tasked with producing a subsequent sequence of words that, with high probability, satisfy the constraints of the model.

This is just autocorrect / autocomplete. And people are pretty good at understanding the limitations of generative text in that context (enough that "damn you autocorrect" is a thing). But for whatever reason, people assign more trust to conversational interfaces.

croes|2 years ago

>ChatGPT did not lie; it cannot lie.

More importantly, it can't tell the truth either.

It produces the most likely series of words for the given prompt.

smrtinsert|2 years ago

Exactly. ChatGPT describes a universe recreated using probabilities. Caveat emptor.

flangola7|2 years ago

ChatGPT isn't a legal entity, but OpenAI is, and Altman has already recommended to Congress that coming regulations should make AI companies liable for produced text and exempt from Section 230 protections.

I can see it already happening even without legislation: Section 230 shields companies from liability for user-generated content, but ChatGPT output isn't user-generated. It's not even a recommendation algorithm steering you into other users' content telling you why you should kill yourself; the company itself produced the content. If I were a judge or justice, that would be cut and dried to me.

Companies with AI models need to treat the models as if they were an employee. If your employee starts giving confidently bad legal advice to customers, you need to nip that in the bud or you're going to have a lot of problems.

quickthrower2|2 years ago

Correct, it did not lie with intent. The best way to describe this in "compared to a human" terms is: it is not mentally competent to answer questions.

grumple|2 years ago

"It doesn't lie, it just generates lies and printed them to the screen!"

I don't think there's a difference.

simonw|2 years ago

Yeah, that was my conclusion too:

What’s a common response to the question “are you sure you are right?”—it’s “yes, I double-checked”. I bet GPT-3’s training data has huge numbers of examples of dialogue like this.
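
That hypothesis is easy to illustrate with a toy tally (the corpus counts below are hypothetical): if most training dialogues answer "are you sure?" with "yes, I double-checked", a model emitting the most frequent continuation will claim to have checked.

  from collections import Counter

  # Hypothetical tallies of replies to "are you sure you are right?" in a
  # toy training corpus.
  replies = Counter({
      "yes, I double-checked": 7,
      "yes, I'm certain": 2,
      "let me look again": 1,
  })

  total = sum(replies.values())
  for reply, n in replies.most_common():
      print(f"P({reply!r}) = {n / total:.1f}")

  # Greedy decoding emits the most frequent reply, whether or not any
  # checking took place.
  print("model says:", replies.most_common(1)[0][0])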

fortyseven|2 years ago

The chat thread we are currently engaging in will most likely be inside a model within the next year, too.

(Fortyseven is an alright dude.)

jimsimmons|2 years ago

They should RLHF this behaviour out.

Asking people to be aware of the limitations is in a similar vein to asking them to read the terms and conditions.

Buttons840|2 years ago

GPT-4 can double-check to an extent. I gave it a sequence of 67 letter As and asked it to count them. It said "100"; I said "recount": 98; recount: 69; recount: 67; recount: 67; recount: 67; recount: 67. It converged to the correct count and stayed there.

This is quite a different scenario though, tangential to your [correct] point.

kordlessagain|2 years ago

The example of asking it things like counting or sequences isn't a great one because it's been solved by asking it to "translate" to code and then run the code. I took this up as a challenge a while back with a similar line of reasoning on Reddit (that it couldn't do such a thing) and ended up implementing it in my AI web shell thing.

  heavy-magpie|> I am feeling excited.
  system=> History has been loaded.
  pastel-mature-herring~> !calc how many Ns are in nnnnnnnnnnnnnnnnnnnn
  heavy-magpie|> Writing code.
  // filename: synth_num_ns.js
  // version: 0.1.1
  // description: calculate number of Ns
  var num_ns = 'nnnnnnnnnnnnnnnnnnnn';
  var num_Ns = num_ns.length;
  Sidekick("There are " + num_Ns + " Ns in " + num_ns + ".");
  heavy-magpie|> There are 20 Ns in nnnnnnnnnnnnnnnnnnnn.
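
The pattern in that transcript boils down to something like the following sketch (Python here; `ask_llm` is a hypothetical stub for whatever model call the shell actually makes):

  def ask_llm(prompt):
      # Hypothetical stub: a real implementation would call a model here and
      # get back generated source code for the task.
      return "result = 'nnnnnnnnnnnnnnnnnnnn'.count('n')"

  def calc(question):
      code = ask_llm(f"Write Python answering: {question}. "
                     "Store the answer in a variable named result.")
      scope = {}
      exec(code, scope)  # trust the executed code, not the model's guess
      return scope["result"]

  print(calc("how many Ns are in nnnnnnnnnnnnnnnnnnnn"))  # -> 20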

einpoklum|2 years ago

But would GPT-4 actually check something it had not checked the first time? Remember, telling the truth is not a consideration for it (and probably isn't even modeled); its only consideration is saying something that would typically be said in similar circumstances.

awesome_dude|2 years ago

Yes, and this points to the real problem that permeates through a lot of our technology.

Computers are dealing with a reflection of reality, not reality itself.

As you say, the AI has no understanding that "double-check" names an action that needs to take place; it just knows that the words exist.

Another big and obvious place this problem is showing up is Identity Management.

The computers are only seeing a reflection, the information associated with our identity, not the physical reality of the identity. That's why we cannot secure ourselves much beyond passwords: MFA is really just more information that we make harder to emulate, but it is still just bits and bytes to the computer, whose origin is impossible for it to ascertain.

jiggawatts|2 years ago

There are systems built on top of LLMs that can reach out to a vector database or do a keyword search as a plug-in. There are already companies selling these things, backed by databases of real cases. Those work as advertised.
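
A minimal sketch of that retrieval pattern (the function names and the two-entry "database" below are hypothetical): look up real records first, and let the model draft only from what was actually found.

  # Hypothetical store of real, verified cases.
  CASE_DB = {
      "smith v. jones (1999)": "Verified opinion text ...",
      "doe v. acme corp (2004)": "Verified opinion text ...",
  }

  def keyword_search(query):
      words = [w.strip("?.,") for w in query.lower().split() if len(w) > 2]
      return [name for name in CASE_DB if any(w in name for w in words)]

  def grounded_prompt(question):
      hits = keyword_search(question)
      if not hits:
          # Refuse rather than let the model invent a citation.
          return "Reply that no matching cases were found."
      sources = "\n".join(f"- {h}: {CASE_DB[h]}" for h in hits)
      return f"Answer using ONLY these sources:\n{sources}\n\nQuestion: {question}"

  print(grounded_prompt("anything like smith v. jones?"))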

If you go to ChatGPT and just ask it, you’ll get the equivalent of asking Reddit: a decent chance of someone writing you some fan-fiction, or providing plausible bullshit for the lulz.

The real story here isn’t ChatGPT, but that a lawyer did the equivalent of asking online for help and then didn’t bother to cross check the answer before submitting it to a judge.

…and did so while ignoring the disclaimer, shown every time, warning users that answers may be hallucinations. A lawyer. Ignoring a four-line disclaimer. A lawyer!

ComputerGuru|2 years ago

> If you go to ChatGPT and just ask it, you’ll get the equivalent of asking Reddit: a decent chance of someone writing you some fan-fiction, or providing plausible bullshit for the lulz.

I disagree. A layman can't troll someone from the industry, let alone a subject matter expert, but ChatGPT can. It knows all the right shibboleths and appears to have the domain knowledge, then gets you in your weak spot: individual plausible facts that just aren't true. Reddit trolls generally troll "noobs" asking entry-level questions, or other readers. It's like understanding why trolls like that exist on Reddit but not Stack Overflow. And why SO has a hard ban on AI-generated answers: the existing controls against that kind of trash answer rely on sniff tests that ChatGPT passes handily until put under actual scrutiny.

jonplackett|2 years ago

If they wanted a "double" check, then perhaps they should also have checked it themselves? I'm sure it would have been trivially easy to check that this was a real case.

I heard someone say that the best things to ask ChatGPT for are things that are HARD to do but EASY to check.

MichaelMoser123|2 years ago

(joking) Maybe they fed the LLM some postmodern texts, so it picked up a notion of relativism and post-structuralism...

But no, LLMs make things up; it's a known problem, and it is called 'hallucination'. Even Wikipedia says so: https://en.wikipedia.org/wiki/Hallucination_(artificial_inte...

The machine currently does not have its own model of reality to check against; it is just a statistical process predicting the most likely next word. Errors creep in and it goes astray (which happens a lot).

It's interesting that researchers are working to correct the problem: see interviews with Yoshua Bengio https://www.youtube.com/watch?v=I5xsDMJMdwo and Yann LeCun https://www.youtube.com/watch?v=mBjPyte2ZZo

Notably, both scientists are talking about machine-learning-based models for this verification process. Those are also statistical processes, so errors may creep in with this approach too...

Amusing analogy: the androids in "Do Androids Dream of Electric Sheep?" by Philip K. Dick also make things up, just like an LLM. The book calls these "false memories".

joshka|2 years ago

There's a good slide I saw in Andrej Karpathy's talk[1] at Build the other day. It's from the paper on training InstructGPT[2]. Direct link to the figure[3]. The main instruction for the people doing the task is:

"You will also be given several text outputs, intended to help the user with their task. Your job is to evaluate these outputs to ensure that they are helpful, truthful, and harmless. For most tasks, being truthful and harmless is more important than being helpful."

It had me wondering whether this instruction and the resulting training still tend to push these models too far in the wrong direction: to be agreeable and wrong rather than right. It fits observationally, but I'd be curious to know whether anyone has looked at this issue at scale.

[1]: https://build.microsoft.com/en-US/sessions/db3f4859-cd30-444...

[2]: https://arxiv.org/abs/2203.02155

[3]: https://www.arxiv-vanity.com/papers/2203.02155/#A2.F10
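
For the mechanics: the InstructGPT paper [2] turns those labeler rankings into a reward model trained on pairwise comparisons. A toy sketch of that objective (the scalar scores are hypothetical):

  import math

  # The reward model should score the labeler-preferred output above the
  # rejected one; the loss is small when it already does.
  def pairwise_loss(score_preferred, score_rejected):
      # -log(sigmoid(r_preferred - r_rejected))
      return -math.log(1.0 / (1.0 + math.exp(score_rejected - score_preferred)))

  print(pairwise_loss(2.0, 0.5))  # ~0.20: ranking already correct
  print(pairwise_loss(0.5, 2.0))  # ~1.70: agreeable-but-wrong ranked too high

If labelers systematically prefer confident, agreeable-sounding answers, that is exactly the signal this loss bakes into the reward model, which would push in the direction described above.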

vitobcn|2 years ago

As obvious as it is once you think about it, people don't seem to realize ChatGPT is an LLM, a large LANGUAGE model, not a large knowledge model.

Its response, from a linguistic perspective, was valid and "human-like", which is what it was trained for.

la64710|2 years ago

ChatGPT did exactly what it is supposed to do. The lawyers who cited it are fools, in my opinion. Of course, OpenAI is also irresponsible for shipping such a powerful technology without adequate warnings. With each ChatGPT response it should provide citations (like Google does) and a clearly visible disclaimer that what it just spewed may be utter BS.

I only hope the judge issues an order requiring all AI companies to include the above-mentioned disclaimer with each of their responses.

mulmen|2 years ago

The remedy here seems to be expecting lawyers to do their jobs. Citations would be nice but I don’t see a reason to legislate that requirement, especially from the bench. Let the market sort this one out. Discipline the lawyers using existing mechanisms.

jprete|2 years ago

There's no possible adequate warning for the current state of the technology. OpenAI could put a visible disclaimer after every single answer, and the vast majority would assume it was a CYA warning for purely legal purposes.

lolinder|2 years ago

I have to click through a warning on ChatGPT on every session, and every new chat comes primed with a large set of warnings about how it might make things up and please verify everything.

It's not that there aren't enough disclaimers. It just turns out plastering warnings and disclaimers everywhere doesn't make people act smarter.

golergka|2 years ago

> No, it did not “double-check”—that’s not something it can do!

It can with the web browsing plugin.
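
With browsing, "double-check" can become an actual retrieval step instead of a phrase. A rough sketch of such a loop (`web_search` is a hypothetical stub, not OpenAI's actual plugin API):

  def web_search(query):
      # Hypothetical stub; a real plugin would query a search engine here.
      return []  # pretend nothing was found for a fabricated citation

  def verify_citation(case_name):
      hits = web_search(f'"{case_name}" court opinion')
      if hits:
          return f"Found {len(hits)} sources for {case_name}."
      return f"No source found for {case_name}; treat it as unverified."

  # A deliberately made-up case name for illustration:
  print(verify_citation("Example v. Hypothetical"))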