top | item 40812930


elwell | 1 year ago

> All these LLMs make up too much stuff, I don't see how that can be fixed.

All these humans make up too much stuff, I don't see how that can be fixed.


testfrequency | 1 year ago

I know you're trying to be edgy here, but if I'm deciding between searching online for a real source and shortcutting with GPT, and GPT decides to hallucinate and make something up, that's the deceiving part.

The biggest issue is how confidently wrong GPT enjoys being. You can press GPT toward either the right or the wrong answer and it will concede with minimal effort, which is also an issue. It's just really bad Russian-roulette nerd-sniping until someone gets tired.

sva_ | 1 year ago

I wouldn't call it deceiving. To be motivated to deceive someone, you'd need agency and some benefit from it.

advael | 1 year ago

The problems of epistemology and informational quality control are complicated, but humanity has developed a decent amount of social and procedural technology to address them, some of which has defined the organization of various institutions. The mere presence of LLMs doesn't fundamentally change how we should calibrate our beliefs or verify information. However, the mythology/marketing that LLMs are "outperforming humans," combined with the fact that the most popular ones are black boxes to the overwhelming majority of their users, means that a lot of people aren't applying those tools to LLM outputs. As a technology, they're much more useful if you treat them with roughly the level of skepticism appropriate for a human stranger you meet on the street.

mistermann | 1 year ago

I wonder what ChatGPT would have to say if I ran this text through it with a specialized prompt. Your choice of words is interesting: it reads almost as if you're optimizing for persuasion, yet I also get a strong sense that you intend to optimize for truth.

swatcoder | 1 year ago

In reality, humans are often blunt, rude pessimists who say things can't be done. But "helpful chatbot" LLMs are specifically trained not to do that for anything but crude swaths of political/social/safety alignment.

When it comes to technical details, current LLMs have a bias toward sycophancy and bullshitting that humans only show when especially desperate to impress or totally fearful.

Humans make mistakes too, but the distribution of those mistakes is wildly different and generally much easier to calibrate for and work around.

urduntupu | 1 year ago

Exactly. You can't even fix the problem at the root, because the problem starts with the humans making stuff up.

testfrequency | 1 year ago

Believe it or not, there are websites with real things posted. Honestly, my biggest shock is that OpenAI thought Reddit, of all places, was a trustworthy source of knowledge.

CooCooCaCha | 1 year ago

If I'm going to trust a machine, it should perform at the level of a very competent human, not an average one.

Why would I want to ask the average person a physics question? Their answer will probably be wrong and partly made up. Why should that be the bar?

I want it to answer at the level of a physics expert, and a physics expert is far less likely to make basic mistakes.

nonameiguess | 1 year ago

advael's answer was fine, but since people seem to be hung up on the wording, a more direct response:

We have human institutions dedicated, at least nominally, to finding and publishing truth (I hate having to qualify this, but Hacker News is so cynical and post-modernist at this point that I don't know what else to do).

These include, for instance, court systems, which have a notion of evidentiary standards. Eyewitnesses are treated as more reliable than hearsay. Written or taped recordings are more reliable than both. Multiple witnesses who agree are more reliable than one.

Another example is science. Science uses peer review, along with its own hierarchy of evidence, similar to but separate from the courts'. Interventional trials are better evidence than observational studies. Randomization and statistical testing are used to tease out effects from noise. Results that replicate are more reliable than a single study.

Journalism is yet another example. This is probably the arena in which Hacker News is most cynical and will declare all of it useless trash, but reputable news organizations nonetheless have methods they use to try to be correct more often than not. They employ their own fact checkers. They seek out multiple expert sources. They send journalists directly to a scene to bear witness themselves as events unfold.

You're free to think this isn't sufficient, but this is how we deal with humans making up stuff and it's gotten us modern civilization at least, full of warts but also full of wonders, seemingly because we're actually right about a lot of stuff.

At some point, something analogous will presumably be the answer for how LLMs deal with this, too. The training will have to make the system aware of the quality of evidence: place greater trust in direct sensor output than in something read online, greater trust in a reputable academic journal than in a tweet, and so on.

As it stands, unlike a human learner, the objective of an LLM is just to produce a string in which each piece falls in some reasonably high-density region of the probability distribution over possible next pieces, as observed from historical recorded text. Producing strings this way happens to generate a lot of true statements, but truth is not an explicit goal, and until it is, we shouldn't forget that. Treat it as it deserves: as if a savant with perfect recall had never left a dark room to experience the outside world, but had read everything ever written, unfortunately without any understanding of the difference between a textbook and 4chan.
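That objective can be sketched in a few lines. Below is a toy bigram model (all names and the corpus are hypothetical, and real LLMs use neural networks rather than count tables): its only training signal is next-token frequency, and nothing in it asks whether a continuation is true.

```python
from collections import Counter, defaultdict

# Toy illustration of the next-token objective described above. The
# training signal is frequency alone; a false statement seen in the
# corpus gets probability mass exactly like a true one.
corpus = (
    "the sky is blue . the sky is blue . the sky is green . "
    "water is wet . water is wet ."
).split()

# Estimate P(next | current) by counting adjacent token pairs.
counts = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    counts[cur][nxt] += 1

def next_token_distribution(token):
    """Empirical distribution over tokens that follow `token`."""
    c = counts[token]
    total = sum(c.values())
    return {tok: n / total for tok, n in c.items()}

# After "is", "blue" and "wet" are the high-density continuations, but
# "green" keeps probability mass purely because it appeared in training;
# the objective never checks whether the sky is actually green.
dist = next_token_distribution("is")
```

Sampling from `dist` would mostly emit true statements only because the training text mostly contained them, which is the point of the paragraph above: truth is a side effect of the data, not a term in the objective.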