I know you’re trying to be edgy here, but suppose I’m deciding between searching online for a real source and taking the shortcut of asking GPT. If GPT decides to hallucinate and make something up, that’s the deceiving part.
The biggest issue is how confidently wrong GPT enjoys being. You can press GPT in either the right or the wrong direction and it will concede with minimal effort, which is also an issue. It’s just a really bad game of Russian-roulette nerd-sniping until someone gets tired.
The problems of epistemology and informational quality control are complicated, but humanity has developed a decent amount of social and procedural technology to address them, some of which has defined the organization of various institutions. The mere presence of LLMs doesn't fundamentally change how we should calibrate our beliefs or verify information. However, the mythology/marketing that LLMs are "outperforming humans", combined with the fact that the most popular ones are black boxes to the overwhelming majority of their users, means that a lot of people aren't applying those tools to their outputs. As a technology, they're much more useful if you treat them with roughly the level of skepticism appropriate for a human stranger you're talking to on the street.
I wonder what ChatGPT would have to say if I ran this text through it with a specialized prompt. Your choice of words is interesting, almost as if you are optimizing for persuasion, but at the same time I get a strong vibe that you intend to optimize for truth.
In reality, humans are often blunt and rude pessimists who say things can't be done. But "helpful chatbot" LLMs are specifically trained not to do that for anything but crude swaths of political/social/safety alignment.
When it comes to technical details, current LLMs have a bias towards sycophancy and bullshitting that humans only show when especially desperate to impress or totally fearful.
Humans make mistakes too, but the distribution of those mistakes is wildly different and generally much easier to calibrate for and work around.
Believe it or not, there are websites that have real things posted. Honestly, my biggest shock is that OpenAI thought Reddit, of all places, was a trustworthy source of knowledge.
If I am going to trust a machine, then it should perform at the level of a very competent human, not an average one.
Why would I want to ask your average person a physics question? Of course, their answer will probably be wrong and partly made up. Why should that be the bar?
I want it to answer at the level of a physics expert. And a physics expert is far less likely to make basic mistakes.
advael's answer was fine, but since people seem to be hung up on the wording, a more direct response:
We have human institutions dedicated, at least nominally, to finding and publishing truth (I hate having to qualify this, but Hacker News is so cynical and post-modernist at this point that I don't know what else to do). These include, for instance, court systems, which have a notion of evidentiary standards. Eyewitnesses are treated as more reliable than hearsay. Written or taped recordings are more reliable than both. Multiple witnesses who agree are more reliable than one.
Another example is science. Science utilizes peer review, along with its own hierarchy of evidence, similar to but separate from the courts'. Interventional trials are better evidence than observational studies. Randomization and statistical testing are used to try to tease out effects from noise. Results that replicate are more reliable than a single study.
Journalism is yet another example. This is probably the arena in which Hacker News is most cynical and will declare all of it useless trash, but reputable news organizations nonetheless do have methods they use to try to be correct more often than not. They employ their own fact checkers. They seek out multiple expert sources. They send journalists directly to a scene to bear witness themselves as events unfold.
You're free to think this isn't sufficient, but this is how we deal with humans making up stuff and it's gotten us modern civilization at least, full of warts but also full of wonders, seemingly because we're actually right about a lot of stuff.
At some point, something analogous will presumably be the answer for how LLMs deal with this, too. The training will have to be changed to make the system aware of the quality of evidence: place greater trust in direct sensor output than in something read online; place greater trust in a reputable academic journal than in a Tweet; etc. As it stands now, unlike human learners, the objective function of an LLM is just to produce a string in which each piece is in some reasonably high-density region of the probability distribution of possible next pieces, as observed from historical recorded text. Luckily, producing strings this way happens to generate a whole lot of true statements, but it does not have truth as an explicit goal and, until it does, we shouldn't forget that. Give it the treatment it deserves: treat it as if some human savant with perfect recall had never left a dark room to experience the outside world, but had read everything ever written, unfortunately without any understanding of the difference between reading a textbook and reading 4chan.
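To make that objective concrete, here is a minimal sketch of standard next-token cross-entropy training (in PyTorch; the embedding layer is a toy stand-in for a real transformer, and every name and size here is illustrative rather than anyone's actual implementation). Note that the loss only asks whether the model predicted what came next in the corpus, never whether the result is true:

    import torch
    import torch.nn.functional as F

    vocab_size = 100                            # toy vocabulary size (assumption)
    embed = torch.nn.Embedding(vocab_size, 32)  # stand-in for a real language model
    lm_head = torch.nn.Linear(32, vocab_size)   # maps hidden states to next-token logits

    def next_token_loss(token_ids: torch.Tensor) -> torch.Tensor:
        # Each position is scored only on how well it predicts the token
        # that historically came next in the training text.
        hidden = embed(token_ids[:-1])   # contexts (a real model mixes these with attention)
        logits = lm_head(hidden)         # predicted distribution over the next token
        targets = token_ids[1:]          # the tokens that actually followed
        # Nothing here asks "is this true?", only "was this what came next?"
        return F.cross_entropy(logits, targets)

    def weighted_next_token_loss(token_ids, source_weight):
        # Purely hypothetical variant, in the spirit of the evidence-aware
        # training speculated above: scale the loss by how trustworthy the
        # snippet's source is (journal article vs. anonymous forum post), so
        # the model is penalized less for failing to imitate low-quality text.
        return source_weight * next_token_loss(token_ids)

    tokens = torch.randint(0, vocab_size, (16,))  # stand-in for a snippet of training text
    print(next_token_loss(tokens))
    print(weighted_next_token_loss(tokens, source_weight=0.2))

The weighted variant is only a gesture at the shape an evidence-aware objective might take; any real approach would be far more involved.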