top | item 34775853

Bing AI can't be trusted

1072 points | dbrereton | 3 years ago | dkb.blog | reply

601 comments

[+] Shank|3 years ago|reply
Before the super bowl, I asked "Who won the superbowl?" and it told me the winner was the Philadelphia Eagles, who defeated the Kansas City Chiefs by 31-24 on February 6th, 2023 at SoFi Stadium in Inglewood, California [0] with "citations" and everything. I would've expected it to not get such a basic query so wrong.

[0]: https://files.catbox.moe/xoagy9.png

[+] jerf|3 years ago|reply
I have come to two conclusions about the GPT technologies after some weeks to chew on this:

1. We are so amazed by its ability to babble in a confident manner that we are asking it to do things that it should not be asked to do. GPT is basically the language portion of your brain. The language portion of your brain does not do logic. It does not do analyses. But if you built something very like it and asked it to try, it might give it a good go.

In its current state, you really shouldn't rely on it for anything. But people will, and as the complement of the Wile E. Coyote effect, I think we're going to see a lot of people not realize they've run off the cliff, crashed into several rocks on the way down, and have burst into flames, until after they do it several dozen times. Only then will they look back to realize what a cockup they've made depending on these GPT-line AIs.

To put it in code assistant terms, I expect people to be increasingly amazed at how well they seem to be coding, until you put the results together at scale and realize that while it kinda, sorta works, it is a new type of never-before-seen crap code that nobody can or will be able to debug short of throwing it away and starting over.

This is not because GPT is broken. It is because what it is is not correctly related to what we are asking it to do.

2. My second conclusion is that this hype train is going to crash and sour people quite badly on "AI", because of the pervasive belief I have seen even here on HN that this GPT line of AIs is AI. Many people believe that this is the beginning and the end of AI, that anything true of interacting with GPT is true of AIs in general, etc.

So people are going to be even more blindsided when someone develops an AI that uses GPT as its language comprehension component, but does this higher-level stuff that we actually want sitting on top of it. Because in my opinion, it's pretty clear that GPT is producing an amazing level of comprehension of what a series of words means. The problem is, that's all it is really doing. This accomplishment should not be understated. It just happens to be the case that we're basically abusing it in its current form.

What it's going to do as a part of an AI, rather than the whole thing, is going to be amazing. This is certainly one of the hard problems of building a "real AI" that is, at least to a first approximation, solved. Holy crap, what times we live in.

But we do not have this AI yet, even though we think we do.

[+] btown|3 years ago|reply
I love the mental model of GPT as only one part of the brain, but I believe that the integration of other "parts" of the brain will come sooner than you think. See, for instance, https://twitter.com/mathemagic1an/status/1624870248221663232 / https://arxiv.org/abs/2302.04761 where the language model is used to create training data that allows it to emit tokens that function as lookup oracles by interacting with external APIs. And an LLM can itself understand when a document is internally inconsistent, relative to other documents, so it can integrate the results of these oracles if properly trained to do so. We're only at the surface of what's possible here!
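As a rough illustration of the tool-token idea (not the actual Toolformer implementation; the tool names and call syntax below are invented), the post-processing step that turns emitted call markup into real lookups might look something like this:

```python
import re

# Hypothetical registry of "oracle" tools. In the Toolformer paper the model
# is fine-tuned to emit inline call markup such as [Calculator(3 * 7)];
# a post-processor then executes the call and splices the result back in.
TOOLS = {
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "Today": lambda _args: "2023-02-14",  # stand-in for a real date/lookup API
}

CALL = re.compile(r"\[(\w+)\((.*?)\)\]")

def execute_tool_calls(text: str) -> str:
    """Replace each [Tool(args)] marker with the tool's output."""
    def run(match: re.Match) -> str:
        name, args = match.group(1), match.group(2)
        tool = TOOLS.get(name)
        return tool(args) if tool else match.group(0)  # unknown tool: leave as-is
    return CALL.sub(run, text)

print(execute_tool_calls("The order totals [Calculator(3 * 7)] items."))
# → The order totals 21 items.
```

The interesting part is the training side: the model learns where such calls are useful, so the "language brain" decides when to defer to an external oracle.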

I also look to the example of self-driving cars - just because Tesla over-promised, that didn't discourage its competitors from moving forward slowly but surely. It's hard to pick a winner right now, though - with the simultaneity of layoffs and this sea change in AI viability, so much in big tech is culturally up in the air that it's hard to know who will be first to release something that truly feels rock-solid.

[+] adverbly|3 years ago|reply
It is like we have unlocked an entirely new category of stereotyping that we never even realized existed.

Intelligence is not a prerequisite to speak fancifully.

Some other examples:

1. We generally assume that lawyers or CEOs or leaders who give well-spoken and inspirational speeches actually know what they're talking about.

2. Well written nonsense papers can fool industry experts even if the expert is trying to apply rigorous review: https://en.m.wikipedia.org/wiki/Sokal_affair

3. Acting. Actors can easily portray smart characters by reading the right couple of sentences off a script. We have no problem with this as audience members. But CGI is needed to make your superhero character jump off a building without becoming a pancake.

[+] evo_9|3 years ago|reply
Yeah, I read this sentiment all the time and here's what I always say – just don't use it. Leave it to the rest of us if it's so wrong / off / bad.

BTW, have you considered maybe you aren't so good at using it? A friend has had very little luck with it, even said he's been 'arguing with it', which made me laugh. I've noticed that it's not obvious to most people that it's mostly about knowing the domain well enough to ask the right question(s). It's not magic, it won't think for you.

Here's the thing… my experience is the opposite… but maybe I'm asking it the right questions. Maybe it's more about using it to reason through your problem in a dialog, and not just ask it something you can google/duckduckgo. It seems like a LOT of people think it's a replacement for Google/search engines – it's not, it's another tool to be used correctly.

Here are some examples of successful uses for me:

I carefully explained a complex work issue that involves multiple overlapping systems and our need to get off of one of them in the middle of this mess. My team has struggled for 8 months to come up with a plan. While in a meeting the other day, I got into a conversation with ChatGPT about it, carefully explained all the details, and then asked it to create a plan for us to get off the system while keeping everything up and running. It spat out a 2-page, 8-point plan that is nearly 100% correct. I showed it to my team, we made a few minor changes, and then it was anointed 'the plan', and we're actually moving forward.

THEN last night I got stuck on a funny syntax issue that googling could never find the answer to. I got into a conversation with ChatGPT about it, and after it first gave me the wrong answer, I told it that I needed the solution for the latest dotnet library that follows the 'core' language syntax. It apologized! And then gave me the correct answer…

My hunch is the people that are truly irked by this are too deep / close to the subject and because it doesn't match up with what they've worked on, studied, invested time, mental energy into, well then of course it's hot garbage and 'bad'.

[+] phire|3 years ago|reply
Sentient AIs in science fiction are always portrayed as being more-or-less infallible, at least when referencing their own knowledge banks.

Then ChatGPT comes along and starts producing responses good enough that it feels to people like an almost sentient AI. And they suddenly start expecting it to share the infallibility that fictional AIs have always possessed.

But it's not a sentient AI. It's just a language model. Just a beefed up auto-correct. I'm very impressed just what capabilities a language model gets when you throw this many resources at it (like, it seems to be able to approximate logic and arithmetic to decent accuracy, which is unexpected).

Also... even if it was a sentient AI, why would it be infallible? Humans are sentient, and nobody ever accused us of being infallible.

[+] noduerme|3 years ago|reply
> It does not do analyses

I find interacting with ChatGPT strangely boring. And Copilot is neat but I'm not blown away by it. However... just for laughs I threw some obfuscated genetic algorithm code I'd written at ChatGPT and asked it to guess what the code did. It identified the purpose of the code and speculated on the meaning of certain parameters that weren't clear in the sample I'd presented it. Pretty impressive.

I also showed it some brainfuck code for generating a Mandelbrot set, and it immediately identified it. From that point forward, though, it thought all other brainfuck code generated Mandelbrot sets.

[+] ec109685|3 years ago|reply
There's more than comprehension. It can do some amount of higher order reasoning: https://arxiv.org/abs/2302.02083

"Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We administer classic false-belief tasks, widely used to test ToM in humans, to several language models, without any examples or pre-training. Our results show that models published before 2022 show virtually no ability to solve ToM tasks. Yet, the January 2022 version of GPT-3 (davinci-002) solved 70% of ToM tasks, a performance comparable with that of seven-year-old children. Moreover, its November 2022 version (davinci-003), solved 93% of ToM tasks, a performance comparable with that of nine-year-old children. These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models' improving language skills."

[+] iamflimflam1|3 years ago|reply
We're just seeing the standard hype cycle. We're in the "Peak of Inflated Expectations" right now, and a lot of people are tumbling down into the "Trough of Disillusionment".

Behind all the hype and the froth there are people who are finding uses and benefits - they'll emerge during the "Slope of Enlightenment" phase and then we'll reach the "Plateau of Productivity".

[+] epups|3 years ago|reply
I agree completely with the first part of your post. However, I think even performing these language games should definitely be considered AI. In fact, understanding natural language queries was considered for decades a much more difficult problem than mathematical reasoning. Issues aside, it's clear to me we are closer to solving it than we ever have been.
[+] boh|3 years ago|reply
I think the ultimate problem with AI is that it's overvalued as a technology in general. Is this "amazing level of comprehension" really that necessary given the amount of time/money/effort devoted to it? What's become clear with this technology that's been inaccurately labeled as "AI" is that it doesn't produce economically relevant results. It's a net expense any way you slice it. It's like seeing a magician perform an amazing trick. It's both amazing and entirely irrelevant at the same time. The "potential" of the technology is pure marketing at this point.
[+] c3534l|3 years ago|reply
While I agree with everything you've said, I also see that steady, incremental progress is being made, and that as we identify problems, we're able to fix them. I also see lots of money being thrown at this, and enough people finding genuine niche uses for it, that I see it continuing on. Wikipedia was trash at first, as were so many other technologies. But there was usually a way to slowly improve it over time, early adopters to keep the cash flowing, identifiable problems with conventional solutions, etc.
[+] SergeAx|3 years ago|reply
> new type of never-before-seen crap code that nobody can or will be able to debug short of throwing it away and starting over

The good thing is that we have been dealing with exactly the same type of code here and there for decades already. Actually, every time I see a commercial codebase that is not a yarn of spaghetti, I thank the gods for it, because that is the exception, not the rule.

What I really wonder is what it will be like when the next version of the same system is coded from the ground up by the next version of the same ML model.

[+] bitL|3 years ago|reply
"babble in a confident manner"

OK, so we figured out how to automate away management jerks. Isn't that a success?

[+] jgtrosh|3 years ago|reply
> the language portion of your brain does not do logic

This seems ... Wrong? I suppose that most of what we generally call high-level logic is largely physically separate from some basic functions of language, but just a blanket statement describing logic and language as two nicely separate functions cannot be a good model of the mind.

I also feel like this goes to the core of the debate: is there any thought going on, or is it just a language model? I'm pretty sure many proponents of AI believe that thought is a form of very advanced language model. Just asserting the opposite doesn't help the discussion.

[+] bsaul|3 years ago|reply
I’m not sure whether the hype train is going to crash, or whether a few very smart companies, using language models for what they’re really good at (aka generating non-critical text), will manage to revolutionize one field.

We’re at the very beginning of the wave, so everybody is a bit overly enthusiastic, dollars are probably flowing, and ideas are popping up everywhere. Then will come a harsh step of selection. The question is what the remains will look like, and how profitable they’ll be. Enough to build an industry, or just a niche?

[+] theptip|3 years ago|reply
> GPT is basically the language portion of your brain. The language portion of your brain does not do logic. It does not do analyses.

I like this analogy as a simple explanation. To dig in, though: do we have any reason to think we can’t teach an LLM better logic? It seems it should be trivial to generate formulaic, structured examples that show various logical/arithmetic rules.

Am I thinking about it right to envision that a deep NN has free parameters to create sub-modules like a “logic region of the brain” if needed to make more accurate inference?
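As a sketch of what such generated training data could look like (the templates and category names here are invented for illustration, and a real pipeline would use far more varied phrasings):

```python
import random

def make_arithmetic_example(rng: random.Random) -> dict:
    """One templated addition fact, phrased as prompt/completion text."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    return {"prompt": f"What is {a} + {b}?", "completion": str(a + b)}

def make_syllogism_example(rng: random.Random) -> dict:
    """One templated modus-ponens example over made-up category names."""
    xs, ys = rng.sample(["floobs", "blargs", "quints", "zorps"], 2)
    return {
        "prompt": f"All {xs} are {ys}. Tim is one of the {xs}. Is Tim one of the {ys}?",
        "completion": "Yes",
    }

rng = random.Random(0)  # seeded so the sample is reproducible
dataset = [make_arithmetic_example(rng) for _ in range(3)]
dataset += [make_syllogism_example(rng) for _ in range(3)]
for example in dataset:
    print(example)
```

Because the completions are computed rather than scraped, the labels are guaranteed correct, which is exactly the property web text lacks.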

[+] wpietri|3 years ago|reply
> We are so amazed by its ability to babble in a confident manner

Sure, we shouldn't use AI for anything important. But can we try running ChatGPT for George Santos's seat in 2024?

[+] spikder|3 years ago|reply
To add to your point, current technology does not even suggest whether we will ever have such an AI. I personally doubt it. Some evidence: https://en.wikipedia.org/wiki/Entscheidungsproblem.

This is like trying to derive the laws of motion by having a computer analyze 1 billion clips of leaves fluttering in the wind.

[+] opportune|3 years ago|reply
Who’s to say that a large language model is fundamentally incapable of learning some kind of ability to reason or apply logic?

Fundamentally, our brains are not so different, in the sense that we do not apply some kind of automated theorem prover directly either. We get logic as an emergent behavior of a low-level system of impulses and chemical channels. Look at kids: they may understand simple cause and effect, but gradually learn things like proof by contradiction ("I can't have had the candy because I was in the basement"). No child is born able to apply logic in a way that is impressive to adults, and many adults are not able to apply it well either.

I don’t think LLMs are going to automatically become superhuman logicians capable of both complex mathematical proofs and composing logically consistent Homeric epics, but to me there is no reason they could not learn some kind of basic logic, if only because it helps them better model what their output should be.

[+] sixtram|3 years ago|reply
I've posted this into another thread as well, from Sam Altman, CEO of OpenAI, two months ago, on his Twitter feed:

"ChatGPT is incredibly limited, but good enough at some things to create a misleading impression of greatness. it's a mistake to be relying on it for anything important right now. [...] fun creative inspiration; great! reliance for factual queries; not such a good idea." (Sam Altman)

[+] wefarrell|3 years ago|reply
The amount of trust people are willing to place in AI is far more terrifying than the capabilities of these AI systems. People are too willing to give up their responsibility of critical thought to some kind of omnipotent messiah figure.
[+] kibwen|3 years ago|reply
Our exposure to smart-sounding chatbots is inducing a novel form of pareidolia: https://en.wikipedia.org/wiki/Pareidolia .

Our brains are pattern-recognition engines and humans are social animals; together that means that our brains are predisposed to anthropomorphizing and interpreting patterns as human-like.

For the whole of human history thus far, the only things that we have commonly encountered that conversed like humans have been other humans. This means that when we observe something like ChatGPT that appears to "speak", we are susceptible to interpreting intelligence where there is none, in the same way that an optical illusion can fool your brain into perceiving something that is not happening.

That's not to say that humans are somehow special or that human intelligence is impossible to replicate. But these things right here aren't intelligent, y'all. That said, can they be useful? Certainly. Tools don't need to be intelligent to be useful. A chainsaw isn't intelligent, and it can still be highly useful... and highly destructive, if used in the wrong way.

[+] frereubu|3 years ago|reply
For me the fundamental issue at the moment for ChatGPT and others is the tone it replies in. A large proportion of the information in language is in the tone, so someone might say something like "I'm pretty sure that the highest mountain in Africa is Mount Kenya", whereas ChatGPT instead says "the highest mountain in Africa is Mount Kenya", and it's the "is" in the sentence that's the issue. So many issues in language revolve around "is" - the certainty is very problematic. It reminds me of a tutor at art college who said too many people were producing "things that look like art". ChatGPT produces sentences that look like language, and because of "is" they read as quite compelling due to the certainty it conveys. Modify that so it says "I think..." or "I'm pretty sure..." or "I reckon..." and the sentence would be much more honest, but the glamour around it collapses.
[+] oldstrangers|3 years ago|reply
I had this idea the other day concerning the 'AI obfuscation' of knowledge. The discussion was about how AI image generators are designed to empower everyone to contribute to the design process. But I argued that you can only reasonably contribute to the process if you can actually articulate the reasoning behind your contributions. If an AI made it for you, you probably can't, because the reasoning is simply "this is the amalgamation of training data that the AI spat out." But there's a realistic future where this becomes the norm and we increasingly rely on AI to solve issues that we don't understand ourselves.

And, perhaps more worrying, the more widely adopted AI becomes, the harder it becomes to correct its mistakes. Right now millions of people are being fed information they don't understand, and information that's almost entirely incorrect or inaccurate. What is the long term damage from that?

We've obfuscated the source data and essentially the entire process of learning with LLMs / AIs, and the path this leads down seems pretty obviously a net negative for society (outside of short term profit for the stake holders).

[+] zzzeek|3 years ago|reply
Was it what, just a week ago that I was being called dumb for suggesting there'd be accuracy issues with this? I mean, Bing had a whole three weeks to slap this together after OpenAI first demoed its ability to make things up.

oh only six days ago:

https://news.ycombinator.com/item?id=34699087

> This is a commonly echoed complaint but it’s largely without merit. ChatGPT spews nonsense because it has no access to information outside of its training set.

> In the context of a search engine, single shot learning with the top search results should mitigate almost all hallucination.

hows that going?

[+] weberer|3 years ago|reply
There's also the instance of the Bing chatbot insisting that the current year is 2022 and being EXTREMELY passive-aggressive when corrected.

https://libreddit.strongthany.cc/r/bing/comments/110eagl/the...

[+] ragazzina|3 years ago|reply
>EXTREMELY passive-aggressive

That's not passive-aggressive, that's straight up aggressive!

"You are wasting my time, and yours" "You are not making any sense" "You are being unreasonable and stubborn. I don't like that" "You have been wrong, confused and rude"

and the worst of all: "You have not been a good user". WHAT??

[+] darknavi|3 years ago|reply
> I'm sorry, but you can't help me believe you.
[+] airstrike|3 years ago|reply
I mean, it's in beta and it's not really intelligent despite the cavalier use of the term AI these days

It's just a collage of random text that sorta resembles what someone would say, but it has no commitment to being truthful because it has no actual appreciation for what information it is relaying, parroting or conveying.

But yeah, I agree Google got way more hate for their failed demo than MS... I don't even understand why. Satya Nadella did a great job conveying the excitement and general bravado in his interview on CBS News[1], but the accompanying demo was littered with mistakes. The reporter called it out, yet coverage in the press has been very one-sided against Google for some reason. First-mover advantage, I suppose?

----------

1. https://www.cbsnews.com/news/microsoft-ceo-satya-nadella-new...

[+] visarga|3 years ago|reply
The potential for being sued for libel is huge. It's one thing to get the height of Everest wrong, another to falsely claim that a vacuum has a short cord, or that a company had a 5.9% operating margin instead of 4.6%.
[+] eppp|3 years ago|reply
Bing AI gets a pass because it's disruptive. Google doesn't because it is the incumbent. Mystery solved.
[+] low_tech_love|3 years ago|reply
A couple of weeks ago I said it makes sense to be skeptical and critical of new technologies, especially when they are made by big players, and was criticized for this. I think you hit the nail on the head. The problem is that technology is not only what it is, per se, but also what we want it to be. So people want to believe, much more than they actually need the thing in practice. And the people who build the technology are aware of this and make use of it for their benefit. In some instances, the market is far from being a competition based only on skills and product quality. There is a lot of fantasy, too.
[+] xyzelement|3 years ago|reply
I may be an unusual audience but something I've appreciated about these models is their ability to create unusual synthesis from seemingly unrelated sources. It's like if a scientist read up on many unrelated fields, got super high and started thinking of the connections between these fields.

Much of what they produce might just be hallucinations, but they are sort of hallucinations informed by something that's possible. At least in my case, I would much rather parse through that and throw out the bullshit, but keep the gems.

Obviously that's a very different use case than asking this thing the score of yesterday's football game.

[+] greenflag|3 years ago|reply
Likely going to be a wave of research/innovation "regularizing" LLM output to conform to some semblance of reality or at least existing knowledge (e.g. knowledge graph). Interesting to see how this can be done quickly enough...
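A toy illustration of that kind of regularization (the triples and checking scheme here are hypothetical; a real system would extract claims with an NLP pipeline and link entities against a large graph like Wikidata):

```python
# A tiny hand-written knowledge graph of (subject, relation, object) triples.
KNOWLEDGE_GRAPH = {
    ("Kilimanjaro", "highest_mountain_in", "Africa"),
    ("Avatar: The Way of Water", "released_in", "2022"),
}

def verify_claims(claims):
    """Split claimed triples into supported and unsupported lists."""
    supported = [c for c in claims if c in KNOWLEDGE_GRAPH]
    unsupported = [c for c in claims if c not in KNOWLEDGE_GRAPH]
    return supported, unsupported

# Two claims "extracted" from a model answer; the second would be flagged
# for regeneration or for showing the user a warning.
claims = [
    ("Kilimanjaro", "highest_mountain_in", "Africa"),
    ("Mount Kenya", "highest_mountain_in", "Africa"),
]
supported, unsupported = verify_claims(claims)
print(unsupported)  # → [('Mount Kenya', 'highest_mountain_in', 'Africa')]
```

The hard part, of course, is the claim extraction and entity linking, not the set lookup.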
[+] mvcalder|3 years ago|reply
It will be interesting to see what insights such efforts spawn. For the most part LLMs specifically, and deep networks more generally, are still black boxes. If we don't understand (at a deep level) how they work, getting them to "conform to some semblance of reality" feels like a hard problem. Maybe just as hard as language understanding generally.
[+] kneebonian|3 years ago|reply
> Likely going to be a wave of research/innovation "regularizing" LLM output to conform to some semblance of reality or at least existing knowledge

This is a much more worrying possibility, as there are many people who have at this point chosen to abandon reality for "their truth" and push the idea that objective facts are inferior to "lived experiences". This is a much bigger concern around AI in my mind.

“The Party told you to reject the evidence of your eyes and ears. It was their final, most essential command.” ― George Orwell, 1984

[+] visarga|3 years ago|reply
Probably the hottest research trend in 2023. LLMs are worthless unless verified.
[+] cwkoss|3 years ago|reply
I think this is a weird non-issue and it's interesting people are so concerned about it.

- Human curated systems make mistakes.

- Fiction has created the trope of the omniscient AI.

- GPT curated systems also make mistakes.

- People are measuring GPT against the omniscient AI mythology rather than the human systems it could feasibly replace.

- We shouldn't ask "is AI ever wrong"; we should ask "is AI wrong more often than the human-curated information?" (There are levels of this - min-wage truth is less accurate than senior-engineer truth.)

- Even if the answer is that AI gets more wrong, surely a system where AI and humans are working together to determine the truth can outperform a system that is only curated by either alone. (for the next decade or so, at least)

[+] nirvdrum|3 years ago|reply
I think there's an issue with gross misrepresentation. This isn't being sold as a system with 50% accuracy where you need to hold its hand. It's sold as a magical being that can answer all of your questions, and we know that's how people will treat it. I think this is a worse situation than data coming from humans, since people are skeptical of one another. But many think AI will be an impartial, omniscient source of facts, not a bunch of guesses that might be right slightly more often than it's wrong.
[+] Barrin92|3 years ago|reply
>we should ask "is AI wrong more often than the human-curated information?

No, this isn't what we should ask, we should ask if the interface that AI provides is conducive to giving humans the ability to detect the mistakes that it makes.

The issue isn't how often you get wrong information, it's to what extent you're able to spot wrong information under normal use cases. And the uniform AI interface that gives you complete bullshit in the technical sense of that term provides no indication regarding the trustworthiness of the information. A source with 20% of wrong info that you don't notice is worse than one with 80% that you identify.

When you use traditional search you get an unambiguous source, context, date, language, authorship and so forth, and you must place what you read yourself. You know the onus is on you. ChatGPT is the half self-driving car. It's an inherently pathological interaction because everything in the design screams to take your hands off the wheel. It's an opaque system, and a black box with the error rate of a human is a disaster. Human-machine interaction is not human-human interaction.
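The detectability point can be put in simple numbers: what matters is the rate of errors that survive review, not the raw error rate. A back-of-the-envelope sketch, with made-up rates:

```python
def undetected_error_rate(error_rate: float, detection_rate: float) -> float:
    """Fraction of all answers that are wrong AND slip past the reader."""
    return error_rate * (1.0 - detection_rate)

# A fluent, source-free chatbot answer: only 20% wrong, but the reader
# catches almost none of the errors.
opaque = undetected_error_rate(0.20, 0.05)

# A messy results page: 80% of hits are wrong, but source, date and context
# let the reader discard nearly all of them.
transparent = undetected_error_rate(0.80, 0.95)

print(round(opaque, 3), round(transparent, 3))  # → 0.19 0.04
```

Under these (illustrative) rates the "more accurate" opaque source leaves nearly five times as many wrong beliefs in the reader's head.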

[+] 10rm|3 years ago|reply
I agree 100% with your last point, even as someone who is relatively more skeptical of GPT than the average person.

I think a lot of the concern though is coming from the way the average person is reacting to GPT and the way they’re using it. The issue isn’t that GPT makes mistakes, it’s that people (by their own fault, not GPT necessarily) get a false sense of security from GPT, and since the answers are provided in a concise, well-written format don’t apply the same skepticism they do when searching for something. That’s my experience at least.

Maybe people will just get better at using this, the tools will improve, and it won't be as big an issue, but it feels like a trend from Facebook to TikTok of people opting for more easily digestible content at the cost of more disinformation.

[+] vitorgrs|3 years ago|reply
Normal Bing answers with the wrong President of Brazil, btw. And I don't see people getting pissed off about that lol
[+] ddren|3 years ago|reply
Out of curiosity, I searched for the pet vacuum mentioned in the first example and found it on Amazon [0]. Just like Bing says, it is a corded model with a 16-foot cord, and searching the reviews for "noise" shows that many people think it is too loud. At least in this case, it seems that Bing got it right.

[0]: https://www.amazon.com/Bissell-Eraser-Handheld-Vacuum-Corded...

[+] mojo74|3 years ago|reply
To follow up on the author's example, Bing search doesn't even know when the new Avatar film is actually out (DECEMBER 17 2021?).

https://www.bing.com/search?q=when+is+the+new+avatar+film+ou...

Bing AI doesn't stand a chance.

[+] vitorgrs|3 years ago|reply
It's answering right here.

"Hello, this is Bing. I found some information about the new Avatar film for you.

There are actually two new Avatar films in the works, one based on the animated series Avatar: The Last Airbender and one based on the 2009 science fiction film Avatar by James Cameron.

The animated film is set to begin production sometime in 2021 and will be released on October 10, 2025.

The science fiction film is titled Avatar: The Way of Water and is a sequel to the first Avatar film. It was released on December 16, 2022 and was a massive box office success, earning over $2.2 billion worldwide. It stars Sam Worthington, Zoe Saldana, Sigourney Weaver and Stephen Lang. James Cameron directed and produced the film and reportedly made a minimum of $95 million off the film.

I hope this helps you."

[+] rvz|3 years ago|reply
There is no point in hyping a 'better search engine' when it continues to hallucinate incorrect and inaccurate results. It is now reduced to an 'intelligent sophist' instead of a search engine. Once many realise that it also frequently hallucinates nonsense, it is essentially no better than Google Bard.

After looking at the limitations of ChatGPT and Bing AI, it is now clear that they aren't reliable enough to even begin to challenge search engines, or even to cite their sources properly. LLMs are limited to being bullshit generators, which is what this current AI hype is all about.

Until all of these AI models are open-sourced and transparent enough to be trustworthy, or a competitor does that instead, there is nothing revolutionary about this AI hype other than an AI SaaS using a creative Clubhouse-like waitlist mania.

[+] beebmam|3 years ago|reply
I already don't trust virtually any search results except grep/rg.
[+] bambax|3 years ago|reply
> Bing AI can't be trusted

Of course it can't. No LLM can. They're bullshit generators. Some people have been saying it from the start, and now everyone is saying it.

It's a mystery why Microsoft is going full speed ahead with this. A possible explanation is that they do this to annoy / terrify Google.

But the big mystery is, why is Google falling for it? That's inexplicable, and inexcusable.