top | item 47139469


bigbuppo | 5 days ago

Well, that means the AI is garbage. They'll eventually train it to answer this specific question, and then it will perform worse in some other aspect. Wash, rinse, repeat, and eventually they'll claim the new frontier model is the best yet on carwash tests.


keeda | 5 days ago

> They'll eventually train it to answer this specific question, and then it will perform worse in some other aspect.

Not necessarily. Simply asking models to "check your assumptions" -- note, without specifying which assumptions! -- overcomes a lot of these gotcha questions. The reason it's not in their system prompts by default is, I think, just a cost optimization: https://news.ycombinator.com/item?id=47040530
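A minimal sketch of what this looks like in practice: prepending a generic "check your assumptions" instruction to the conversation before it reaches the model. The `build_messages` helper and the exact instruction wording are illustrative assumptions, not any real API; only the chat-message shape (role/content dicts) mirrors common chat APIs.

```python
# Illustrative only: a generic self-check instruction, not tied to any
# particular model or vendor.
CHECK_ASSUMPTIONS = (
    "Before answering, check your assumptions: is any information in the "
    "question irrelevant, missing, or different from what the phrasing "
    "suggests?"
)

def build_messages(user_question: str) -> list[dict]:
    """Return a chat-style message list with the self-check instruction
    injected as a system message ahead of the user's question."""
    return [
        {"role": "system", "content": CHECK_ASSUMPTIONS},
        {"role": "user", "content": user_question},
    ]

msgs = build_messages("The car wash is 1 mile away. Should I walk or drive?")
```

The point is that the instruction is content-free -- it names no specific assumption -- yet reportedly nudges the model out of pattern-matching the question to a generic article.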

BobbyJo | 5 days ago

Crazy how five years ago this level of AI would have been seen as sci-fi, and now there are people out there who think it's trash because we can trick it by asking questions in weird ways.

davorak | 5 days ago

I think the level of AI we have is amazing.

> there are people out there who think it's trash because we can trick it if we ask questions in weird ways.

Some of this sentiment comes from wanting AI to be predictable, and for me, stumbling into questions that the current models interpret oddly is not uncommon. There are a bunch of rules of thumb that can help when you run into cases like this, but no guarantee that they will work, that the problem will stay solved after a model update, or that the fix carries over across models.

bigbuppo | 5 days ago

When did Microsoft release that chat bot that went full Nazi in a couple of hours?

steveBK123 | 5 days ago

An issue with the chat format is that all these models seem bad at recognizing when they have extraneous information from the user that can be ignored, or insufficient information from the user to answer the question fully.

This issue is compounded by the lack of probabilities in the answers, despite the machines ultimately being probabilistic.

Notice a human in a real conversation will politely ignore extra info (the distance to car wash) or ask clarifying questions (where is the car?).

Even non-STEM people answer using probabilistic terms casually (almost certainly / most likely / probably / possibly / unlikely).
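The casual hedging vocabulary the comment describes could in principle be wired up mechanically: many chat APIs can expose token log-probabilities, and those could be mapped onto everyday confidence terms. A minimal sketch, where the thresholds are my own arbitrary choices and `hedge` is a hypothetical helper, not part of any real API:

```python
import math

def hedge(p: float) -> str:
    """Map a probability in [0, 1] to a casual hedging phrase.
    The cutoffs here are illustrative, not calibrated."""
    if p >= 0.95:
        return "almost certainly"
    if p >= 0.70:
        return "most likely"
    if p >= 0.50:
        return "probably"
    if p >= 0.20:
        return "possibly"
    return "unlikely"

# A token log-probability of -0.02 corresponds to p = exp(-0.02) ~ 0.98:
print(hedge(math.exp(-0.02)))  # prints "almost certainly"
```

Whether surfacing this would help is a separate question -- token-level probabilities measure confidence in the wording, not in the underlying claim -- but it shows the raw material for hedged answers already exists inside the models.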

I suspect some of this is to minimize token usage in the fixed-monthly-price chat models, because back-and-forth clarification would cost more tokens... but maybe I'm too cynical.

bigbuppo | 5 days ago

The systems recognized that the prompt looks like a generic internet article asking whether someone should walk or drive, and answered it exactly as expected based on their training data. None of this should be surprising.

We are the ones fooling ourselves into believing there's more intelligence in these systems than they really have. At the end of the day, it's just an impressive parlor trick.