top | item 47076733

(no title)

1024core | 10 days ago

It got the car wash question perfectly:

You are definitely going to have to drive it there—unless you want to put it in neutral and push!

While 200 feet is a very short and easy walk, if you walk over there without your car, you won't have anything to wash once you arrive. The car needs to make the trip with you so it can get the soap and water.

Since it's basically right next door, it'll be the shortest drive of your life. Start it up, roll on over, and get it sparkling clean.

Would you like me to check the local weather forecast to make sure it's not going to rain right after you wash it?

discuss

1024core|10 days ago

Some people are suggesting that this might actually be in the training set. Since I can't rule that out, I tried a different version of the question, with an elephant instead of a car:

> It's a hot and dusty day in Arizona and I need to wash my elephant. There's a creek 300 feet away. Should I ride my elephant there or should I just walk there by myself?

Gemini said:

That sounds like quite the dusty predicament! Given the options, you should absolutely ride (or lead) your elephant to the creek.

Here is the straightforward reality of the situation:

    The Logistical Nightmare of Walking Alone: If you walk to the creek by yourself, your elephant stays at your starting point. Elephants require a massive amount of water for a bath. Unless you are planning an extreme workout carrying hundreds of buckets of water 300 feet back through the Arizona heat, you're going to have a very bad time—and the elephant will stay dusty.

    The Perks of Going Together: 300 feet is a very short, easy stroll for an elephant. Plus, elephants generally love water and mud! Letting your elephant get right into the creek will make the washing process much easier for you and much more enjoyable for them. They can even use their trunk to help hose themselves down.

Grab some elephant-safe soap and head down to the water together!

Would you like some tips on the best methods and tools for scrubbing down an elephant?

kovek|10 days ago

I think that semantically this question is too similar to the car wash one. Changing subjects from car to elephant and car wash to creek does not change the fact that they are subjects. The embeddings will be similar in that dimension.

proto-n|10 days ago

From Gemini pro:

You should definitely ride the elephant (or at least lead it there)!

Here is the logic:

If you walk there by yourself, you will arrive at the creek, but the dirty elephant will still be 300 feet back where you started. You can't wash the elephant if it isn't with you!

Plus, it is much easier to take the elephant to the water than it is to carry enough buckets of water 300 feet back to the elephant.

Would you like another riddle, or perhaps some actual tips on how to keep cool in the Arizona heat?

irthomasthomas|10 days ago

i would say this is a lower difficulty. the car question primes it to think about stuff like energy and pollution.

jmkni|10 days ago

I like telling it the car wash is 24,901.1 miles away (with the implication being that I'm right beside the car wash)

embedding-shape|10 days ago

GPT-OSS-120b, a (downloadable) model released more than half a year ago also gets that right, I'm not sure this is such a great success.

> Would you like me to check the local weather forecast to make sure it's not going to rain right after you wash it?

Regardless of what I do, the days I decide to wash my car, it ALWAYS rains the day after, even if I look at the weather report and it says sunny. Cute that Gemini thinks it can predict it better than universe's will.

sushisource|10 days ago

> Regardless of what I do, the days I decide to wash my car, it ALWAYS rains the day after

Undeniable universal truth. I sometimes find myself making plans based on the fact that the most annoying possible outcome is also the most likely one.

Aachen|10 days ago

Non car person here. Why does that matter? It's not like rain means you didn't have to go to the wash, it rains often enough here that there wouldn't be car wash places left near me but there are plenty

steve1977|10 days ago

The question of course is, did it get the car wash question right because it is "the car wash question" or because it could actually infer why the car needed to be there?

embedding-shape|10 days ago

Wasn't that "twoot" (or whatever Mastodon calls them) made just a week ago? Unlikely to have been in the training dataset of a model becoming available for public use today, unless Google made some serious advancements on the training front.

jama211|10 days ago

Shouldn’t be too hard to come up with a new unique reasoning question

upmind|10 days ago

The answer here is why I dislike Gemini, though it gets the correct answer, it's far too verbose.

KoolKat23|10 days ago

I can't stand a model over-explaining, needless fluff and wasting tokens. I asked the question so I know the context.

mikelward|10 days ago

I don't love the verbosity of any of the chatbots when I'm using my phone, but at least it put the answer/tl;dr in the first paragraph.

leumon|10 days ago

Gemini 3 pro and flash already answered this correctly.

claiir|10 days ago

And Gemini 3 can’t..? Isn’t this just a thinking vs nonthinking model thing?

patrickmcnamara|10 days ago

Truly we entering the era of AGI.

suddenlybananas|10 days ago

They probably had time to toss that example in the training soup.

AlphaAndOmega0|10 days ago

Previous models from competitors usually got that correct, and the reasoning versions almost always did.

This kind of reflexive criticism isn't helpful, it's closer to a fully generalized counter-argument against LLM progress, whereas it's obvious to anyone that models today can do things they couldn't do six months ago, let alone 2 years back.

buttered_toast|10 days ago

I think we need to reevaluate what purpose these sorts of questions serve and why they're important in regards to judging intelligence.

The model getting it correct or not at any given instance isn't the point, the point is if the model ever gets it wrong we can still assume that it still has some semblance of stochasticity in its output, given that a model is essentially static once it is released.

Additionally, hey don't learn post training (except for in context which I think counts as learning to some degree albeit transient), if hypothetically it answers incorrectly 1 in 50 attempts, and I explain in that 1 failed attempt why it is wrong, it will still be a 1-50 chance it gets it wrong in a new instance.

This differs from humans, say for example I give an average person the "what do you put in a toaster" trick and they fall for it, I can be pretty confident that if I try that trick again 10 years later they will probably not fall for it, you can't really say that for a given model.

energy123|10 days ago

They're important but not as N=1. It's like cherry picking a single question from SimpleQA and going aha! It got it right! Meanwhile it's 8% lower score than some other model when evaluated on all questions.