item 43723408

ChatGPT now performs well at GeoGuesser

171 points | dredmorbius | 11 months ago | flausch.social | reply

148 comments

[+] barcode_feeder|11 months ago|reply
I gave it a series of 11 images stripped of all metadata. It performed quite well, misidentifying only the two taken in a small college town in the northeastern US. It got both photos taken in Korea correct (one with a fairly clear view of Haneul Park, the other a rather difficult-to-identify picture of Sunrise Peak that doesn't resemble anything on Google). It got every other question in the US correct, ranging from an under-construction stretch of Austin shot from the river to some fairly difficult shots in NYC (the upper halves of some buildings from the Rockefeller terrace, and the black wall of MoMA). While not perfect, I'm bluntly shocked at how well it performed.
[+] CSMastermind|11 months ago|reply
I played a round of Geoguessr against it and while it did a shockingly good job compared to what I was expecting, it still lags behind even novice human players.

The locations and its guesses were:

Bliss, Idaho - Burns, Oregon (273 miles away)

Quilleco, Biobio, Chile - Eugene, Oregon (6,411 miles away)

Dettighofen, Switzerland - Mühldorf, Germany (228 miles away)

Pretoria, South Africa - Johannesburg, South Africa (36 miles away)

Rockhampton, Australia - Gold Coast, Australia (437 miles away)

[+] delusional|11 months ago|reply
I gave it some photos from Denmark, and didn't even bother to strip the metadata. For one it correctly said the photo gives off "Scandinavian vibes"; every other photo was very wrong. I also gave it a photo of the French Alps, and it guessed Switzerland.
[+] Measter|11 months ago|reply
I gave o4-mini-high a cropped version of a photo I found on Facebook[0][1], and it quickly determined that this was in the UK from the road markings. It also decided that it was from a coastal city because it could see water on the horizon, which is the correct conclusion from incorrect data: there is no water, I think that's trees on a hill. It focused heavily on the spherical structure, which makes sense because it's distinctive, though it had a hard time placing it. It also decided that the building on the left was probably a shopping centre.

It eventually decided that the photo was taken outside the Scottish Exhibition and Conference Centre in Glasgow. It actually generally considered Scottish locations more than others.

The picture was actually taken in Plymouth (so pretty much as far from Scotland as you can get in Britain), on Charles Street looking south-east[2]. The building on the right is Drake Circus, and the one on the left is the Arts University. It actually did consider Plymouth, but decided it didn't match.

[0] This image, with the "University Plymouth" sign on the left cropped out just to make it harder: https://www.facebook.com/photo/?fbid=9719044988151697&set=gm...

[1] https://chatgpt.com/share/68024c91-61d0-800c-99b1-fcecf0bfe8...

[2] https://maps.app.goo.gl/3TXv2UxH5128xQjJ9

[+] actuallyalys|11 months ago|reply
It wouldn’t shock me if multimodal LLMs were good at GeoGuesser [0], but if we’re being picky, it takes more than a few examples to demonstrate a game is “solved.” I also wonder what kind of data leakage might have been at play, like other people have suggested.

To be clear, my point is not that this is unimpressive, just that this doesn’t demonstrate much. (Edit: I should have said, it doesn’t demonstrate what the title claims.)

[0] They were very likely trained on a large number of photos tagged with their location, and they have the ability to isolate features. Combined with their ability to interpret instructions and just, well, guess, that seems like enough for the game.

[+] Kolya|11 months ago|reply
The examples are cherry-picked. I took a photo outside my office window in a built-up area, o3 thought for 5m 7s (!), and it got the location wrong by 40km. Doesn't look solved to me.
[+] echelon|11 months ago|reply
I was thinking it was using IP geolocation, but after experimenting, I think it's just generally informed.

Here are a few results from GPT 4.5:

https://imgur.com/a/lGTipnn

[+] xzjis|11 months ago|reply
I uploaded a picture I took. I don't save GPS coordinates in my pictures, but the first thing ChatGPT did was read the EXIF data from it.
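For anyone curious what "stripping the metadata" actually removes: in a JPEG, GPS coordinates live inside the APP1 (Exif) segment, which can be dropped without touching the compressed image data. A rough pure-Python sketch of the idea (illustrative only, assuming a well-formed file; real photos can also leak location through XMP or maker-note segments, so a dedicated tool is safer in practice):

```python
import struct

def strip_exif(jpeg: bytes) -> bytes:
    """Remove APP1 (Exif) segments from a JPEG byte stream."""
    assert jpeg[:2] == b"\xff\xd8", "not a JPEG"
    out = bytearray(jpeg[:2])  # keep the SOI marker
    i = 2
    while i < len(jpeg) - 1:
        marker = jpeg[i + 1]
        if marker in (0xDA, 0xD9):
            # Start of Scan (or End of Image): entropy-coded data follows,
            # so copy the remainder verbatim and stop parsing segments.
            out += jpeg[i:]
            break
        (seglen,) = struct.unpack(">H", jpeg[i + 2:i + 4])
        segment = jpeg[i:i + 2 + seglen]
        # Drop only APP1 segments whose payload starts with "Exif\0\0".
        if not (marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"):
            out += segment
        i += 2 + seglen
    return bytes(out)
```

The output is still a valid JPEG, just without the Exif block that carries the GPS tags.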
[+] cwmoore|11 months ago|reply
New title is pretty accurate: "now performs well". Another amenable HN solution.
[+] tkgally|11 months ago|reply
I asked the just-released ChatGPT o4-mini-high to locate four photographs of varying difficulty. It didn’t get any of them right, though the guesses weren’t bad. The reasoning was also interesting to watch, as it cropped sections of the photos to examine them more closely. I put the photos, response, and reasoning trace here:

https://www.gally.net/temp/20250418chatgptgeoguesser/index.h...

Later: I tried the same prompt and photos with Gemini 2.5 Pro. It also got them all wrong, though with a similar degree of reasonableness to its guesses. I had thought that Google’s map and street-view data might lead to better results, but not this time.

[+] tkgally|11 months ago|reply
Still later: I later read that o3 is supposedly particularly good with this geoguessing, so I tried the same prompt and photos with o3. This time it got one out of four correct: “The view of the canal with cherry blossoms and the green railway viaduct is the Ōoka River in Yokohama, looking north from the little road bridge between Hinodechō and Koganechō stations. The tracks on the left belong to the Keikyū Main Line, and the high‑rises in the distance are the Minato‑Mirai and Kita‑Naka district towers.” Its other three answers were still wrong.
[+] viraptor|11 months ago|reply
There's various degrees of "solved" here. Identifying a generic area is cool. But I wouldn't call it a "solved problem" until it can consistently beat for example Rainbolt in accuracy. And there's no good comparison of completely random roads posted so far - mainly popular locations.

Basically, it's one thing to pick out a specific thing photographed thousands of times, but another to get a random country side view and pick out all the unique features for a very precise guess.

[+] Benjammer|11 months ago|reply
One problem is how you could even set up a "fair" competition between an AI and Rainbolt. He does rounds where the image flashes for a fraction of a second and then he guesses the country. How do you simulate "only saw it for a fraction of a second" for an AI?
[+] romanhn|11 months ago|reply
The Alki Beach example is absolute madness. On one hand, I can't wait until all thousands of my photos get automatic semantic and geographic tagging (I guess that's possible now). On the other, goodbye privacy, we hardly knew ye. It will be interesting to apply this to historical, or just old, photos.
[+] jsheard|11 months ago|reply
> The Alki Beach example is absolute madness.

I wonder about info leakage with that one, the poster uses that exact photo as their avatar so if they've mentioned Alki Beach before then reverse image searching might pick it up from context. Ideally you'd want to test it with a photo that's never been posted online.

[+] beoberha|11 months ago|reply
Assuming it only used the pixels and not any metadata from the file or memory from the user (which is a massive assumption), how fucking cool that it can identify the Olympics versus any other mountain range. At that point it’s probably not too hard to guess the picture came from Alki or Golden Gardens, but still very impressive!

I’m also completely ignoring it inferred location from the pride flag and corgi which have heavy Seattle vibes :)

[+] weregiraffe|11 months ago|reply
>goodbye privacy, we hardly knew ye.

Don't upload your private photos anywhere.

[+] Retr0id|11 months ago|reply
In all these examples, I wonder if it's indirectly able to draw on the user's own location? Not necessarily via image metadata, but the request origin IP etc. If I ask ChatGPT for the weather forecast, I get it for my own location.

Would be interesting to have someone reproduce coming from a different country.

[+] dataviz1000|11 months ago|reply
I'm in Lima, Peru on vacation. Yes, it knows where I am.
[+] littlecranky67|11 months ago|reply
I've been telling women to keep copies of all the dick pics they get sent, since you can tell from the characteristic noise of a camera's sensor which other pictures were taken with the same camera. All that's missing is a search engine capable of doing this. I feel that with AI, we are 2-3 years away from people uploading a dick pic and getting back the social media profile of the sender...
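The sensor-fingerprint idea described above is known in the forensics literature as PRNU (photo-response non-uniformity): each sensor multiplies the scene by its own fixed noise pattern, averaging the high-frequency residuals of many shots recovers that pattern, and the pattern then correlates with residuals of new shots from the same camera. A toy simulation of the principle (everything here is synthetic; the image size, noise levels, and the 3x3 mean-filter "denoiser" are arbitrary choices for illustration, not a real forensic pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 64, 64

def box_residual(img):
    """High-frequency residual: image minus a 3x3 mean-filtered copy."""
    blur = img.copy()
    blur[1:-1, 1:-1] = sum(
        img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
    ) / 9.0
    return img - blur

def corr(a, b):
    """Normalized cross-correlation of two equal-sized arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

# Two "cameras", each with its own fixed multiplicative noise pattern.
prnu_a = rng.normal(0, 0.02, (H, W))
prnu_b = rng.normal(0, 0.02, (H, W))

def shoot(prnu):
    # A smooth, shot-specific scene plus per-shot read noise.
    y, x = np.mgrid[0:H, 0:W]
    scene = 0.5 + 0.3 * np.sin(2 * np.pi * (x + y) / 48
                               + rng.uniform(0, 2 * np.pi))
    return scene * (1 + prnu) + rng.normal(0, 0.005, (H, W))

# Estimate camera A's fingerprint by averaging residuals over many shots.
fingerprint = np.mean([box_residual(shoot(prnu_a)) for _ in range(50)], axis=0)

same = corr(box_residual(shoot(prnu_a)), fingerprint)
other = corr(box_residual(shoot(prnu_b)), fingerprint)
print(f"same camera: {same:.3f}, different camera: {other:.3f}")
```

In this toy setup the same-camera correlation comes out clearly higher than the cross-camera one; whether real fingerprints survive heavy recompression and re-posting is a separate empirical question.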
[+] Reubend|11 months ago|reply
Can you share some sources? I would be extremely surprised if such fine-grained noise survived image compression to the extent that you could identify the source of an image despite changing lighting conditions, locations, exposure times, etc.
[+] bravetraveler|11 months ago|reply
Good advice, though purpose-scoped devices are so common we have songs
[+] chneu|11 months ago|reply
This is just a data problem. The more dick pics we can feed into it then the better the results will be.

C'mon boys. Start uploading those dick pics for research purposes.

[+] iambateman|11 months ago|reply
As the article notes, our threat model for who can identify where a picture was posted needs to change from “dedicated, skilled person” to “any creep with $20.”

That’s the point of the switch and it’s a big deal. We’re so used to posting pictures online…I’m just not sure it’s a good idea long-term.

[+] xeonmc|11 months ago|reply
The steam engine was invented and the 100m dash is now a solved problem.
[+] layman51|11 months ago|reply
GeoGuessr is also not a “solved problem” in the sense that if you give the model a photo of an outdoor location that is not covered by Google Street View, then it will just make an educated guess which might still be many kilometers away.
[+] dredmorbius|11 months ago|reply
A far more apropos comparison would be the internal combustion engine and the horse, in a military context. Or, sticking with steam engines, the logistical advantage rail gave a military over a wagon caravan.

The question here isn't a casual guessing game, but threat models (as directly addressed in TFA), and general informational hygiene.

[+] paxys|11 months ago|reply
Just like how chess engines ended competitive chess as people were predicting at the time.
[+] colordrops|11 months ago|reply
"The question of whether a computer can think is no more interesting than the question of whether a submarine can swim." - Edsger Dijkstra
[+] reaperman|11 months ago|reply
Normally I dislike these quips for HN; I hate that I love this one.
[+] ofrzeta|11 months ago|reply
It is impressive, and it almost located the church in my town properly, though it placed it in a neighbouring town. That actually showed a lack of understanding, because its conclusion about the location came from "reading" a signpost that pointed to that other village; clearly a signpost in a town wouldn't point to the town itself. Still impressive, with lots of correct observations about the subject, like architectural details, the Roman numerals on the clock face, etc.
[+] abcanddbutnote|11 months ago|reply
One line of JavaScript solved that "problem" a while ago. The answer is in the DOM.
[+] Crestwave|11 months ago|reply
This isn't just referring to GeoGuessr the game, but locating photographs in general. The source post sums it up very reasonably and concisely:

> PSA: When posting any outdoors photos, update your threat model from "someone skilled and dedicated could theoretically locate this" to "any stalker can do this for 20€/mo"

[+] casey2|11 months ago|reply
GeoGuessr was already a solved problem. I guarantee you ChatGPT is much worse than current systems purpose-built to play GeoGuessr (NMPZ).

Chatbots appear to have some amount of fluid intelligence, so they can do impressive tasks with this information, and the impressiveness of these tasks will likely increase in the future. But for simply getting a good score on GeoGuessr, it's not even close to hobby projects, let alone state of the art.

[+] ggnore7452|11 months ago|reply
I’ve been using LLMs for this kind of geo-guessing since Gemini 2.0. Even without access to internet search like o3, they perform surprisingly well.
[+] MattGaiser|11 months ago|reply
I have found Google Lens fit for that purpose for years, as mountains are fairly distinctive. With lesser-known landmarks or even random real estate photos, it doesn't seem to be great.

I just tossed a few GeoGuessr locations into it and it was confidently incorrect for all three. In one case it swore it knew the exact building and street; it was thousands of km off.

[+] ein0p|11 months ago|reply
One thing that ironically screws it up is the memory across conversations. I gave it some _really_ obscure photos from some godforsaken ass cracks of the world I've been to, and it was able to guess most of them correctly. However, in its reasoning trace I saw that it had a heck of a time letting go of the idea that photo N was roughly from the same location as the previous photos. What's even more impressive: even when it guesses incorrectly, it can often follow up and guess correctly with minimal hints. And it reasons about it much like a human would, and searches for the same things a human would. Note: I used o3; YMMV with a smaller/weaker model.
[+] Gathering6678|11 months ago|reply
After trying with the free ChatGPT at least, I don't think this is too much of an additional privacy risk. E.g. I submitted a photo I'd taken, with a structure in a particular architectural style in the front and a city canal in the background, and told ChatGPT it was taken in China. It separated the structure and canal correctly, but still guessed wrong. I did remove the EXIF information, as ChatGPT tried to extract GPS info from it first and failed.

I guess as long as there are no major visible features (e.g. a huge mountain), and you sanitized the metadata, you will be fine (regarding ChatGPT).

[+] jusgu|11 months ago|reply
Have you tried this with o3? I think o3 is much better at this than any of the free models
[+] gerash|11 months ago|reply
I ran a few of these geo-guess queries against OpenAI o3 and Gemini 2.5 Pro, and o3 does much better on average.

It does spend an order of magnitude more time on inference, searching through websites and analyzing the image, but it often produces impressive output. It also feels like Gemini downsamples the image, as it tends to have a harder time reading small text than o3 does.

That said, o3 did tend to confidently state false things.

[+] usaar333|11 months ago|reply
Huh? It's not even that high on the leaderboard: https://geobench.org/
[+] fullshark|11 months ago|reply
Using actual GeoGuessr data means using Google Street View data, so Gemini being on top isn't too surprising.