I gave it a series of 11 images stripped of all metadata. It performed quite well, misidentifying only the two taken in a small college town in the northeastern US. It got both photos taken in Korea correct (one with a fairly clear view of Haneul Park, the other a difficult-to-identify picture of Sunrise Peak that doesn't resemble anything on Google).
It got every other US photo correct, ranging from an under-construction stretch of Austin shot from the river to some fairly difficult shots in NYC (the upper halves of buildings seen from the Rockefeller terrace, the black wall of MoMA).
While not perfect, I'm frankly shocked at how well it performed.
I played a round of Geoguessr against it and while it did a shockingly good job compared to what I was expecting, it still lags behind even novice human players.
The locations and its guesses were:
Bliss, Idaho - Burns, Oregon (273 miles away)
Quilleco, Biobio, Chile - Eugene, Oregon (6,411 miles away)
Dettighofen, Switzerland - Mühldorf, Germany (228 miles away)
Pretoria, South Africa - Johannesburg, South Africa (36 miles away)
Rockhampton, Australia - Gold Coast, Australia (437 miles away)
I gave it some photos from Denmark and didn't even bother to strip the metadata. One it correctly said gave off "Scandinavian vibes"; every other photo was very wrong. I also gave it a photo of the French Alps; it guessed Switzerland.
I gave o4-mini-high a cropped version of a photo I found on Facebook[0][1], and it quickly determined that this was in the UK from the road markings. It also decided that it was from a coastal city because it could see water on the horizon, which is the correct conclusion from incorrect data. There is no water, I think that's trees on a hill. It focused heavily on the spherical structure, which makes sense because it's distinctive, though it had a hard time placing it. It also decided that the building on the left was probably a shopping centre.
It eventually decided that the photo was taken outside the Scottish Exhibition and Conference Centre in Glasgow. It actually generally considered Scottish locations more than others.
The picture was actually taken in Plymouth (so pretty much as far from Scotland as you can get in Britain), on Charles Street looking south-east[2]. The building on the right is Drake Circus, and the one on the left is the Arts University. It actually did consider Plymouth, but decided it didn't match.
It wouldn’t shock me if multimodal LLMs were good at GeoGuessr [0], but if we’re being picky, it takes more than a few examples to demonstrate that a game is “solved.” I also wonder what kind of data leakage might have been at play, as others have suggested.
To be clear, my point is not that this is unimpressive, just that this doesn’t demonstrate much. (Edit: I should have said, it doesn’t demonstrate what the title claims.)
[0] They were very likely trained on a large number of photos that had their location attached, and they have the ability to isolate features. Combined with their ability to interpret instructions and just, well, guess, that seems like enough for the game.
The examples are cherry-picked. I took a photo outside my office window in a built-up area, o3 thought for 5m 7s (!), and it got the location wrong by 40km. Doesn't look solved to me.
I asked the just-released ChatGPT o4-mini-high to locate four photographs of varying difficulty. It didn’t get any of them right, though the guesses weren’t bad. The reasoning was also interesting to watch, as it cropped sections of the photos to examine them more closely. I put the photos, response, and reasoning trace here:
Later: I tried the same prompt and photos with Gemini 2.5 Pro. It also got them all wrong, though with a similar degree of reasonableness to its guesses. I had thought that Google’s map and street-view data might lead to better results, but not this time.
Still later: I later read that o3 is supposedly particularly good with this geoguessing, so I tried the same prompt and photos with o3. This time it got one out of four correct: “The view of the canal with cherry blossoms and the green railway viaduct is the Ōoka River in Yokohama, looking north from the little road bridge between Hinodechō and Koganechō stations. The tracks on the left belong to the Keikyū Main Line, and the high‑rises in the distance are the Minato‑Mirai and Kita‑Naka district towers.” Its other three answers were still wrong.
There are various degrees of "solved" here. Identifying a generic area is cool. But I wouldn't call it a "solved problem" until it can consistently beat, for example, Rainbolt in accuracy. And there's no good comparison on completely random roads so far - mainly popular locations.
Basically, it's one thing to pick out a specific thing photographed thousands of times, but another to get a random country side view and pick out all the unique features for a very precise guess.
One problem is: how can you even set up a "fair" competition between an AI and Rainbolt? He does rounds where the image flashes for a fraction of a second and then he guesses the country. How do you simulate "only saw it for a fraction of a second" for an AI?
The Alki Beach example is absolute madness. On one hand, I can't wait until all thousands of my photos get automatic semantic and geographic tagging (I guess that's possible now). On the other, goodbye privacy, we hardly knew ye. It will be interesting to apply this to historical, or just old, photos.
I wonder about info leakage with that one, the poster uses that exact photo as their avatar so if they've mentioned Alki Beach before then reverse image searching might pick it up from context. Ideally you'd want to test it with a photo that's never been posted online.
Assuming it only used the pixels and not any metadata from the file or memory from the user (which is a massive assumption), how fucking cool that it can identify the Olympics versus any other mountain range. At that point it’s probably not too hard to guess the picture came from Alki or Golden Gardens, but still very impressive!
I’m also completely ignoring it inferred location from the pride flag and corgi which have heavy Seattle vibes :)
In all these examples, I wonder if it's indirectly able to draw on the user's own location? Not necessarily via image metadata, but the request origin IP etc. If I ask ChatGPT for the weather forecast, I get it for my own location.
Would be interesting to have someone reproduce coming from a different country.
I've been telling women to keep copies of all the dick pics they get sent, since you can tell from the characteristic noise of a camera's sensor which other pictures were taken with the same camera. All that's missing is a search engine capable of doing this. I feel that with AI, we are 2-3 years away from people uploading a dick pic and getting back the social media profile of the sender...
Can you share some sources? I would be extremely surprised if such fine-grained noise survives image compression to the extent that you could identify the source of an image despite changing lighting conditions, locations, exposure times, etc.
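The underlying technique is real: it's usually called PRNU (photo-response non-uniformity) fingerprinting, where the sensor's fixed noise pattern is recovered by averaging denoising residuals over many photos. Below is a toy NumPy sketch of the idea on synthetic data; the box-filter "denoiser", image sizes, and noise levels are all illustrative stand-ins (real pipelines use wavelet denoising and much larger images), so treat it as a demonstration of the principle rather than a working forensic tool.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 64

def box_blur(img):
    # 3x3 mean filter via padded shifts (pure NumPy); a crude stand-in
    # for the wavelet denoiser used in real PRNU pipelines.
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def residual(img):
    # High-frequency noise left after denoising; this is where the
    # sensor's fixed pattern hides.
    return img - box_blur(img)

def fingerprint(imgs):
    # Scene content differs from shot to shot and averages out;
    # the constant sensor pattern remains.
    return np.mean([residual(im) for im in imgs], axis=0)

def match(img, fp):
    # Normalised correlation between an image's residual and a
    # camera fingerprint.
    r, f = residual(img) , fp
    r, f = r - r.mean(), f - f.mean()
    return float((r * f).sum() / (np.linalg.norm(r) * np.linalg.norm(f)))

# Simulate two cameras as fixed additive noise patterns (a simplification;
# real PRNU is multiplicative).
pattern_a = rng.normal(0, 1, (H, W))
pattern_b = rng.normal(0, 1, (H, W))

def shoot(pattern):
    scene = box_blur(box_blur(rng.normal(0, 5, (H, W))))  # smooth scene
    return scene + pattern + rng.normal(0, 0.2, (H, W))   # + shot noise

fp_a = fingerprint([shoot(pattern_a) for _ in range(20)])
fp_b = fingerprint([shoot(pattern_b) for _ in range(20)])

unknown = shoot(pattern_a)          # a fresh photo from camera A
print(match(unknown, fp_a))         # high correlation: same sensor
print(match(unknown, fp_b))         # near zero: different sensor
```

Whether this survives aggressive recompression and resizing on real social-media photos is exactly the open question the parent raises; the published forensics work generally uses higher-quality sources.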
As the article notes, our threat model for who can identify where a picture was posted needs to change from “dedicated, skilled person” to “any creep with $20.”
That’s the point of the switch and it’s a big deal. We’re so used to posting pictures online…I’m just not sure it’s a good idea long-term.
GeoGuessr is also not a “solved problem” in the sense that if you give the model a photo of an outdoor location that is not covered by Google Street View, then it will just make an educated guess which might still be many kilometers away.
A far more apropos comparison would be the internal combustion engine versus the horse, in a military context. Or, sticking with steam engines: steam-powered military logistics versus a wagon caravan.
The question here isn't a casual guessing game, but threat models (as directly addressed in TFA), and general informational hygiene.
It is impressive, and it almost located the church in my town properly, though it placed it in a neighbouring town. However, that showed a lack of understanding, because its conclusion about the location came from "reading" a signpost that pointed to that other village. Clearly a town would have no signpost pointing to itself. Still impressive, with lots of correct observations about the subject, like architectural details, Roman numerals on the clock face, etc.
This isn't just referring to GeoGuessr the game, but locating photographs in general. The source post sums it up very reasonably and concisely:
> PSA: When posting any outdoors photos, update your threat model from "someone skilled and dedicated could theoretically locate this" to "any stalker can do this for 20€/mo"
GeoGuessr was already a solved problem. I guarantee you ChatGPT is much worse than current systems designed to play GeoGuessr (NMPZ).
Chatbots appear to have some amount of fluid intelligence, so they can do impressive tasks with this information, and the impressiveness of those tasks will likely increase in the future. But at simply getting a good score on GeoGuessr, they're not even close to hobby projects, let alone the state of the art.
I have found Google Lens fit for that purpose for years, as mountains are fairly distinctive. With lesser-known landmarks, or even random real estate photos, it doesn't seem to be great.
I just tossed a few GeoGuessr locations at it and it was confidently incorrect for all three. In one case it swore it knew the exact building and street; it was thousands of km off.
One thing that screws it up, ironically, is the memory across conversations. I gave it some _really_ obscure photos from some godforsaken ass cracks of the world I've been to, and it was able to guess most of them correctly. However, in its reasoning trace I saw that it has a heck of a time letting go of the idea that photo N is roughly from the same location as the previous photos. What's even more impressive: even when it guesses incorrectly, it can often follow up and guess correctly with minimal hints. And it reasons about it much like a human would, and searches for the same things a human would. Note: I used o3; YMMV with a smaller/weaker model.
After trying with the free ChatGPT at least, I don't think this is too much of an additional privacy risk. E.g. I submitted a photo I'd taken, with a distinctively styled architectural feature in the foreground and a city canal in the background, and told ChatGPT it was taken in China. It identified the feature and the canal correctly, but still guessed wrong. I did remove the EXIF data first, as ChatGPT initially tried to extract GPS info and failed.
I guess as long as there are no major visible features (e.g. a huge mountain) and you sanitize the metadata, you will be fine (as far as ChatGPT goes).
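Stripping the EXIF data, as mentioned above, doesn't even need an imaging library: in a JPEG, EXIF (including the GPS tags) lives in the APP1 marker segment, which can simply be dropped. Here's a minimal pure-Python sketch; it's not a robust parser (real files can also hide metadata in XMP, IPTC/APP13, thumbnails, etc.), and for anything serious a dedicated tool like exiftool is the safer choice.

```python
import struct

def strip_jpeg_metadata(data: bytes) -> bytes:
    """Drop APP1 (EXIF/XMP, including GPS tags) and COM segments from a
    JPEG byte stream, leaving all other segments untouched."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG")
    out = bytearray(b"\xff\xd8")  # keep the start-of-image marker
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            break  # malformed header area; stop rewriting
        marker = data[i + 1]
        if marker == 0xDA:  # start-of-scan: copy the rest verbatim
            out += data[i:]
            return bytes(out)
        # Each header segment is: marker (2 bytes) + big-endian length
        # (which includes the length field itself) + payload.
        (seg_len,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker not in (0xE1, 0xFE):  # drop APP1 (EXIF) and comments
            out += data[i:i + 2 + seg_len]
        i += 2 + seg_len
    return bytes(out)

# Tiny synthetic JPEG: SOI, an APP1/EXIF stub, a DQT stub, then SOS.
soi = b"\xff\xd8"
app1 = b"\xff\xe1" + struct.pack(">H", 8) + b"Exif\x00\x00"
dqt = b"\xff\xdb" + struct.pack(">H", 5) + b"\x00\x01\x02"
sos = b"\xff\xda" + struct.pack(">H", 4) + b"\x00\x00" + b"entropy-coded-data"
clean = strip_jpeg_metadata(soi + app1 + dqt + sos)
assert b"Exif" not in clean and clean == soi + dqt + sos
```

Note that metadata is only half the story: as the thread shows, the pixels themselves can still give the location away.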
I ran a few of these geo-guess queries with OpenAI o3 and Gemini 2.5 Pro, and o3 does much better on average.
It does spend an order of magnitude longer on inference, searching through websites and analyzing the image, but it often produces an impressive output. It also feels like Gemini downsamples the image, as it tends to have a harder time reading small text than o3 does.
That said, o3 did tend to confidently state false things.
https://chatgpt.com/share/6801bbf7-fd40-8008-985d-75c8813f55...
There is the chat.
Weirdly it said, "I’ve seen that exact house before on Google Street View when exploring Cairns neighborhoods."
[0] This image with the "university plymouth" on the left cropped out, just to make it harder: https://www.facebook.com/photo/?fbid=9719044988151697&set=gm...
[1] https://chatgpt.com/share/68024c91-61d0-800c-99b1-fcecf0bfe8...
[2] https://maps.app.goo.gl/3TXv2UxH5128xQjJ9
Here are a few results from GPT 4.5:
https://imgur.com/a/lGTipnn
https://www.gally.net/temp/20250418chatgptgeoguesser/index.h...
https://i.redd.it/dz8bhnamohb71.jpg
Don't upload your private photos anywhere.
C'mon boys. Start uploading those dick pics for research purposes.