I've tried to repaint the exterior of my house more than 20 times with very detailed prompts. I even tried optimizing the prompts with Claude. No matter what, every single time it added one, two, or three extra windows to the same wall.
I also tried this in the past with poor results. I just tried it again this morning with nano banana pro and it nailed it with a very short prompt: "Repaint the house white with black trim. Do not paint over brick."
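For anyone who wants to reproduce this kind of edit programmatically rather than through the chat UI, here's a minimal sketch using the google-genai Python SDK. The model id and filenames are my assumptions (I'm guessing at which id "nano banana pro" maps to); the prompt is the one quoted above.

```python
# Minimal sketch, assuming the google-genai SDK (pip install google-genai)
# and a GEMINI_API_KEY in the environment. The model id below is an
# assumption; substitute whichever image-capable Gemini model you have.
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()  # picks up GEMINI_API_KEY from the environment
house = Image.open("house.jpg")  # hypothetical input photo

response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # assumed id; swap in the current one
    contents=[house, "Repaint the house white with black trim. Do not paint over brick."],
)

# The response interleaves text and image parts; save any returned image.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("repainted.png")
```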
cj|3 months ago
Results: https://imgur.com/a/9II0Aip
The white house was the original (random photo from Google). The prompt was "What paint color would look nice? Paint the house."
swatcoder|3 months ago
Careful with that kind of thing.
Here, it mostly poisons your test, because that exact photo probably exists in the underlying training data and the trained network will be more or less optimized for working with it. It's really the same consideration you'd want to make when testing classifiers or other ML systems 10 years ago.
Most people coming to a task like this will be using an original photo -- missing entirely from any training data, poorly framed, unevenly lit, etc. -- and you need to be careful to capture as much of that as possible when trying to evaluate how a model will work in that kind of use case.
The failure and stress points for AI tools are generally kind of alien and unfamiliar because the way they operate is totally different from the way a human operates. If you're not especially attentive to their weird failure shapes and biases when you test them, you'll easily get false positives (and false negatives) that lead you to misleading conclusions.
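One cheap way to reduce the "this exact photo is in the training set" risk when testing: perturb the image before you feed it in, so at least the bytes and framing don't exactly match anything the model might have memorized. A rough sketch (everything here, filenames and parameter ranges included, is just illustrative):

```python
# Rough sketch: derive slightly "off" variants of a test photo so an exact
# training-set copy can't be matched byte-for-byte or pixel-for-pixel.
import random

from PIL import Image, ImageEnhance

def perturbed_variants(path, n=5, seed=0):
    rng = random.Random(seed)
    original = Image.open(path).convert("RGB")
    w, h = original.size
    variants = []
    for _ in range(n):
        # Random off-center crop (up to ~10% off each edge), like a casual shot.
        box = (rng.randint(0, w // 10), rng.randint(0, h // 10),
               w - rng.randint(0, w // 10), h - rng.randint(0, h // 10))
        img = original.crop(box)
        # Uneven exposure and color balance, like mixed lighting.
        img = ImageEnhance.Brightness(img).enhance(rng.uniform(0.8, 1.2))
        img = ImageEnhance.Color(img).enhance(rng.uniform(0.8, 1.2))
        # Slight rotation, like a handheld phone photo.
        img = img.rotate(rng.uniform(-3, 3), expand=True)
        variants.append(img)
    return variants

for i, img in enumerate(perturbed_variants("house.jpg")):
    img.save(f"house_variant_{i}.png")
```

If the model only succeeds on the pristine original and trips on these, that's a hint you were measuring memorization rather than capability.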
ceejayoz|3 months ago
At some point, this is probably gonna result in you coming home to a painted house and a big bill, lol.
Workaccount2|3 months ago
I don't know what it is with Gemini (and even other models), but I swear they must be doing some kind of active load-dependent quantization or A/B/C/D testing behind the scenes, because sometimes the model is stellar and hitting everything, and other times it's tripping all over itself.
The most effective fix I have found is that when the model is acting dumb, just turn it off, come back a few hours later to a new chat, and try again.
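If you want to check whether this is real drift rather than selective memory, one approach is to fire the identical request on a schedule and archive the outputs for side-by-side comparison later. A sketch, reusing the hypothetical google-genai setup from the earlier example (the cadence, model id, and filenames are all arbitrary assumptions):

```python
# Sketch: send the same edit request on a schedule and archive each result,
# so "the model got dumber this afternoon" becomes checkable instead of vibes.
import time
from datetime import datetime
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()  # assumes GEMINI_API_KEY in the environment
house = Image.open("house.jpg")
PROMPT = "Repaint the house white with black trim. Do not paint over brick."

for _ in range(8):  # e.g. one request per hour for 8 hours
    response = client.models.generate_content(
        model="gemini-2.5-flash-image",  # assumed id, as above
        contents=[house, PROMPT],
    )
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save(f"run_{stamp}.png")
    time.sleep(3600)
```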