I agree claude and chatgpt and even gemini does a poor job in detecting and cropping into a region. Some of the simplest tasks, Qwen also is great at summerization but not into solving simple vision tasks like cropping, segmentetation and detection. Here is an examples where we compared claude, gemini, chatgpt and other frontier models for simple(and complicated) visual tasks
https://chat.vlm.run/showdown#:~:text=Crop%20into%20the%20cl...
colechristensen|2 months ago