iiJDSii | 1 year ago

Such as? Are they able to recognize arbitrary GUI elements from various desktop programs, web browsers, etc.?

mountainriver|1 year ago

Qwen2.5-VL seems to be the best right now in our tests.

UI-TARS by ByteDance also has a good amount of pretraining.

Molmo is also very good at coordinates.