(no title)

1 point | jzapletal | 5 days ago

I ran this comparison while trying to decide which model to use in our own tool. The task was to analyze around 12,000 screenshots and find recurring manual workflows worth automating.

Results:

- Claude Sonnet 4.6: 8/10, $0.53/run (wins on quality)

- Kimi K2.5: 7/10, $0.09/run (about 6x cheaper than Sonnet, now my production pick)

- GPT-5.2: 6/10, $0.41/run (oddly, it missed the most obvious patterns)

- DeepSeek V3.2: 0/10 (gave me a garbled XML...)

Models that flagged a one-time DKIM setup as a "recurring automation candidate" were penalized.
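
For anyone who wants to check the cost comparison, here is a minimal sketch that recomputes it from the numbers listed above. The `points_per_dollar` metric and the helper names are illustrative assumptions introduced here, not part of the original scoring. DeepSeek V3.2 is left out because no per-run cost was reported for it.

```python
# Scores (out of 10) and per-run costs in USD, copied from the results above.
# DeepSeek V3.2 is omitted: the post gives it a score but no cost.
results = {
    "Claude Sonnet 4.6": (8, 0.53),
    "Kimi K2.5": (7, 0.09),
    "GPT-5.2": (6, 0.41),
}


def cost_ratio(a: str, b: str) -> float:
    """How many times more expensive model `a` is than model `b`, per run."""
    return results[a][1] / results[b][1]


def points_per_dollar(name: str) -> float:
    """Hypothetical value metric: quality score divided by cost per run."""
    score, cost = results[name]
    return score / cost


if __name__ == "__main__":
    # Rank models by the illustrative value metric, best first.
    for name in sorted(results, key=points_per_dollar, reverse=True):
        print(f"{name}: {points_per_dollar(name):.1f} points per dollar")

    ratio = cost_ratio("Claude Sonnet 4.6", "Kimi K2.5")
    print(f"Sonnet costs {ratio:.1f}x as much as Kimi per run")
```

On these numbers Sonnet comes out a little under 6x the per-run cost of Kimi for one extra quality point, which is the trade-off behind the production pick.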

Happy to share more if folks find this interesting.