Playwright + something like Claude with computer use is probably closest to what you're describing. Though I'd push back a bit on the approach.

The flaky test problem usually comes from either race conditions (waiting on the wrong things) or environmental differences. Adding AI vision on top often adds another layer of flakiness - now you're debugging "why did the model misread this button" on top of "why did the test time out."

For mocking external services specifically - tools like MailHog (email) or mock OAuth providers tend to be more reliable than screenshot-based approaches. The determinism matters.
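To make the determinism point concrete, here's a rough sketch of an in-process fake mailbox. This is not MailHog's API, just the same capture-and-inspect idea boiled down to a few lines. Every name here (Mailer, FakeMailbox, requestPasswordReset, the example.test addresses) is made up for illustration.

```typescript
// Hypothetical names throughout: Mailer, FakeMailbox and requestPasswordReset
// are illustrations only, not part of MailHog or any real library.
interface Email {
  to: string;
  subject: string;
  body: string;
}

interface Mailer {
  send(email: Email): void;
}

// Captures outgoing mail in memory instead of delivering it.
class FakeMailbox implements Mailer {
  readonly sent: Email[] = [];

  send(email: Email): void {
    this.sent.push(email);
  }

  // Most recent message addressed to `address`, or undefined if there is none.
  latestFor(address: string): Email | undefined {
    return [...this.sent].reverse().find((e) => e.to === address);
  }
}

// Stand-in for the application code under test.
function requestPasswordReset(mailer: Mailer, address: string, token: string): void {
  mailer.send({
    to: address,
    subject: "Reset your password",
    body: `https://app.example.test/reset?token=${token}`,
  });
}

const mailbox = new FakeMailbox();
requestPasswordReset(mailbox, "user@example.test", "abc123");
const email = mailbox.latestFor("user@example.test");
```

The test then asserts on the captured message directly (subject, reset link) instead of reading pixels out of a webmail UI, so the result doesn't depend on rendering or timing.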

That said, if you genuinely need to test against production-like visual state - Playwright's screenshot comparison (toHaveScreenshot) combined with proper wait strategies has gotten pretty solid. The visual regression approach catches layout bugs that functional tests miss.
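Roughly what that looks like in a Playwright test, as a sketch rather than a drop-in: the URL, the test IDs and the 1% diff budget are placeholders I made up, so swap in whatever your app actually exposes.

```typescript
import { test, expect } from '@playwright/test';

test('dashboard layout matches baseline', async ({ page }) => {
  // Placeholder URL, not a real environment.
  await page.goto('https://staging.example.test/dashboard');

  // Wait on the state the screenshot depends on instead of a fixed sleep.
  // The heading name and test IDs below are assumptions about the app.
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
  await expect(page.getByTestId('loading-spinner')).toBeHidden();

  await expect(page).toHaveScreenshot('dashboard.png', {
    animations: 'disabled',                    // freeze CSS animations and transitions
    mask: [page.getByTestId('last-updated')],  // hide content that changes on every run
    maxDiffPixelRatio: 0.01,                   // arbitrary tolerance, tune per project
  });
});
```

Baselines get regenerated with --update-snapshots. Masking the dynamic regions and waiting on real page state is most of what keeps this from turning into another source of flakiness.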
