sgraphics8 | 8 months ago

Hi HN,

I'm building Testr, an AI-powered browser automation and testing tool where you just describe what to test: "check login", "verify newsletter signup", and so on. Think of it as a QA solution for vibe coders.

Under the hood, it's a real challenge. No wonder AI browser automation isn't widely available yet (OpenAI Operator, I believe, is still in research preview).

I tried:

- Asking GPT-4o for pixel coordinates → failed

- Treating the screen like a chessboard and asking for coordinates → semi-broken

- Using HTML parsing alone → couldn’t deal with dynamic content or iframes

- Scrolling + hybrid DOM parsing + visibility scoring → finally good enough
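To make "visibility scoring" concrete, here is a simplified sketch of the idea, not Testr's actual code. It assumes you already have each element's bounding box in viewport coordinates (for example from `getBoundingClientRect()`), and it scores an element by the fraction of its area inside the viewport, weighted up if it is clickable. The names `Rect`, `visibility_score`, and the `clickable_weight` value are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float

    @property
    def area(self) -> float:
        # Negative sizes are treated as empty.
        return max(self.width, 0.0) * max(self.height, 0.0)


def intersect(a: Rect, b: Rect) -> Rect:
    """Overlapping region of two rectangles (zero-sized if they do not overlap)."""
    left = max(a.x, b.x)
    top = max(a.y, b.y)
    right = min(a.x + a.width, b.x + b.width)
    bottom = min(a.y + a.height, b.y + b.height)
    return Rect(left, top, max(right - left, 0.0), max(bottom - top, 0.0))


def visibility_score(
    element: Rect,
    viewport: Rect,
    clickable: bool,
    clickable_weight: float = 2.0,  # hypothetical bias, tune to taste
) -> float:
    """Fraction of the element inside the viewport, boosted for clickable elements."""
    if element.area == 0:
        return 0.0
    visible_fraction = intersect(element, viewport).area / element.area
    return visible_fraction * (clickable_weight if clickable else 1.0)


viewport = Rect(0, 0, 1000, 800)
half_visible_button = Rect(900, 100, 200, 50)   # hangs off the right edge
below_the_fold_link = Rect(0, 900, 100, 50)     # needs scrolling first

print(visibility_score(half_visible_button, viewport, clickable=True))
print(visibility_score(below_the_fold_link, viewport, clickable=True))
```

Elements that score zero are candidates for "scroll first, then look again" rather than something to show the model right away.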

For example, modern sites load 10+ iframes (cookie banners, for instance) and bury the important CTAs inside one of them. The AI often gets confused by elements that are not in view, and it refuses to work with any coordinate system (raw pixel coordinates or transparent coordinate overlays).
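Part of the iframe pain is that an element's box is reported relative to its own frame, not the top-level page. A minimal sketch of the translation, assuming you can read each iframe's offset inside its parent (outermost frame first) and ignoring iframe borders, padding, and CSS transforms. `Box`, `to_top_level`, and `click_point` are hypothetical helper names, not part of any library.

```python
from typing import NamedTuple, Sequence, Tuple


class Box(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def to_top_level(local: Box, frame_offsets: Sequence[Tuple[float, float]]) -> Box:
    """Shift a frame-local box by the offsets of every enclosing iframe.

    frame_offsets lists each iframe's (x, y) position inside its parent,
    outermost first. Assumes no borders, padding, or transforms.
    """
    dx = sum(offset[0] for offset in frame_offsets)
    dy = sum(offset[1] for offset in frame_offsets)
    return Box(local.x + dx, local.y + dy, local.width, local.height)


def click_point(box: Box) -> Tuple[float, float]:
    """Center of the box, the usual place to aim a click."""
    return (box.x + box.width / 2, box.y + box.height / 2)


# An "Accept" button at (20, 10) inside a frame nested in a cookie-banner frame.
button_in_frame = Box(20, 10, 100, 40)
offsets = [(300, 500), (15, 25)]

page_box = to_top_level(button_in_frame, offsets)
print(page_box, click_point(page_box))
```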

Eventually I resorted to:

- Injecting custom numbered hit-testable overlays

- Randomizing highlight colors

- Using a hybrid visibility model to bias the AI toward clickable elements

- Keeping scroll position in sync with visible interaction zones
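Here is a rough sketch of the numbered-overlay idea from the list above: give every candidate element a number and a randomized highlight color, show the model the labeled screenshot, then map the number in its reply back to a click target. This is a simplification under my own assumptions (the `Overlay` shape, the reply parsing, and the color scheme are illustrative), and the actual drawing in the browser is left out.

```python
import colorsys
import random
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Overlay:
    label: int                    # number drawn on top of the element
    color: str                    # highlight color as "#rrggbb"
    center: Tuple[float, float]   # where to click if this label is chosen


def build_overlays(
    centers: Sequence[Tuple[float, float]],
    rng: Optional[random.Random] = None,
) -> List[Overlay]:
    """Number the candidates 1..n and give each a distinct, shuffled hue."""
    rng = rng or random.Random()
    n = len(centers)
    hues = [i / n for i in range(n)]  # evenly spaced so colors stay distinguishable
    rng.shuffle(hues)                 # randomized so color order carries no meaning
    overlays = []
    for label, (center, hue) in enumerate(zip(centers, hues), start=1):
        r, g, b = colorsys.hsv_to_rgb(hue, 0.9, 0.9)
        color = "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        overlays.append(Overlay(label, color, center))
    return overlays


def resolve_reply(reply: str, overlays: Sequence[Overlay]) -> Optional[Overlay]:
    """Pick the overlay whose number appears first in the model's reply, if any."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    by_label = {overlay.label: overlay for overlay in overlays}
    return by_label.get(int(match.group()))


overlays = build_overlays([(10, 20), (30, 40), (50, 60)], random.Random(0))
target = resolve_reply("Click element 2", overlays)
print(target.center if target else "no valid label in reply")
```

The point of asking for a label instead of coordinates is that the model only has to read a number off the screenshot, and the pixel math stays on my side.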

You can try the tool here: https://testr.pro (Note: currently requires signup, but you get 10 free credits to run tests.)

Happy to go deep on the technical details. I'd love feedback:

- Any ideas to improve interaction targeting?

- Better ways to auto-scroll or discover hidden UI?

- Anyone else trying to make AI click like a human?

Here's a dev thread I wrote with some of the struggles: https://x.com/sgraphics8/status/1935244464559149396

– SG
