waynenilsen | 23 days ago
The amount of manual QA I am currently subjected to is simultaneously infuriating and hilarious. The foundation models are up to the task, but we need new abstractions and layers to fix it properly. This will all go the way of the dodo in 12 months, but it'll be useful in the meantime.
agent-browser was a big improvement over Playwright, but it doesn't completely close the gap.
antves | 23 days ago
I think this paradigm was very visible in yesterday's blog post from Anthropic (https://www.anthropic.com/engineering/building-c-compiler), where they mentioned that giving the agents the ability to verify against GCC was the key to unlocking further progress.
Giving a browser to these agents is a no-brainer, especially if one works in QA or develops web-based services.
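The GCC point above is essentially differential testing: the agent's output is checked against a trusted reference implementation instead of its own report. A minimal sketch of that loop, with pure functions standing in for the two compilers (all names here are illustrative, not from Anthropic's post):

```python
# Differential testing sketch: compare a candidate implementation against
# a trusted reference on many inputs, and report every disagreement.
# In the compiler setting, `reference` plays the role of GCC and
# `candidate` the agent-built compiler; here they are toy stand-ins.

def differential_test(reference, candidate, inputs):
    """Return (input, expected, actual) triples where the two disagree."""
    failures = []
    for x in inputs:
        expected = reference(x)
        actual = candidate(x)
        if expected != actual:
            failures.append((x, expected, actual))
    return failures

# Toy stand-ins: the candidate is wrong for inputs >= 10.
reference = lambda n: n * n
buggy_candidate = lambda n: n * n if n < 10 else n + n

failures = differential_test(reference, buggy_candidate, range(20))
print(failures[0])  # first disagreement: (10, 100, 20)
```

The value of the oracle is that "it works" becomes a mechanical check rather than a claim the agent can hallucinate.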
tiny-automates | 23 days ago
The right abstraction for QA is probably closer to what a manual tester actually does: describe the expected behavior and let a specialized system figure out the mechanical verification steps.
But the harder unsolved problem is evaluation: how do you reliably distinguish "the agent verified the behavior" from "the agent navigated to the right page and hallucinated a success report"? Visual diffing against golden screenshots helps for regressions, but it doesn't cover the semantic correctness of dynamic content.
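One partial answer is to make the harness trust structured evidence rather than the agent's claim: a success report only counts if it carries independently checkable observations. A hypothetical sketch (the report shape and field names are made up for illustration):

```python
# Evidence-based verification sketch: the agent's "success" flag is never
# trusted on its own; the harness re-checks the attached evidence against
# the expected behavior. All names here are hypothetical.

from dataclasses import dataclass, field

@dataclass
class AgentReport:
    claimed_success: bool
    # e.g. {"selector": "#total", "observed_text": "$42.00"}
    evidence: dict = field(default_factory=dict)

def verify(report: AgentReport, expected_text: str) -> bool:
    """Pass only if the evidence itself matches the expectation."""
    if not report.evidence:
        return False  # a bare "it worked" is treated as unverified
    return report.evidence.get("observed_text") == expected_text

# A hallucinated success: the claim is there, the evidence is not.
hallucinated = AgentReport(claimed_success=True)

# A grounded success: the claim comes with a checkable observation.
grounded = AgentReport(
    claimed_success=True,
    evidence={"selector": "#total", "observed_text": "$42.00"},
)

print(verify(hallucinated, "$42.00"))  # False
print(verify(grounded, "$42.00"))      # True
```

This doesn't solve semantic correctness of dynamic content either, but it at least turns "did the agent verify it" from a self-report into something the harness can audit.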