top | item 47115972

YaraDori | 8 days ago

This is wild — using iPhone Mirroring as the “device API” is a clever hack.

Two reliability questions:

1) How do you handle UI drift, A/B layouts, and dynamic text (e.g., different font sizes, localization)? Do you anchor on vision + OCR only, or do you also incorporate accessibility tree / UIAutomation hints where possible?

2) Do you have a concept of checkpoints + recovery (e.g., if a tap misfires, can the agent detect that it's on the wrong screen and roll back or retry)?

I’m working on SkillForge (https://skillforge.expert) — we’re doing something adjacent for web UIs: record a workflow once, then replay it later as a reusable skill with checkpoints + retries. Curious if you’ve considered exporting a “skill file” for iPhone flows (step evidence + approved boundaries), so people can share/reuse automations safely.
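
To make the checkpoint + retry idea concrete, here is a rough sketch of the kind of replay loop I have in mind. It is only an illustration, not SkillForge's actual format or the project's implementation: `Step`, `replay`, and the `action` / `checkpoint` / `recover` callables are all hypothetical names, and a real checkpoint would presumably be a vision/OCR or accessibility check rather than a plain function.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One recorded step of a flow (hypothetical structure)."""
    name: str
    action: Callable[[], None]        # e.g. tap or type on the mirrored device
    checkpoint: Callable[[], bool]    # True if the expected screen was reached
    recover: Callable[[], None] = lambda: None  # e.g. navigate back before retrying


def replay(steps: List[Step], max_retries: int = 2) -> List[str]:
    """Run each step, verify its checkpoint, and retry after recovery on a miss."""
    completed = []
    for step in steps:
        for _attempt in range(max_retries + 1):
            step.action()
            if step.checkpoint():
                completed.append(step.name)
                break
            # Checkpoint failed: try to get back to a known state, then retry.
            step.recover()
        else:
            # All attempts exhausted without passing the checkpoint.
            raise RuntimeError(
                f"step {step.name!r} failed after {max_retries + 1} attempts"
            )
    return completed
```

The point of the sketch is that every step carries its own evidence check, so a misfired tap is caught at the step where it happened instead of several screens later.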
