tjsk | 8 months ago

Did you consider working around those by using vision models instead of DOM parsing? Was cost or latency the bottleneck? It seems like the agentic future you describe would need more vision-based parsing.

arcb | 8 months ago

I believe we will at some point. It's all a question of the right need coming up. Text OCR has gotten really good, and if you think of it from a UI perspective, the only real contract is that a screen will show text that's representative of the information entered. The DOM is useful, but it's a changeable contract!