Counterpoint: no, it's just more hype.

Doing real-time OCR on 1280x1024 bitmaps has been possible for... the last decade or so? Sure, you can now do it on 4K or 8K bitmaps, but that's just an incremental improvement.

Fact is, full-screen OCR coupled with innovations like "Google" has not led to "ultimate" productivity improvements, and as impressive as OpenAI et al. may appear right now, the impact of these technologies will end up roughly similar.

(Which is to say: the landscape will change, but not in a truly fundamental way. What you're seeing demonstrated right now is, roughly speaking, the next Clippy, which, believe it or not, was hyped to a similar extent around the time it was introduced...)

simonw|1 year ago

The way these new LLM vision models work is very different from OCR.

I saw a demo this morning of someone getting Claude to play FreeCiv (admittedly extremely badly): https://twitter.com/greggyb/status/1849198544445432229

Try doing that with Tesseract.

croes|1 year ago

I bet Tesseract plays pretty badly too.

KoolKat23|1 year ago

Existing OCR is extremely limited and requires custom, narrowly targeted development.

acchow|1 year ago

OCR is to Computer Use as voice-to-text is to ChatGPT Voice.