gordon_freeman | 7 months ago

This.

Recently I uploaded a screenshot of movie showtimes at a specific theatre and asked ChatGPT to find the best time for me to watch the movie based on my schedule.

It confidently found the perfect time and even accounted for factors such as movies in theatres starting 20 minutes late because of the trailers and ads shown beforehand. The only problem: it read the times from the screenshot completely wrong, which messed up all of its output. I tried and tried to get it to extract the times accurately, but it never did, and after getting frustrated I ultimately lost trust in its ability. This keeps happening again and again with LLMs.
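
For what it's worth, the scheduling step is the easy part once the times are read correctly. A minimal sketch, assuming showtimes arrive as zero-padded "HH:MM" strings; the function name, the free window, and the runtime are all hypothetical, and the 20-minute pre-roll is just the figure mentioned above:

```python
from datetime import datetime, timedelta

# Assumption: trailers and ads push the real start about 20 minutes past the listed time.
PREROLL = timedelta(minutes=20)
FMT = "%H:%M"

def pick_showtime(showtimes, free_from, free_until, runtime_min):
    """Return the earliest listed showtime whose film fits the free window, else None."""
    window_start = datetime.strptime(free_from, FMT)
    window_end = datetime.strptime(free_until, FMT)
    for listed in sorted(showtimes):  # zero-padded "HH:MM" sorts chronologically
        film_start = datetime.strptime(listed, FMT) + PREROLL
        film_end = film_start + timedelta(minutes=runtime_min)
        # The film itself (not the trailers) must sit inside the free window.
        if film_start >= window_start and film_end <= window_end:
            return listed
    return None
```

None of this helps if the input times are wrong, which is exactly where it fell over for me.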

barbazoo|7 months ago

And this is actually a great use of Agents, because they can go and use the movie theater's website to figure out more reliably when movies start. I don't think they're going to feed screenshots into the LLM.

tootyskooty|7 months ago

Honestly might be more indicative of how far behind vision is than anything.

Despite the fact that CV was the first real deep learning breakthrough, VLMs have been really disappointing. I'm guessing it's in part because basic next-token prediction on interleaved web text+image data is a weak signal for developing good image reasoning.

polytely|7 months ago

Is anyone trying to solve OCR? I often think of that Anna's Archive blog post about how we basically just have to keep shadow libraries alive long enough, until the conversion from PDF to plaintext is solved.

https://annas-archive.org/blog/critical-window.html

I hope one of these days one of these incredibly rich LLM companies accidentally solves this or something. It would be infinitely more beneficial to mankind than the awful LLM products they are trying to make.