the-grump | 1 month ago
Consider also that they can generate summaries and tackle the novel piecemeal, just like a human would.
Re: movies. Get YouTube premium and ask YouTube to summarize a 2hr video for you.
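A minimal sketch of the "piecemeal" idea, assuming a hypothetical `summarize` callable that stands in for whatever LLM call you use (it is not a real library API). The text is split into fixed-size chunks, each chunk is summarized, and the process repeats on the joined summaries until the result fits in one pass.

```python
from typing import Callable, List

def chunk_text(text: str, max_chars: int) -> List[str]:
    """Split text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def summarize_piecemeal(
    text: str,
    summarize: Callable[[str], str],  # hypothetical stand-in for an LLM call
    max_chars: int = 8000,            # assumed per-call size limit
) -> str:
    """Summarize chunk by chunk, then summarize the summaries until they fit."""
    while len(text) > max_chars:
        pieces = chunk_text(text, max_chars)
        condensed = "\n".join(summarize(p) for p in pieces)
        # Guard against a summarizer that fails to shrink the text (infinite loop).
        if len(condensed) >= len(text):
            raise ValueError("summarize() did not shorten the text")
        text = condensed
    return summarize(text)

# Toy usage: a "summarizer" that keeps the first 10 characters of its input.
novel = "a" * 100
print(summarize_piecemeal(novel, lambda s: s[:10], max_chars=30))
```

Real pipelines usually split on chapter or paragraph boundaries rather than raw character counts; fixed-size slicing just keeps the sketch short.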
falloutx|1 month ago
> Re: movies. Get YouTube premium and ask YouTube to summarize a 2hr video for you.
This is different from watching a movie. Can it tell what suit the actor was wearing? Can it tell what the actor's face looked like? Summarising and watching are two different things.
pigpop|1 month ago
https://github.com/JUNJIE99/MLVU
https://huggingface.co/datasets/OpenGVLab/MVBench
Ovis and Qwen3-VL are examples of models that can work with multiple frames from a video at once to produce both visual and temporal understanding.
https://huggingface.co/AIDC-AI/Ovis2.5-9B
https://github.com/QwenLM/Qwen3-VL
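A rough sketch of how a long video can be reduced to a handful of frames for a multi-frame model: split it into equal segments and take the middle frame of each. This is a generic uniform-sampling assumption for illustration, not the actual preprocessing used by Ovis or Qwen3-VL (check their repos for that), and `sample_frame_indices` is a hypothetical helper.

```python
from typing import List

def sample_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread evenly over the video (segment midpoints)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)  # can't sample more frames than exist
    segment = total_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]

# A 2hr film at an assumed 24 fps, reduced to an assumed budget of 32 frames.
total = 2 * 60 * 60 * 24
indices = sample_frame_indices(total, 32)
print(indices[:3])  # first few sampled frame numbers
```

Uniform sampling keeps temporal order, which is what lets the model reason about "what happened when", but anything shorter than the gap between samples (here about 225 seconds) can be missed entirely.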
cmcaleer|1 month ago
Which is a relatively trivial task for a current LLM.