ankeshanand | 2 years ago

We've done extensive comparisons against GPT-4V for video inputs in our technical report: https://storage.googleapis.com/deepmind-media/gemini/gemini_....

Most notably, at 1 FPS the GPT-4V API errors out at around 3-4 minutes of video, while 1.5 Pro supports up to an hour of video input.


jxy|2 years ago

So that 3-4 mins at 1FPS means you are using about 500 to 700 tokens per image, which means you are using `detail: high` with something like 1080p to feed to gpt-4-vision-preview (unless you have another private endpoint).

Gemini 1.5 Pro uses about 258 tokens per frame (2.8M tokens for 10,856 frames).
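For anyone who wants to check the arithmetic, here is a rough sketch of the per-frame numbers. It assumes gpt-4-vision-preview fails once a 128,000-token context window fills up, which is an assumption and not something stated in the parent comment; the function and constant names are made up for illustration.

```python
# Assumption: the GPT-4V API errors out when a 128,000-token context is full.
ASSUMED_GPT4V_CONTEXT_TOKENS = 128_000


def frames_at_fps(minutes: float, fps: float = 1.0) -> int:
    """Number of frames sampled from `minutes` of video at `fps`."""
    return int(minutes * 60 * fps)


def tokens_per_frame(total_tokens: int, frames: int) -> float:
    """Average token cost of one frame."""
    return total_tokens / frames


# GPT-4V: failing at 3-4 minutes of 1 FPS video means 180-240 frames.
gpt4v_low = tokens_per_frame(ASSUMED_GPT4V_CONTEXT_TOKENS, frames_at_fps(4))
gpt4v_high = tokens_per_frame(ASSUMED_GPT4V_CONTEXT_TOKENS, frames_at_fps(3))

# Gemini 1.5 Pro: figures quoted above (2.8M tokens for 10,856 frames).
gemini = tokens_per_frame(2_800_000, 10_856)

print(f"GPT-4V: {gpt4v_low:.0f} to {gpt4v_high:.0f} tokens per frame")
print(f"Gemini 1.5 Pro: {gemini:.0f} tokens per frame")
```

Under that assumption, the GPT-4V setup spends roughly two to three times as many tokens on each frame as Gemini 1.5 Pro does.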

Are those comparable?

moralestapia|2 years ago

>while 1.5 Pro supports up to an hour of video inputs

At what price, tho?

verticalscaler|2 years ago

The average shot length in modern movies is between 4 and 16 seconds and around 1 minute for a scene.