dfbrown | 2 years ago
> In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video.
which makes it sound like they used text + image prompts and then acted them out in the video, as opposed to Gemini interpreting the video directly.
https://developers.googleblog.com/2023/12/how-its-made-gemin...
riscy|2 years ago
> Narrator: "Based on their design, which of these would go faster?"
Without even specifying that those are cars! That was impressive to me, that it recognized the cars are going downhill _and_ could infer that in such a situation, aerodynamics matters. But the blog post says the real prompt was this:
> Real Prompt: "Which of these cars is more aerodynamic? The one on the left or the right? Explain why, using specific visual details."
They narrated inaccurate prompts for the Sun/Saturn/Earth example too:
> Narrator: "Is this the right order?"
> Real Prompt: "Is this the right order? Consider the distance from the sun and explain your reasoning."
If the narrator actually read the _real_ prompts they fed Gemini in these videos, this would not be as impressive at all!
M4v3R|2 years ago
magicalist|2 years ago
> In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video.
Not "here are the full prompts used in the video" or something like that.
None of the entries match up 1:1. And the response to the car example in the video doesn't even make sense in response to the prompt in the post (no mention of speed), and certainly isn't a trimmed portion of the response in the post.
The video has the disclaimer "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity". It would be weird to write that but not mention that neither the prompts nor responses shared even the same set of words in the same order with the "Real" prompts and responses.
I think your assumption is wrong on this one.
atonse|2 years ago
Now that I learned how fake it is, that is more evidence that Google is in really bad shape with this.
pell|2 years ago
It's crazy that this is where we are now. This would obviously still be crazy impressive even if they hadn't done those edits.
hot_gril|2 years ago
zarzavat|2 years ago
lll-o-lll|2 years ago
crdrost|2 years ago
It's that you know some of this happened and you don't know how much. So when it says "what the quack!", presumably the model was prompted "give me answers in a more fun conversational style" (since that's not the style in any of the other clips). Was it able to do that with just a little hint, or did it take a large amount of wrangling, like "hey can you say that again in a more conversational way, what if you said something funny at the beginning like 'what the quack'", in which case it's totally unimpressive? I'm not saying that's what happened, I'm saying "because we know we're only seeing a very fragmentary transcript, I have no way to distinguish between the really impressive version and the really unimpressive one."
It'll be interesting to use it more as it gets more generally available though.
andrewprock|2 years ago
"What do you think I'm doing? Hint: it's a game."
Anyone with as much "knowledge" as Gemini ought to know it's roshambo.
"Is this the right order? Consider the distance from the sun and explain your reasoning."
Full prompt elided from the video.
calvinv|2 years ago
huytersd|2 years ago