I used this just-released API (of Gemini Pro) with multimodal input to test some of the things from the infamous Gemini Demo. You can see here [ https://www.youtube.com/watch?v=__nL7Vc0OCg ] my GPT-4 recreation of that ad which went viral.
Gemini Pro is... not great. In one test, I asked what gesture I was making (while showing a thumbs up) -- it said thumbs down and "The image is a commentary on the changing nature of truth".
I think the fair comparison would be GPT3.5 (if image inputs were supported) vs Gemini Pro. It would be great to compare this with Gemini Ultra next year.
It’s far more than reasonable, it might be Google’s saving grace.
I wasn’t going to bother even testing Google’s AI products unless everyone started gushing about how much better they are than GPT4. But with 60 free queries per minute? That’s worth exploring, even if only to find out shortly that it’s not worth paying for.
This must be a significant investment to pick up the hype; 1 qps cannot be sustainable with a free pricing tier unless their resource efficiency is 10x better than competitors.
I know it's just an anecdote, but my biggest problem with Google's Bard/Gemini is that the moment I tried to ask a question about something - I started getting ads all over the internet and social media related to that.
Doing this with ChatGPT 4.0 for months and months did not cause this type of behavior.
Does that happen even with Bard Activity turned off? It's kind of silly of Google because the types of queries I would send to Bard are the type that I wouldn't care to see as adverts anyway!
I like that they have a "blog post creator"[1] in their examples. There's no hope for the future of the web when the self-proclaimed stewards of its quality encourage AI spam.
Developers can start building with our first version of Gemini Pro through Google AI Studio at ai.google.dev
Developers have a free quota and access to a full range of features including function calling, embeddings, semantic retrieval, custom knowledge grounding, chat functionality and more. It supports 38 languages across 180+ countries.
It looks like a fairly easy swap-in for GPT. "messages" becomes "content". Some of the configuration parameters are slightly different (topP/etc), but I have never put in the effort to understand the practical effect of those so I never tweak their values.
The messages themselves are a list of "parts", which allows mixed media messages. This feels a little cleaner than how GPT has handled messages being extended.
Instead of role: "assistant" they use role: "model". There's no role: "system" – presumably you just shove everything into user messages. You can also leave off the role... and I assume that means default to "user" but it's not clear if it's 100% equivalent...?
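Since the shapes are so close, the translation can be a small shim. Here is a hypothetical helper (not part of either SDK; folding system text into the next user message is my assumption about how people work around the missing system role):

```javascript
// Hypothetical shim: convert OpenAI-style chat messages into Gemini's
// { role, parts } contents. "assistant" maps to "model"; since there is
// no "system" role, system text is folded into the next user message.
function toGeminiContents(messages) {
  const contents = [];
  let pendingSystem = "";
  for (const m of messages) {
    if (m.role === "system") {
      pendingSystem += m.content + "\n";
      continue;
    }
    const role = m.role === "assistant" ? "model" : "user";
    let text = m.content;
    if (role === "user" && pendingSystem) {
      text = pendingSystem + text;
      pendingSystem = "";
    }
    contents.push({ role, parts: [{ text }] });
  }
  return contents;
}
```

So a [system, user, assistant] conversation becomes two Gemini turns, with the system text prepended to the first user part.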
There's a bunch of moderation parameters, which seems like a good idea. OpenAI has a moderation endpoint you can use to preflight-check your input, but doing it all at once makes more sense. There are four categories, and you can adjust your sensitivity to each (and turn off blocking entirely). The sensitivity is not about how extreme the violation is, but how likely it is to be a violation. So it's not like a G/PG/PG-13/etc rating; just a question of how many false positives/negatives you want.
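As a sketch, those settings are passed as an array of category/threshold pairs. The enum names below follow the Gemini API docs, but treat the exact values as illustrative:

```javascript
// Likelihood-based thresholds, one per harm category. The threshold says
// how *likely* a violation must be before blocking, not how severe it is.
const safetySettings = [
  { category: "HARM_CATEGORY_HARASSMENT", threshold: "BLOCK_MEDIUM_AND_ABOVE" },
  { category: "HARM_CATEGORY_HATE_SPEECH", threshold: "BLOCK_MEDIUM_AND_ABOVE" },
  { category: "HARM_CATEGORY_SEXUALLY_EXPLICIT", threshold: "BLOCK_ONLY_HIGH" },
  { category: "HARM_CATEGORY_DANGEROUS_CONTENT", threshold: "BLOCK_NONE" }, // blocking off
];

// Passed when constructing the model, e.g.:
// const model = genAI.getGenerativeModel({ model: "gemini-pro", safetySettings });
```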
There's functions, though they are in beta (whatever that means): https://ai.google.dev/docs/function_calling – they look very very similar to GPT functions. They don't have the "JSON response" that GPT has, but that seems mostly redundant with functions anyway.
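For reference, a declaration looks roughly like this (shape per the function-calling docs linked above; `get_weather` and its parameters are made up for illustration):

```javascript
// A tool with one hypothetical function declaration. Note the schema uses
// uppercase type names (OBJECT, STRING) rather than JSON Schema's lowercase.
const tools = [{
  functionDeclarations: [{
    name: "get_weather",
    description: "Get the current weather for a city",
    parameters: {
      type: "OBJECT",
      properties: {
        city: { type: "STRING", description: "City name" },
      },
      required: ["city"],
    },
  }],
}];

// Supplied alongside the model config, e.g.:
// const model = genAI.getGenerativeModel({ model: "gemini-pro", tools });
```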
I have no idea how well prompts translate, but it feels like the API is an easy translation. And importantly everything is semantically equivalent, you don't have to make one pretend it is the other, like turning a completion API into a chat API.
Given the generous free tier I feel fairly motivated to swap in Gemini and try to ship experiments that I've sat on until now.
Why on earth did they design the Node.js and Web APIs to be slightly different and incompatible? (edit: this might just be a bug/oversight on the landing page?)
Node.js:
const model = genAI.getGenerativeModel({ model: "gemini-pro-vision"});
const result = await model.generateContent({ contents: [{ parts: [
  { text: "What’s in this photo?" },
  { inlineData: { data: imgBase64, mimeType: 'image/png' } }
] }] });
Web:
const model = genAI.getGenerativeModel({ model: "gemini-pro-vision"});
const result = await model.generateContent([
"What’s in this photo?",
{inlineData: {data: imgDataInBase64, mimeType: 'image/png'}}
]);
Hey! This is Miguel from Google, working on these SDKs. I'm confused by this comment; both the Web and Node.js SDKs are the same. Can you clarify where you see the difference?
Wow. What a crap site.
I clicked on the prompt option, thinking I could go back and request an API key. Boy, was I wrong. No matter what I do, it takes me to the prompt console, where I get Access Denied, and it hijacks my back button.
When I try to create an API key, it says "We are sorry, but you do not have access to Early Access Apps", yet my domain does allow access to early access apps...
"Gemini" is a very common name (being the name of a constellation) which has been used by countless products, companies and endeavors over the years. Almost no one outside of Hacker News and a small core of misanthropic anarchists knows about, much less cares about, the Gemini protocol.
In the case of this specific Gemini, it's apparently the result of there being two teams involved, and it's a reference to the Gemini space program[0].
Since the Maker Suite doesn't have the same prompting as something like Bard or ChatGPT, I gave Gemini Pro a bit more context:
> Consider the following user chat. Reason out loud as needed to provide a reasonable response. User message: "Julia has three brothers, each of them has two sisters. How many sisters does Julia have?"
Gemini Pro:
> Julia's brothers have nothing to do with how many sisters she has. Julia's sisters are also her brothers' sisters, so the number of sisters Julia has is the same as the number of sisters her brothers have. Julia's brothers have two sisters each, so if we add up the number of sisters for each brother, we get: 2 + 2 + 2 = 6. Therefore, Julia has 6 sisters. The answer is 6.
Asking these to GPT3.5 has been an utterly frustrating experience, lol. I guess gemini is at this level of intelligence right now, not GPT4... rigged demos notwithstanding;)
To the second question, ChatGPT answered: "The orange is in the kitchen, as that is where the plate, along with the orange, was moved."
Now, a human can reason because a human can visualize in their mind that a plate is being moved without moving the orange. How is that kind of reasoning possible in LLMs, which, in my very basic understanding, just predict the next token?
GPT4 gets the second question wrong for me, but in a weird way.
> Julia has three brothers, each of them has two sisters. How many sisters does Julia have?
Julia has two sisters. Each brother having two sisters refers to the same two sisters, Julia and another sister.
Then if I ask to verify its answer, it realizes the issue.
> Apologies for any confusion. If Julia has three brothers and each of them has two sisters, then Julia is one of the sisters. The other sister is common among all three brothers. Therefore, Julia has only one sister.
Mistral 7b gets the first two wrong, but gets the third one right: "The orange is still in the living room. When you moved the plate, you didn't move the orange. It remains in its original location until you pick it up and move it to a different place"
Great questions. GPT4 got the second one wrong in my test, but corrected itself when I told it to review for errors. It got the other two right on the first try.
Get API key, takes me to makersuite, where I get a create API key button that errors. Then when I reload the page, I get a straight forbidden page.
HP said it best, you have to isolate the team from the bigger company to allow them to work as an effective startup. How can solo-preneurs provide better UX & onboarding while doing 16 other jobs than Google can with multi-billion dollar budgets?
You reminded me of how much hatred I had for Google binding all their products' language (including Workspaces) to my account language, with no way to change it (even after updating the account settings).
How can they be so unaware that people often prefer English, since it's the language with the most tutorials, guides, and resources, and it makes interoperability simpler in cross-country remote companies?
Don't they want to sell cloud products to global companies?
How am I supposed to help or receive help from my coworkers?
I have lost days and days trying to set Google Sheets to English, and I have been stuck with the Italian version no matter how many changes I made to my Sheets or Google account settings.
There's a 5000+ comments/upvotes discussion on their forums and they simply don't give two damns, I don't think humans even see those threads.
I haven't felt this stressed using software since programming in Liferay professionally, or trying to figure out Autodesk products a decade ago for hobby 3D modelling.
It’s not about budgets, it’s about incentives and the Dunbar limit. A solo-preneur, or a small organisation has all the incentives to have great onboarding experiences. And they’re small enough for everyone to know each other and have a coherent vision.
At a huge organisation like Google there’s a plethora of people all working according to their incentives. These organisations pay lip service to customer experience and excellence, but when push comes to shove it’s the flashy projects that get rewarded.
So nobody has a great overview of everything going on to the necessary level of detail, and nobody actually cares about onboarding UX.
Microsoft has the same problem, I want to shoot myself every time I need to get something done for a kid’s account on Xbox.
> How can solo-preneurs provide better UX & onboarding while doing 16 other jobs than Google can
Solo-preneurs don't have to spend time coordinating with the 39 other people assigned to the project plus the 20 execs hovering over their shoulder, all with different ideas and agendas about what the product should be, while navigating the politics to get appropriate resources assigned without upsetting the hundreds of other folks with competing interests.
This reminds me of the year I had to use GCloud/Ads/Analytics and their respective interfaces, which seemingly had no connections and changed randomly from hour to hour.
Simple features are so bizarrely complex and hidden that turning an instance off, checking visitors for a page, or other trivial matters seriously required hiring a full-time "google interface understander", I kid you not.
When I enter the AI Studio, Firefox blocks an insane number of popups. The counter for blocked popups quickly reaches >100, where it stops counting. What is it trying to do?
60 QPM free is great, but the pay-as-you-go pricing works out the same as GPT-3.5 Turbo's.
Courtesy of GPT4:
"To determine which option is cheaper, Gemini Pro or GPT-3.5 Turbo, we need to consider the average length difference between tokens and characters and the pricing structure for each.
Gemini Pro Pricing:
Input: $0.00025 per 1,000 characters
Output: $0.0005 per 1,000 characters
GPT-3.5 Turbo Pricing:
Input: $0.0010 per 1,000 tokens
Output: $0.0020 per 1,000 tokens
Average Length Difference Between Tokens and Characters: A token in GPT-3.5 can be a single word, part of a word, or a special character. On average, a token in GPT-3 models is roughly equivalent to 4 characters (this is a rough estimate as the length of tokens can vary significantly).
Given this information, let's calculate the effective cost per 1,000 characters for both Gemini Pro and GPT-3.5 Turbo. For GPT-3.5 Turbo, since 1 token ≈ 4 characters, the cost per 1,000 characters would be a quarter of the cost per 1,000 tokens. We'll calculate the cost for both input and output for each and compare.
The total cost per 1,000 characters for both Gemini Pro and GPT-3.5 Turbo, considering both input and output, is the same at $0.00075. Therefore, based on the provided pricing and the average token-to-character ratio, they are equally cost-effective."
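The arithmetic in that answer checks out, assuming the rough 1 token ≈ 4 characters ratio:

```javascript
// Published per-1,000-unit prices from the quote above, normalized to characters.
const geminiPer1kChars = 0.00025 + 0.0005; // input + output
const gpt35Per1kTokens = 0.0010 + 0.0020;  // input + output
const gpt35Per1kChars = gpt35Per1kTokens / 4; // assuming 1 token ≈ 4 chars

// Both come out to $0.00075 per 1,000 characters.
console.log(geminiPer1kChars.toFixed(5), gpt35Per1kChars.toFixed(5));
```

Of course, the whole comparison hinges on the 4-characters-per-token estimate, which varies a lot by language and content.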
gregsadetsky|2 years ago
I just made a head-to-head comparison -- you can watch it here: https://www.youtube.com/watch?v=1RrkRA7wuoE
Code is here: https://github.com/gregsadetsky/sagittarius
[1] https://makersuite.google.com/app/prompts/blog-post-creator
georgehill|2 years ago
Wait, only in the US?
Edit: I can access it through the Google Cloud Console.
https://imgur.com/a/NXAgvFb
lovasoa|2 years ago
I will happily let google buy me for that price.
https://ai.google.dev/pricing
zlg_codes|2 years ago
I'm sure it's just an accident.
[0]https://twitter.com/JeffDean/status/1733580264859926941
fotcorn|2 years ago
Some basic prompts, which are answered correctly most of the time by ChatGPT4:
There are 31 books in my house. I read 2 books over the weekend. How many books are still in my house?
> 29 books
Julia has three brothers, each of them has two sisters. How many sisters does Julia have?
> Three
If you place an orange below a plate in the living room, and then move the plate to the kitchen, where is the orange now?
> Under the plate in the kitchen.
So, not great.
kossTKR|2 years ago
Kafka on steroids.
cccybernetic|2 years ago
Click on link > Get API Key > "We are sorry, but you do not have access to Early Access Apps"
Since I'm the admin, I checked and confirmed that I do have access. It's enabled for both my account and the entire organization.
Whatever.
code51|2 years ago
I wanted to note that API key generation works when you pick an existing Google Cloud project.
verdverm|2 years ago
- https://cloud.google.com/vertex-ai (marketing page)
- https://cloud.google.com/vertex-ai/docs (docs entry point)
- https://console.cloud.google.com/vertex-ai (cloud console)
- https://console.cloud.google.com/vertex-ai/model-garden (all the models)
- https://console.cloud.google.com/vertex-ai/generative (studio / playground)
VertexAI is the umbrella for all of the Google models available through their cloud platform.
You want the last link if you are looking for a ChatGPT-like experience, with the ability to also adjust the parameters; it's more like a UI on top of the API.
martythemaniak|2 years ago
- 60 queries per minute free
- about 1/5th the price of GPT3.5 Turbo
- priced per char, not per token
- same image pricing as GPT4 (150x150)