top | item 40811067

(no title)

alekandreev | 1 year ago

Hello (again) from the Gemma team! We are quite excited to push this release out and happy to answer any questions!

Opinions are our own and not of Google DeepMind.

discuss

order

luke-stanley|1 year ago

It's fairly easy to pay OpenAI or Mistral money to use their API's. Figuring out how Google Cloud Vertex works and how it's billed is more complicated. Azure and AWS are similar in how complex they are to use for this. Could Google Cloud please provide an OpenAI compatible API and service? I know it's a different department. But it'd make using your models way easier. It often feels like Google Cloud has no UX or end-user testing done on it at all (not true for aistudio.google.com - that is better than before, for sure!).

Deathmax|1 year ago

Gemini models on Vertex AI can be called via a preview OpenAI-compatible endpoint [1], but shoving it into existing tooling where you don't have programmatic control over the API key and is long lived is non-trivial because GCP uses short lived access tokens (and long-lived ones are not great security-wise).

Billing for the Gemini models (on Vertex AI, the Generative Language AI variant still charges by tokens) I would argue is simpler than every other provider, simply because you're charged by characters/image/video-second/audio-second and don't need to run a tokenizer (if it's even available cough Claude 3 and Gemini) and having to figure out what the chat template is to calculate the token cost per message [2] or figure out how to calculate tokens for an image [3] to get cost estimates before actually submitting the request and getting usage info back.

[1]: https://cloud.google.com/vertex-ai/generative-ai/docs/multim...

[2]: https://platform.openai.com/docs/guides/text-generation/mana...

[3]: https://platform.openai.com/docs/guides/vision/calculating-c...

ankeshanand|1 year ago

If you're an individual developer and not an enterprise, just go straight to Google AIStudio or GeminiAPI instead: https://aistudio.google.com/app/apikey. It's dead simple getting an API key and calling with a rest client.

bapcon|1 year ago

I have to agree with all of this. I tried switching to Gemini, but the lack of clear billing/quotas, horrible documentation, and even poor implementation of status codes on failed requests have led me to stick with OpenAI.

I don't know who writes Google's documentation or does the copyediting for their console, but it is hard to adapt. I have spent hours troubleshooting, only to find out it's because the documentation is referring to the same thing by two different names. It's 2024 also, I shouldn't be seeing print statements without parentheses.

hnuser123456|1 year ago

I plan on downloading a Q5 or Q6 version of the 27b for my 3090 once someone puts quants on HF, loading it in LM studio and starting the API server to call it from my scripts based on openai api. Hopefully it's better at code gen than llama 3 8b.

alekandreev|1 year ago

Happy to pass on any feedback to our Google Cloud friends. :)

canyon289|1 year ago

I also work at Google and on Gemma (so same disclaimers)

You can try 27b at www.aistudio,google.com. Send in your favorite prompts, and we hope you like the responses.

dandanua|1 year ago

Why is AIStudio not available in Ukraine? I have no problem with using Gemini web UI or other LLM providers from Ukraine, but this Google API constrain is strange.

moffkalast|1 year ago

The 4k sliding window context seems like a controversial choice after Mistral 7B mostly failed at showing any benefits from it. What was the rationale behind that instead of just going for full 8k or 16k?

alekandreev|1 year ago

This is mostly about inference speed, while maintaining long context performance.

causal|1 year ago

Thanks for your work on this; excited to try it out!

The Google API models support 1M+ tokens, but these are just 8K. Is there a fundamental architecture difference, training set, something else?

coreypreston|1 year ago

No question. Thanks for thinking of 27B.

luke-stanley|1 year ago

Given the goal of mitigating self-proliferation risks, have you observed a decrease in the model's ability to do things like help a user setup a local LLM with local or cloud software?

How much is pre-training dataset changes, how much is tuning?

How do you think about this problem, how do you solve it?

Seems tricky to me.

alekandreev|1 year ago

To quote Ludovic Peran, our amazing safety lead:

Literature has identified self-proliferation as dangerous capability of models, and details about how to define it and example of form it can take have been openly discussed by GDM (https://arxiv.org/pdf/2403.13793).

Current Gemma 2 models' success rate to end-to-end challenges is null (0 out 10), so the capabilities to perform such tasks are currently limited.

WhitneyLand|1 year ago

The paper suggests on one hand Gemma is on the same Pareto curve as Llama3, while on the other hand seems to suggest it’s exceeded its efficiency.

Is this a contradiction or am I misunderstanding something?

Btw overall very impressive work great job.

alekandreev|1 year ago

I think it makes sense to compare models trained with the same recipe on token count - usually more tokens will give you a better model.

However, I wouldn't draw conclusions about different model families, like Llama and Gemma, based on their token count alone. There are many other variables at play - the quality of those tokens, number of epochs, model architecture, hyperparameters, distillation, etc. that will have an influence on training efficiency.

luke-stanley|1 year ago

Any gemma-2-9b or 27b 4 bit GGUF's on HuggingFace yet? Thanks!

chown|1 year ago

If you are still looking for it, I just made it available on an app[1] that I am working on with Gemma2 support.

https://msty.app

kristianpaul|1 year ago

Do run gemma2 on your Google phone?