top | item 45376865

omarspira | 5 months ago

I would be surprised if this dichotomy you're painting holds up to scrutiny.

My understanding is that Gemini is not far behind on "intelligence", and certainly not in a way that leaves obvious doubt about where they will be over the next iteration/model cycles, where I would expect them to at least continue closing the gap. I'd be curious if you have some benchmarks to share that suggest otherwise.

Meanwhile, afaik, one thing Google has done that other providers aren't doing as much (and which perhaps relates back to your point re "latency/TPS/cost dimensions") is integrating their model into interesting products beyond chat, at a pace that seems surprising given how much criticism they had been taking for being "slow" to react to the LLM trend.

Besides the Google Workspace surface and Google search, which now seem obvious - there are other interesting places where Gemini will surface - https://jules.google/ for one, to say nothing of their experiments/betas in the creative space - https://labs.google/flow/about

Another I noticed today: https://www.google.com/finance/beta

I would have thought putting Gemini on a finance dashboard like this would be inviting all sorts of regulatory (and other) scrutiny... and wouldn't be in keeping with a "slow" incumbent. But given the current climate, it seems Google is plowing ahead just as much as anyone else - with a lot more resources and surface to bring to bear. Imagine Gemini integration on YouTube. At this point it just seems like counting down the days...

CuriouslyC|5 months ago

I do a lot of scientific and hard coding. Gemini is a good bit below GPT-5 in those areas, though still quite good. It's also just a bad agent: it lacks autonomy and isn't RL'd to explore well. Gemini's superpower is being really smart while also having by far the best long-context reasoning. Use it like an oracle, feeding it bundles of your entire codebase (or a subtree if it's too big) to guide agents in implementation.

cerved|5 months ago

Yesterday I asked Gemini to recalculate the timestamps of a sequence of tasks, given each task's duration and the previous timestamp. It proceeded to write code which gave results like this:

  2025-09-26T14:32:10Z
  2025-09-26T14:32:10Z200s
  2025-09-26T14:32:10Z200s600s
  2025-09-26T14:32:10Z200s600s300s
It then proceeded to talk about how efficient this approach was for thousands of numbers.
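(For reference, the intended calculation is date arithmetic rather than string concatenation. A minimal sketch, assuming ISO-8601 start time and durations in seconds; the function name and signature here are made up for illustration:)

```python
from datetime import datetime, timedelta

def recalc_timestamps(start: str, durations: list[int]) -> list[str]:
    """Return the start timestamp of each task in a sequence:
    each task begins when the previous one's duration has elapsed."""
    # Parse the ISO-8601 timestamp (handle the trailing 'Z' for UTC)
    t = datetime.fromisoformat(start.replace("Z", "+00:00"))
    out = [t]
    for d in durations:
        # Add the duration arithmetically, not by string concatenation
        t = t + timedelta(seconds=d)
        out.append(t)
    return [ts.strftime("%Y-%m-%dT%H:%M:%SZ") for ts in out]

print(recalc_timestamps("2025-09-26T14:32:10Z", [200, 600, 300]))
```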

Gemini is by far the dumbest LLM I've used

lelanthran|5 months ago

They're all a little dumb. I asked Claude for a Python function (or functions) that takes markdown in a string and returns a string with ANSI codes for bold, italics, and underline.

It gave me a 160 line parse function.

After gaping for a short while, I implemented it in a 5 line function and a lookup table.
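(The lookup-table approach described might look something like this sketch; the delimiter-to-code mapping and function name are assumptions, and it ignores nesting and escaping:)

```python
import re

# Lookup table: markdown delimiter -> ANSI SGR escape code
# (order matters: match "**" before "*")
ANSI = {"**": "\033[1m", "__": "\033[4m", "*": "\033[3m"}
RESET = "\033[0m"

def md_to_ansi(text: str) -> str:
    """Replace **bold**, __underline__, and *italic* spans with ANSI codes."""
    for delim, code in ANSI.items():
        pattern = re.escape(delim) + r"(.+?)" + re.escape(delim)
        text = re.sub(pattern, code + r"\1" + RESET, text)
    return text
```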

These vibe coders who are proud that they generated thousands of lines of code make me wonder if they ever read what they generate with a critical eye.

ainch|5 months ago

Gemini 2.5-Pro was great when it released, but o3 and GPT-5 both eclipsed it for me—the tool use/search improvements open up so many use cases that Gemini fails at.

perfmode|5 months ago

How’d I never hear of Jules? Cool.

Al-Khwarizmi|5 months ago

And yet my smart speakers with the Google Assistant still default to a dumb model from the pre-LLM era (although my phone's version of the assistant does call Gemini). I wonder why that is, as it would be an obvious place to integrate Gemini. The bar is very, very low: anything outside the standard tasks (setting alarms, checking the weather, etc.) it gets wrong most of the time.