karmasimida | 10 days ago
OpenAI has mostly caught up with Claude in agentic stuff, but Google needs to be there and be there quickly
onlyrealcuzzo|10 days ago
Most of Gemini's users are Search converts doing extended-Search-like behaviors.
Agentic workflows are a VERY small percentage of all LLM usage at the moment. As that market becomes more important, Google will pour more resources into it.
Macha|10 days ago
I do wonder what percentage of revenue they are. I expect it's very outsized relative to usage (e.g. approximately nobody who is receiving them is paying for those summaries at the top of search results)
nimchimpsky|10 days ago
[deleted]
alphabetting|10 days ago
For example, the APEX-Agents benchmark for long-time-horizon investment banking, consulting, and legal work:

1. Gemini 3.1 Pro - 33.2%
2. Opus 4.6 - 29.8%
3. GPT 5.2 Codex - 27.6%
4. Gemini Flash 3.0 - 24.0%
5. GPT 5.2 - 23.0%
6. Gemini 3.0 Pro - 18.0%
girvo|10 days ago
I'll withhold judgement until I've tried to use it.
HardCodedBias|10 days ago
Let's give it a couple of days, since no one believes anything from benchmarks, especially from the Gemini team (or Meta).
If we see on HN that people are willingly switching their coding environment, we'll know "hot damn, they cooked"; otherwise this is another whiff by Google.
karmasimida|10 days ago
I think this is a classic precision/recall issue: the model needs to stay on task, but also infer what the user might want but didn't explicitly state. Gemini seems particularly bad at recall, where it goes out of bounds.
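The precision/recall framing above can be sketched with a toy example. Nothing here comes from the thread: the action names and the `precision_recall` helper are hypothetical, purely to illustrate the trade-off between staying on task (precision) and covering everything the user wanted (recall).

```python
# Toy illustration of precision/recall applied to an agent's actions.
# "relevant" = actions the user actually wanted; "taken" = actions the model performed.

def precision_recall(taken: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision: fraction of taken actions that were wanted.
    Recall: fraction of wanted actions that were actually taken."""
    true_positives = len(taken & relevant)
    precision = true_positives / len(taken) if taken else 1.0
    recall = true_positives / len(relevant) if relevant else 1.0
    return precision, recall

# A model that "goes out of bounds": it does everything asked (perfect recall)
# but also performs extra, unrequested actions (poor precision).
wanted = {"edit_file", "run_tests"}
did = {"edit_file", "run_tests", "rewrite_readme", "delete_branch"}
p, r = precision_recall(did, wanted)
print(p, r)  # 0.5 1.0
```

Under this framing, a model that invents extra work scores high on recall but low on precision, while an overly literal model does the reverse.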