(no title)
20k|1 month ago
Without exception, every technical question I've ever asked an LLM that I know the answer to has been substantially wrong in some fashion. This makes it just... absolutely useless for research. In some cases I've spotted it straight-up plagiarising from the original sources, with random capitalisation giving it away
The issue is that once you get even slightly into a niche, they fall apart because the training data just doesn't exist. But they don't say "sorry, there's insufficient training data to give you an answer"; they just make shit up and state it with total confidence
simonw|1 month ago
I've been tracking advances in AI assisted search here - https://simonwillison.net/tags/ai-assisted-search/ - in particular:
- https://simonwillison.net/2025/Apr/21/ai-assisted-search/ - April is when they started getting good, with o3 and the various deep research tools
- https://simonwillison.net/2025/Sep/6/research-goblin/ - GPT-5 got excellent. This post includes several detailed examples, including "Starbucks in the UK don’t sell cake pops! Do a deep investigative dive".
- https://simonwillison.net/2025/Sep/7/ai-mode/ - AI mode from Google
locknitpicker|1 month ago
I disagree. You might have seen some improvements in the results, but all LLMs still hallucinate quite hard on simple queries where you prompt them to cite their sources. You'll see ChatGPT insist that the source of its assertions is a 404 link that it claims is working.
20k|1 month ago
I asked ChatGPT's thinking mode if the ADM formalism is strictly equivalent to general relativity, and it made several strongly incorrect statements
This is my favourite:
>3. Boundary terms matter
>To be fully equivalent:
>One must add the correct Gibbons–Hawking–York boundary term
>And handle asymptotic conditions carefully (e.g. ADM energy)
>Otherwise, the variational principle is not well-defined.
Which is borderline gibberish
>The theory still has 2 propagating DOF per spacetime point
This is pretty good too
>(lapse and shift act as Lagrange multipliers, not dynamical fields).
This is also, as far as I'm aware, just wrong, as the gauge conditions are nonphysical. In practice, lapse and shift are generally treated as dynamical fields
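(For readers who haven't met the terminology: the standard ADM 3+1 split of the line element is the following. This is a textbook sketch added for context, not part of ChatGPT's output.)

```latex
% ADM 3+1 decomposition of the spacetime metric:
% N is the lapse, N^i the shift, \gamma_{ij} the spatial 3-metric.
ds^2 = -N^2\,dt^2 + \gamma_{ij}\,\bigl(dx^i + N^i\,dt\bigr)\bigl(dx^j + N^j\,dt\bigr)
```

The action contains no time derivatives of N or N^i, which is the sense in which they enter the canonical analysis as multipliers; in an actual numerical evolution, though, they are typically evolved alongside the metric via gauge conditions, which is the sense in which they are treated as dynamical fields.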
Its full answer reads like someone with minimal understanding of physics trying to bullshit you. Then I asked it if the BSSN formalism is strictly equivalent to the ADM formalism (it isn't, because it isn't covariant)
This answer is actually more wrong, surprisingly
>Yes — classically, the BSSN formalism is equivalent to ADM, but only under specific conditions. In practice, it is a reparameterization plus gauge fixing and constraint handling, not a new theory. The equivalence is more delicate than ADM ↔ GR.
The ONE thing that doesn't change in the BSSN formalism is the gauge conditions
>Rewriting the evolution equations, adding terms proportional to constraints.
This is also pretty inadequate
>Precise equivalence statement
>BSSN is strictly equivalent to ADM at the classical level if:
...
>Gauge choices are compatible
>(e.g. lapse and shift not over-constraining the system)
This is complete gibberish
It also states:
>No extra degrees of freedom are introduced
I don't think chatgpt knows what a degree of freedom is
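(For the record, the standard counting that gives GR its two degrees of freedom per spatial point goes like this; again a context note, not ChatGPT's output.)

```latex
% Canonical counting: 6 components of \gamma_{ij} plus 6 conjugate
% momenta \pi^{ij} give a 12-dimensional phase space per point.
% Each of the 4 first-class constraints (H, H_i) removes 2 dimensions:
\tfrac{1}{2}\bigl(12 - 2 \times 4\bigr) = 2
```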
>Why the equivalence is more subtle than ADM ↔ GR
>1. BSSN is not a canonical transformation
>Unlike ADM ↔ GR:
>BSSN is not manifestly Hamiltonian
>The Poisson structure is not preserved automatically
>One must reconstruct ADM variables to see equivalence
This is all absolute bollocks. "Manifestly Hamiltonian" is literally gibberish. Neither of these formalisms has a "Poisson structure", whatever that means, and sure, yes, you can reconstruct the ADM variables from the BSSN variables, whoopee
>When equivalence can fail
>Discretized (numerical) system -> Equivalence only approximate
Nobody explain to ChatGPT that the ADM formalism is also a discretisable system of PDEs!
>BSSN and ADM describe the same classical solutions of Einstein’s equations, but BSSN reshapes the phase space and constraint handling to make the evolution well-behaved, sacrificing manifest Hamiltonian structure off-shell.
We're starting to hit timecube levels of nonsense
It also gets the original question completely wrong: the BSSN formalism isn't covariant or coordinate-free. There's an alternative BSSN-like formalism called cBSSN (covariant BSSN), which is similar to CCZ4 and Z4cc (both covariant). Covariance is an important property that the regular BSSN formalism lacks, and it's one of the ways you can identify on mathematical grounds that BSSN is not strictly equivalent to the ADM formalism. In the ADM formalism you can express your equations in polar coordinates, but if you make that transformation in the BSSN formalism, it's no longer the same theory
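(One standard way to see the non-covariance, sketched here as context: BSSN conformally decomposes the ADM 3-metric with a unit-determinant condition.)

```latex
% BSSN conformal split of the ADM 3-metric:
\gamma_{ij} = e^{4\phi}\,\tilde\gamma_{ij}, \qquad \det\tilde\gamma_{ij} = 1
% The unit-determinant condition is not tensorial: the determinant
% depends on the coordinate chart, so transforming to e.g. polar
% coordinates changes the split between \phi and \tilde\gamma_{ij}.
```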
This has actually gotten significantly worse since the last time I asked ChatGPT about this kind of thing; it's more confidently incorrect now
pxc|1 month ago
The other problem that I tend to hit is a tradeoff between wrongness and slowness. The fastest variants of the SOTA models are so frequently and so severely wrong that I don't find them useful for search. But the bigger, slower ones that spend more time "thinking" take so long to yield their (admittedly better) results that it's often faster for me to just do some web searching myself.
They tend to be more useful the first time I'm approaching a subject, or before I've familiarized myself with the documentation of some API or language or whatever. After I've taken some time to orient myself (even by just following the links they've given me a few times), it becomes faster for me to just search by myself.
sandworm101|1 month ago
I googled for "helium 3" yesterday. Google's AI answer said that helium 3 is "primarily sourced from the moon", as if we were actively mining it there already.
elzbardico|1 month ago
Instead of "how cheese X is usually made", try "search the web and give me a summary of the ways cheese X is made"
yunohn|1 month ago
The entire situation of web search for LLMs is a mess. None of the existing providers return good or usable results, and Google refuses to provide general access to theirs. As a result, all LLMs (except maybe Gemini) are severely gimped until someone solves this.
I seriously believe that the only real new breakthrough for LLM research can be achieved by a clean, trustworthy, comprehensive search index. Maybe someone will build that? Otherwise we’re stuck with subpar results indefinitely.
josecodea|1 month ago
It's funny for me to read this. They don't exhibit "confidence". You're just getting the most accurate text the model can produce. Of course the training data doesn't contain "I don't know" as answers to questions; that would be really bad training data! If you're getting "attitude", it's because you're triggering some kind of dialogue-esque data with your prompts (or the system prompt is doing that).
Expecting the LLM to say "sorry I don't know" would be like expecting google search to return "we found some pages but deemed them wrong, so we won't show you any".
samuell|1 month ago
I have been impressed by its results.
I think this stems more from the initial search phase than from raw LLM processing power, but to me the approach seems to work really well.