top | item 47006776

outlace | 16 days ago

The headline may make it seem like AI just discovered some new result in physics all on its own, but reading the post, humans started off trying to solve some problem, it got complex, and GPT simplified it and found a solution with the simpler representation. It took 12 hours for GPT Pro to do this. In my experience, LLMs can make new things when they are some linear combination of existing things, but I haven't been able to get them to do something totally out of distribution from first principles yet.

CGMthrowaway|16 days ago

This is the critical bit (paraphrasing):

Humans have worked out the amplitudes for integer n up to n = 6 by hand, obtaining very complicated expressions, which correspond to a “Feynman diagram expansion” whose complexity grows superexponentially in n. But no one has been able to greatly reduce the complexity of these expressions, providing much simpler forms. And from these base cases, no one was then able to spot a pattern and posit a formula valid for all n. GPT did that.

Basically, they used GPT to refactor a formula and then generalize it for all n. Then verified it themselves.

I think this was all already figured out in 1986 though: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.56... see also https://en.wikipedia.org/wiki/MHV_amplitudes

godelski|16 days ago

  > I think this was all already figured out in 1986 though
They cite that paper in the third paragraph...

  Naively, the n-gluon scattering amplitude involves order n! terms. Famously, for the special case of MHV (maximally helicity violating) tree amplitudes, Parke and Taylor [11] gave a simple and beautiful, closed-form, single-term expression for all n.
It also seems to be a main talking point.

I think this is a prime example of how easy it is to think something is solved when looking at it from a high level, and to draw an erroneous conclusion due to lack of domain expertise. Classic "Reviewer 2" move. Though I'm not a domain expert myself, so if there really is no novelty over Parke and Taylor, I'm pretty sure this will get thrashed in review.

btown|16 days ago

It bears repeating that modern LLMs are incredibly capable, and relentless, at solving problems that have a verification test suite. It seems like this problem did (at least for some finite subset of n)!

This result, by itself, does not generalize to open-ended problems, though, whether in business or in research in general. Discovering the specification to build is often the majority of the battle. LLMs aren't bad at this, per se, but they're nowhere near as reliably groundbreaking as they are on verifiable problems.

lupsasca|16 days ago

That paper from the 80s (which is cited in the new one) is about "MHV amplitudes" with two negative-helicity gluons, so "double-minus amplitudes". The main significance of this new paper is to point out that "single-minus amplitudes" which had previously been thought to vanish are actually nontrivial. Moreover, GPT-5.2 Pro computed a simple formula for the single-minus amplitudes that is the analogue of the Parke-Taylor formula for the double-minus "MHV" amplitudes.

woeirua|16 days ago

You should probably email the authors if you think that's true. I highly doubt they didn't do a literature search first though...

helterskelter|16 days ago

> But no one has been able to greatly reduce the complexity of these expressions, providing much simpler forms.

Slightly OT, but wasn't this supposed to be largely solved with amplituhedrons?

ericmay|16 days ago

Still pretty awesome though, if you ask me.

nine_k|16 days ago

Sounds somewhat similar to the groundbreaking application of a computer to prove the four color theorem. There, the researchers wrote a program to find and formally prove the numerous particular cases; here, the computer finds a simplifying pattern.

torginus|16 days ago

I'm not sure if GPT's ability here goes beyond a formal math package's, or if it's just way more convenient to ask ChatGPT than to use that software.

randomtoast|16 days ago

> but I haven't been able to get them to do something totally out of distribution yet from first principles

Can humans actually do that? Sometimes it appears as if we have made a completely new discovery. However, if you look more closely, you will find that many events and developments led up to this breakthrough, and that it is actually an improvement on something that already existed. We are always building on the shoulders of giants.

davorak|16 days ago

> Can humans actually do that?

From my reading yes, but I think I am likely reading the statement differently than you are.

> from first principles

Doing things from first principles is a known strategy, as are guess-and-check, brute-force search, and so on.

For an LLM to follow a first-principles strategy, I would expect it to take in a body of research, come up with some first principles or guesses at them, then iteratively construct a tower of reasoning/findings/experiments.

Constructing a solid tower is where things are currently improving for existing models, in my mind, but when I try the OpenAI or Anthropic chat interfaces, neither does a good job for long, at least not independently.

Humans also often have a hard time with this. In general it is not a skill that everyone has, and I think you can be a successful scientist without ever heavily developing first-principles problem solving.

samrus|16 days ago

Yes. That's how all advancement in human knowledge happened: small and incremental forays out of our training distribution.

These have been identified as various things. Eureka moments, strokes of genius, out of the box thinking, lateral thinking.

LLMs have not been shown to be capable of this. They might be in the future, but they haven't been yet.

dotancohen|16 days ago

Relativity comes to mind.

You could nitpick a rebuttal, but no matter how many people you give credit to, general relativity was a completely novel idea when it was proposed. I'd argue for special relativity as well.

CooCooCaCha|16 days ago

Depends on what you think is valid.

The process you’re describing is humans extending our collective distribution through a series of smaller steps. That’s what the “shoulders of giants” means. The result is we are able to do things further and further outside the initial distribution.

So it depends on if you’re comparing individual steps or just the starting/ending distributions.

tjr|16 days ago

Go enough shoulders down, and someone had to have been the first giant.

utopiah|16 days ago

Arguably it's precisely a paradigm shift. Continuing whatever worked until now is within the paradigm: our current theories and tools work, we find a few problems that don't fit, but that's fine, the rest is still progress. Then we keep hitting more problems, or those few pesky unsolved problems turn out to actually be important. So we go back to the theory and its foundations and finally challenge them. We break from the old paradigm and come up with new theories and tools because the first principles are now better understood, and we iterate.

So those are actually two different regimes for how to proceed. Both are useful, but arguably breaking out of the current paradigm is much harder and thus rarer.

D-Machine|16 days ago

The tricky part is that LLMs aren't just spewing outputs from the distribution (or "near" learned manifolds), but also extrapolating / interpolating (depending on how much you care about the semantics of these terms https://arxiv.org/abs/2110.09485).

There are genuine creative insights that come from connecting two known semantic spaces in a way that wasn't obvious before (e.g., a novel isomorphism). It is very conceivable that LLMs could make this kind of connection, but we haven't really seen a dramatic form of this yet. This kind of connection can lead to deep, non-trivial insights, but whether or not it is "out-of-distribution" is harder to answer in this case.

tshaddox|16 days ago

I mean, there’s just no way you can take the set of publicly known ideas from all human civilizations, say, 5,000 years ago, and say that all the ideas we have now were “in the distribution” then. New ideas actually have to be created.

godelski|16 days ago

  > Can humans actually do that? 
Yes

Seriously, think about it for a second...

If discovery were just recombining existing work, then science should have accelerated a lot faster. Science would have happened differently, and researchers would have optimized for ingesting as many papers as they can.

Dig deep into things and you'll find that there are often leaps of faith that need to be made. Guesses, hunches, and outright conjectures. Remember, there are paradigm shifts that happen. There are plenty of things in physics (including classical) that cannot be determined from observation alone. Or more accurately, cannot be differentiated from alternative hypotheses through observation alone.

I think the problem is that when teaching science we generally teach it very linearly, as if things easily follow. But in reality there are generally constant iterative improvements that look more like a plateau, and then there are these leaps. They happen for a variety of reasons, but no paradigm shift would be contentious if it were obvious and clearly in distribution. It would always be met with the same response that typical iterative improvements are met with: "well that's obvious, is this even novel enough to be published? Everybody already knew this" (hell, look at the response to the top comment and my reply... that's classic "Reviewer #2" behavior). If everything were always in distribution, progress would be nearly frictionless.

Again, given the history of how we teach science, we make an error in teaching things like Galileo, as if The Church was the only opposition. There were many scientists who objected, and on reasonable grounds. It is also a mistake we continually make in how we view the world: if you stick with "it works," you'll end up with a geocentric model rather than a heliocentric one. It is true that the geocentric model had limits, but so did the original heliocentric model, and that's the reason it took time to be adopted.

By viewing things at too high a level we often fool ourselves. While I'm criticizing how we teach, I'll also admit it is a tough thing to balance: it is difficult to get nuanced, and in teaching we must be time-effective and cover a lot of material. But I think it is important to teach the history of science so that people better understand how it actually evolves and how discoveries were actually made. Without that, it is hard to learn how to do those things yourself, and this is a frequent problem faced by many who enter PhD programs (and beyond).

  > We are always building on the shoulders of giants.
And it still is. You can still lean on others while presenting things that are highly novel. These are not in disagreement.

It's probably worth reading The Unreasonable Effectiveness of Mathematics in the Natural Sciences. It might seem obvious now but read carefully. If you truly think it is obvious that you can sit in a room armed with only pen and paper and make accurate predictions about the world, you have fooled yourself. You have not questioned why this is true. You have not questioned when this actually became true. You have not questioned how this could be true.

https://www.hep.upenn.edu/~johnda/Papers/wignerUnreasonableE...

  You are greater than the sum of your parts

stouset|16 days ago

When chess engines were first developed, they were strictly worse than the best humans. After many years of development, they became helpful to even the best humans even though they were still beatable (1985–1997). Eventually they caught up and surpassed humans but the combination of human and computer was better than either alone (~1997–2007). Since then, humans have been more or less obsoleted in the game of chess.

Five years ago we were at Stage 1 with LLMs with regard to knowledge work. A few years later we hit Stage 2. We are currently somewhere between Stage 2 and Stage 3 for an extremely high percentage of knowledge work. Stage 4 will come, and I would wager it's sooner rather than later.

MITSardine|15 days ago

There's a major difference between chess and scientific research: setting the objectives is itself part of the work.

In chess, there's a clear goal: beat the game according to this set of unambiguous rules.

In science, the goals are much more diffuse, and setting them in the first place is what makes a scientist more or less successful, not so much technical ability. It's a very hierarchical field where permanent researchers direct staff (postdocs, research scientists/engineers), who in turn direct grad students. And it's at the bottom of the pyramid that technical ability is most relevant/rewarded.

Research is very much a social game, and I think replacing it with something run by LLMs (or other automatic process) is much more than a technical challenge.

bluecalm|16 days ago

The evolution was also interesting: first the engines were amazing tactically but pretty bad strategically so humans could guide them. With new NN based engines they were amazing strategically but they sucked tactically (first versions of Leela Chess Zero). Today they closed the gap and are amazing at both strategy and tactics and there is nothing humans can contribute anymore - all that is left is to just watch and learn.

TGower|16 days ago

With a chess engine, you could ask any practitioner in the 90's what it would take to achieve "Stage 4" and they could estimate it quite accurately as a function of FLOPs and memory bandwidth. It's worth keeping in mind just how little we understand about LLM capability scaling. Ask 10 different AI researchers when we will get to Stage 4 for something like programming and you'll get wild guesses or an honest "we don't know".

guluarte|16 days ago

so we are going back to physical labor then

empath75|16 days ago

We are already at Stage 3 for software development, and arguably Stage 4.

bpodgursky|16 days ago

I don't want to be rude but like, maybe you should pre-register some statement like "LLMs will not be able to do X" in some concrete domain, because I suspect your goalposts are shifting without you noticing.

We're talking about significant contributions to theoretical physics. You can nitpick but honestly go back to your expectations 4 years ago and think — would I be pretty surprised and impressed if an AI could do this? The answer is obviously yes, I don't really care whether you have a selective memory of that time.

RandomLensman|16 days ago

I don't know enough about theoretical physics: what makes it a significant contribution there?

outlace|16 days ago

I never said LLMs will not be able to do X. I gave my summary of the article and my anecdotal experiences with LLMs. I have no LLM ideology. We will see what tomorrow brings.

nozzlegear|16 days ago

> We're talking about significant contributions to theoretical physics.

Whoever wrote the prompts and guided ChatGPT made significant contributions to theoretical physics. ChatGPT is just a tool they used to get there. I'm sure AI-bloviators and pelican bike-enjoyers are all quite impressed, but the humans should be getting the research credit for using their tools correctly. Let's not pretend the calculator doing its job as a calculator at the behest of the researcher is actually a researcher as well.

emil-lp|16 days ago

"GPT did this". Authored by Guevara (Institute for Advanced Study), Lupsasca (Vanderbilt University), Skinner (University of Cambridge), and Strominger (Harvard University).

Probably not something that the average GI Joe would be able to prompt their way to...

I am skeptical until they show the chat log leading up to the conjecture and proof.

Sharlin|16 days ago

I'm a big LLM sceptic but that's… moving the goalposts a little too far. How could an average Joe even understand the conjecture enough to write the initial prompt? Or do you mean that experts would give him the prompt to copy-paste, and hope that the proverbial monkey can come up with a Henry V? At the very least posit someone like a grad student in particle physics as the human user.

jmalicki|16 days ago

"Grad Student did this". Co-authored by <Famous advisor 1>, <Famous advisor 2>, <Famous advisor 3>.

Is this so different?

sejje|15 days ago

The Average Joe reads at an 8th grade level. 21% are illiterate in the US.

LLMs surpassed the average human a long time ago IMO. When LLMs fail to measure up to humans, it's that they fail to measure up against human experts in a given field, not the Average Joe.

We are surrounded by NPCs.

hgfda|16 days ago

[deleted]

famouswaffles|16 days ago

The paper has all those prominent institutions acknowledging the contribution, so realistically, why would you be skeptical?

slibhb|16 days ago

> In my experience LLMs can make new things when they are some linear combination of existing things but I haven't been able to get them to do something totally out of distribution yet from first principles.

What's the distinction between "first principles" and "existing things"?

I'm sympathetic to the idea that LLMs can't produce path-breaking results, but I think that's true only for a strict definition of path-breaking (one that is quite rare for humans too).

hellisad|16 days ago

Hmm, this feels a bit trivializing; we don't know exactly how difficult it was to come up with the general set of equations from the human starting point.

I can claim some knowledge of physics from my degree. Typically the easy part is coming up with complex, dirty equations that work under special conditions; the hard part is the simplification into something elegant, 'natural', and general.

Also, "LLMs can make new things when they are some linear combination of existing things" doesn't really mean much. What is a linear combination of things? You first have to define precisely what a thing is.

epolanski|16 days ago

Serious question: I often hear about this "let the LLM cook for hours," but how do you do that in practice, and how does it manage its own context? How does it not get lost after so many tokens?

lovecg|16 days ago

I'm guessing; I'd love someone with first-hand knowledge to comment. But my guess is it's some combination of trying many different approaches in parallel (each in a fresh context) and then picking the one that works, plus splitting the task into sequential steps, where the output of one step is condensed and used as input to the next (with possibly human steering between steps).

javier123454321|16 days ago

From what I've seen, it's a process of compacting the session once it reaches some limit, which basically means summarizing all the previous work and feeding it as the initial prompt for the next session.
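A minimal sketch of that compaction loop, with a placeholder `llm` function standing in for a real chat-completion API call (the function name, the character budget, and the summarization prompt are all illustrative assumptions, not any vendor's actual interface):

```python
# Sketch of context compaction: when the transcript grows past a budget,
# summarize it and continue from the summary alone. `llm` is a toy
# placeholder; a real harness would call an actual model API here.

def llm(prompt: str) -> str:
    # Toy stand-in: a real call would return the model's response.
    return "summary of: " + prompt[:40]

def run_with_compaction(task: str, steps: int, budget: int = 2000) -> str:
    transcript = task
    for _ in range(steps):
        # Append the model's next contribution to the running transcript.
        transcript += "\n" + llm(transcript)
        if len(transcript) > budget:  # crude character proxy for a token limit
            # Compact: replace the whole transcript with a summary, which
            # becomes the initial prompt for the next "session".
            transcript = llm("Summarize the work so far:\n" + transcript)
    return transcript
```

Real implementations track token counts rather than characters and keep key artifacts (files, checklists) outside the summarized transcript, but the shape is the same.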

8note|16 days ago

The annoying part is that with tool calls, a lot of those hours is time spent on network round trips.

Over long periods of time, checklists are the biggest thing, so the LLM can track what's already done and what's left. After a compact, it can pull the relevant stuff back up and make progress.

Having some level of hierarchy is also useful - requirements, high-level designs, low-level designs, etc.

anon291|16 days ago

Very, very few human individuals are capable of making new things that are not a linear combination of existing things. Even something like special relativity was an application of two previous ideas. All of special relativity is derivable from the principle of relative motion (known since antiquity) and the constant speed of light (which was known to Einstein). From there it is a straightforward application of the Pythagorean theorem to realize there is a contradiction, and the Lorentz factor falls out naturally via basic algebra.
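One standard way to make that concrete is the light-clock argument: in the clock's rest frame, a light pulse crosses a height h in time t_0; in a frame where the clock moves at speed v, the same pulse traverses the hypotenuse of a right triangle, and the Pythagorean theorem gives

```latex
h = c\,t_0, \qquad (c\,t)^2 = (v\,t)^2 + h^2 = (v\,t)^2 + (c\,t_0)^2
\;\;\Longrightarrow\;\;
t = \frac{t_0}{\sqrt{1 - v^2/c^2}} = \gamma\, t_0 .
```

The Lorentz factor really does fall out of one right triangle and basic algebra, once the two postulates are accepted.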

tedd4u|16 days ago

What does a 12-hour solution cost an OpenAI customer?

int_19h|16 days ago

$200/month would cover many such sessions every month.

The real question is, what does it cost OpenAI? I'm pretty sure both their plans are well below cost, at least for users who max them out (and if you pay $200 for something then you'll probably do that!). How long before the money runs out? Can they get it cheap enough to be profitable at this price level, or is this going to be "get them addicted then jack it up" kind of strategy?

sathish316|16 days ago

> I haven't been able to get them to do something totally out of distribution yet from first principles

Agree with this. I’ve been trying to make LLMs come up with creative and unique word games like Wordle and Uncrossy (uncrossy.com), but so far GPT-5.2 has been disappointing. Comparatively, Opus 4.5 has been doing better on this.

But it’s good to know that it’s breaking new ground in Theoretical Physics!

FranklinJabar|16 days ago

Surely higher level math is just linear combinations of the syntax and implications of lower level math. LLMs are taught syntax of basically all existing math notation, I assume. Much of math is, after all, just linguistic manipulation and detection of contradiction in said language with a more formal, a priori language.

MITSardine|15 days ago

LLMs can write theorems, but can they come up with meaningful definitions?

acchow|16 days ago

> In my experience LLMs can make new things when they are some linear combination of existing things

It seems to me that all "new ideas" are basically linear combinations of existing things, with exceedingly rare exceptions…

Maybe Godel’s Incompleteness?

Darwinian evolution?

General Relativity?

Buddhist non-duality?

malshe|16 days ago

My physics professor once claimed that imagination is just mental manipulation of past experiences. I never thought it was true for human beings but for LLMs it makes perfect sense.

zaphirplane|16 days ago

I must be a Luddite. How do you have a model working for 12 hours on a problem? Mine is always ready with an answer, and it interrupts to ask for confirmation or show the answer.

arjie|15 days ago

That's on the harness - the device actually sending the prompt to the model. You can write a different harness that feeds the problem back in for however long you want. Ask Claude Code or Codex to build it for you in as minimal a fashion as possible and you'll see that a naïve version is not particularly more complex than `while true; do prompt $file >> file; done` (though it's not that precisely, obviously).
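A sketch of such a naïve harness, again with a placeholder `llm` call (a real one would hit the OpenAI or Anthropic API and add stopping conditions; everything named here is illustrative):

```python
# Naïve "keep going" harness: repeatedly feed the problem plus everything
# produced so far back into the model, collecting each new attempt.
# `llm` is a toy placeholder for a real model call.

def llm(prompt: str) -> str:
    # Toy stand-in for a real API call.
    return f"attempt based on {len(prompt)} chars of context"

def naive_harness(problem: str, iterations: int) -> list[str]:
    attempts: list[str] = []
    for _ in range(iterations):
        # Re-feed the problem and all prior output, the moral equivalent
        # of `prompt $file >> $file` in a shell loop.
        context = problem + "\n" + "\n".join(attempts)
        attempts.append(llm(context))
    return attempts
```

Each iteration sees the problem plus all previous attempts, which is all it takes to keep a model "working" for as long as you are willing to pay for tokens.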

DeathArrow|16 days ago

>LLMs can make new things when they are some linear combination of existing things

Aren't most new things linear combinations of existing things (up to a point)?

waynesonfire|16 days ago

> It took 12 hours for GPT pro to do this

Thanks for the summary, but this is a huge hand-wave. Was GPT Pro just spinning for 12 hours and then returned 42?!

Sparkyte|15 days ago

AI *cough* LLMs don't discover things; they simply surface information that already existed.

slibhb|15 days ago

You're assuming there aren't "new things" latent inside currently existing information. That's definitely false, particularly for math/physics.

But it's worth thinking more about this. What gives humans the ability to discover "new things"? I would say it's due to our interaction with the universe via our senses, and not due to some special powers intrinsic to our brains that LLMs lack. And the thing is, we can feed novel measurements to LLMs (or, eventually, hook them up to camera feeds to "give them senses")

bottlepalm|16 days ago

Is every new thing not just combinations of existing things? What does out of distribution even mean? What advancement has ever made that there wasn’t a lead up of prior work to it? Is there some fundamental thing that prevents AI from recombining ideas and testing theories?

outlace|16 days ago

For example, ever since the first GPT 4 I’ve tried to get LLM’s to build me a specific type of heart simulation that to my knowledge does not exist anywhere on the public internet (otherwise I wouldn’t try to build it myself) and even up to GPT 5.3 it still cannot do it.

But I’ve successfully made it build me a great Poker training app, a specific form that also didn’t exist, but the ingredients are well represented on the internet.

And I’m not trying to imply AI is inherently incapable, it’s just an empirical (and anecdotal) observation for me. Maybe tomorrow it’ll figure it out. I have no dogmatic ideology on the matter.

fpgaminer|16 days ago

> Is every new thing not just combinations of existing things?

If all ideas are recombinations of old ideas, where did the first ideas come from? And wouldn't the complexity of ideas be thus limited to the combined complexity of the "seed" ideas?

I think it's more fair to say that recombining ideas is an efficient way to quickly explore a very complex, hyperdimensional space. In some cases that's enough to land on new, useful ideas, but not always. A) the new, useful idea might be _near_ the area you land on, but not exactly at. B) there are whole classes of new, useful ideas that cannot be reached by any combination of existing "idea vectors".

Therefore there is still the necessity to explore the space manually, even if you're using these idea vectors to give you starting points to explore from.

All this to say: Every new thing is a combination of existing things + sweat and tears.

The question everyone has is, are current LLMs capable of the latter component. Historically the answer is _no_, because they had no real capacity to iterate. Without iteration you cannot explore. But now that they can reliably iterate, and to some extent plan their iterations, we are starting to see their first meaningful, fledgling attempts at the "sweat and tears" part of building new ideas.

D-Machine|16 days ago

> What does out of distribution even mean?

There are in fact ways to directly quantify this, if you are training e.g. a self-supervised anomaly-detection model.

Even with modern models not trained in that manner, looking at e.g. cosine distances of embeddings of "novel" outputs could conceivably provide objective evidence for "out-of-distribution" results. Generally, the embeddings of out-of-distribution outputs will have a large cosine (or even Euclidean) distance from the typical embedding(s). The catch is that most "out-of-distribution" outputs will be nonsense or junk, so searching for weird outputs isn't really helpful in general if your goal is useful creativity.
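As a toy illustration of that check (a sketch with hand-made vectors, not tied to any real model's embeddings), the cosine distance between an output's embedding and a "typical" reference embedding can be computed directly:

```python
import math

def cosine_distance(u: list[float], v: list[float]) -> float:
    # 1 - cosine similarity: 0 for identical directions, 2 for opposite ones.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy "embeddings": an in-distribution output points roughly the same way
# as the typical embedding; an out-of-distribution output does not.
typical = [1.0, 0.0, 0.0]
in_dist = [0.9, 0.1, 0.0]
out_of_dist = [0.0, 0.0, 1.0]

assert cosine_distance(typical, in_dist) < cosine_distance(typical, out_of_dist)
```

In practice the vectors would come from an actual embedding model, and the threshold separating "typical" from "out-of-distribution" would have to be calibrated on known in-distribution outputs.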

amelius|16 days ago

Just wait until LLMs are fast and cheap enough to be run in a breadth first search kind of way, with "fuzzy" pruning.

bamboozled|15 days ago

All you have to do is see "openai.com" in the submission URL to know it's bullshit.

mirsadm|15 days ago

My issue with any of these claims is the lack of proof. Just share the chat and how it got to the discovery. I'll believe it when I can see it for myself at this point. It's too easy to make all sorts of claims without proof these days. Elon Musk makes them all the time.

verdverm|16 days ago

[deleted]

buttered_toast|16 days ago

Absolutely no way this is true, right? Ilya left around the time 4o was released. I can't imagine they haven't had a single successful run since then.