Quite interesting post that asks the right question about "asking the right questions". Yet one aspect I felt was missing (and which might automatically solve this) is first-principles-based causal reasoning.
A truly intelligent system — one that reasons from first principles by running its own simulations and physical experiments — would notice if something doesn't align with the "textbook version".
It would recognize when reality deviates from expectations and ask follow-up questions, naturally leading to deeper insights and the right questions - and answers.
Fascinating in this space is the new "Reasoning-Prior" approach (MIT Lab & Harvard), which trains reasoning capabilities learned from the physical world as a foundation for new models (before even learning about text).
Relevant paper: "General Reasoning Requires Learning to Reason from the Get-go."
Interesting. I think whoever makes the GUI for LLMs will be the next Jobs/Gates/Musk and a Nobel Prize winner (I think it'll solve alignment by putting millions of eyes on the internals of LLMs), because computers only became popular after the OS with a GUI appeared. I just started an Ask HN thread to let people share their AI safety ideas, both crazy and not: https://news.ycombinator.com/item?id=43332593
I have never heard anyone think this way: “The main mistake people usually make is thinking Newton or Einstein were just scaled-up good students, that a genius comes to life when you linearly extrapolate a top-10% student.”
The reason such people are widely lauded as geniuses is precisely because people can’t envision smart students producing paradigm-shifting work as they did.
Yes, people may be talking about AI performance as genius-level but any comparison to these minds is just for marketing purposes.
We kinda think too much of them, though. Each is also a product of their surroundings, and had contemporaries who could have, or did, come to the same revelations.
A nice post (that should be somewhere smarter than contemporary Twitter/X).
> PS: You might be wondering what such a benchmark could look like. Evaluating it could involve testing a model on some recent discovery it should not know yet (a modern equivalent of special relativity) and explore how the model might start asking the right questions on a topic it has no exposure to the answers or conceptual framework of. This is challenging because most models are trained on virtually all human knowledge available today but it seems essential if we want to benchmark these behaviors. Overall this is really an open question and I’ll be happy to hear your insightful thoughts.
Why benchmarks?
A genius (human or AI) could produce novel insights, some of which could practically be tested in the real world.
"We can gene-edit using such-and-such approach" => Go try it.
No sales-brochure claims, research-paper comparison charts showing incremental improvement, individual KPIs/OKRs to hit, or promotion packets required.
The reason you'd have a benchmark is that you want to be able to check in on your model programmatically. DNA wetwork is slow and expensive. While you're absolutely right that benchmarks aren't the best thing ever and that they are used for marketing and sales purposes, they also do seem to generally create capability momentum in the market. For instance, nobody running local LLMs right now would prefer a 12-month-old model to one of the top models today at the same size - they are significantly more capable, and many researchers believe that training on new and harder benchmarks has been a way to increase that capability.
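To make "check in on your model programmatically" concrete, a benchmark in this sense can be as small as a fixed list of prompts plus a scoring rule that you run against every new checkpoint. A minimal sketch (the `ask_model` function is a hypothetical stand-in for whatever model you actually serve, not a real API):

```python
# Minimal benchmark-harness sketch: score a model on a fixed question set.
BENCHMARK = [
    {"prompt": "What is 17 * 24?", "expected": "408"},
    {"prompt": "Which process do plants use to turn light into chemical energy?",
     "expected": "photosynthesis"},
]

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in: route this to your actual model
    # (llama.cpp, vLLM, a hosted API, ...). Canned reply so the sketch runs.
    return "I think the answer is 408."

def score(model) -> float:
    """Return the fraction of benchmark items the model gets right."""
    hits = sum(
        item["expected"].lower() in model(item["prompt"]).lower()
        for item in BENCHMARK
    )
    return hits / len(BENCHMARK)

if __name__ == "__main__":
    print(f"benchmark accuracy: {score(ask_model):.0%}")
```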
The author seems to assume that conjuring up a conjecture is the hard part - yet the proof will still have to be filled in with the same standard mathematics (granted, sometimes wrapped up as new tools, and the proof often ends up being as important as the result), often at great cost.
Having powerful assistants that let people try out crazy mathematical ideas without fear of risking their careers - or just have fun with ideas - is likely to have an outsized impact anyway, I think.
I think I read somewhere about Erdős having this somewhat brute force approach. Whenever fresh techniques were developed (by himself or others), he would go back to see if they could be used on one of his long-standing open questions.
Even worse, people seem to forget that “science” is not math. You need to test hypotheses with physical (including biological) experiments. The vast majority of the time spent doing “science” is running these experiments.
An LLM-like AI won't help with that. It would still be a huge help in finding and correlating data and information though.
- I started to see LLMs as a kind of search engine. I cannot say they are better than traditional search engines. On one hand, they are better at personalizing the answer; on the other hand, they hallucinate a lot.
- There is a different view on how new scientific knowledge is made: it's all about connecting existing dots. Maybe LLMs can assist with this task by helping scientists discover relevant dots to connect. But as the author suggests, this is only part of the job. To find the correct ways to connect the dots, you need to ask the right questions, examine the space of counterfactuals, etc. LLMs can be a useful tool, but they are not autonomous scientists (yet).
- As someone developing software on top of LLMs, I am slowly coming to the conclusion that human-in-the-loop approaches seem to work better than fully autonomous agents (see the rough sketch below).
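For what it's worth, the human-in-the-loop pattern I mean is roughly the loop below: the model drafts, a person approves or rejects before anything is acted on. This is a minimal sketch; `draft_with_llm` is a hypothetical placeholder, not any particular vendor's API.

```python
# Human-in-the-loop sketch: the LLM proposes, a human approves before use.
from typing import Optional

def draft_with_llm(task: str) -> str:
    # Hypothetical placeholder for a real LLM call.
    return f"[draft answer for: {task}]"

def human_review(draft: str) -> bool:
    """Show the draft to the operator and ask for an accept/reject decision."""
    print("--- proposed draft ---")
    print(draft)
    return input("accept? [y/N] ").strip().lower() == "y"

def run(task: str) -> Optional[str]:
    draft = draft_with_llm(task)
    if human_review(draft):
        return draft   # only approved output leaves the loop
    return None        # rejected drafts are discarded (or sent back for rework)

if __name__ == "__main__":
    print(run("summarize the key claims of the attached paper"))
```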
Instead of connecting language with physical existence, or entities, it's connecting tokens.
An LLM may be able to describe the scenes in a video, but such a model would tell you that said video is a deepfake because of some principle - like conservation of energy and mass - informed by experience, assumptions, inference rules, etc.
It doesn't seem correct to dismiss the creativity of Move 37 on the grounds that real originality is "something more fundamental, like inventing the rules of Go itself".
It would seem more fruitful to simply point out that LLMs aren't all of AI, and that excelling at mimicking human-like text production isn't really doing the work that AlphaGo was attempting.
Just because both things might be given as (different) examples of deep reinforcement learning in an AI survey course doesn't mean that we have much reason to believe that the vast investments in LLMs result in AlphaGo like achievements.
>We're currently building very obedient students, not revolutionaries. This is perfect for today’s main goal in the field of creating great assistants and overly compliant helpers. But until we find a way to incentivize them to question their knowledge and propose ideas that potentially go against past training data, they won't give us scientific revolutions yet.
This would definitely be an interesting future. I wonder what it'd do to all of the work in alignment & safety if we started encouraging AIs to go a bit rogue in some domains.
If this take is correct and we need creative B students, we might still get a compressed 21st century with human creative B students working together with AI A students who support the human with research, validation, workshopping ideas, etc.
"Alpha children wear grey. They work much harder than we do, because they're so frightfully clever. I'm awfully glad I'm a Beta, because I don't work so hard."
I think the author has a point. LLMs struggle with what you might call epistemically constructive novelty. It's the ability not just to synthesize existing knowledge, but to identify what's missing and conjecture something to fill the gap and demonstrate it to satisfaction. Out-of-distribution knowledge gaps are typically where LLMs "hallucinate." Unlike highly skilled human researchers, they don't pause and construct the bridge that will get them from known to unknown, they just immediately rush to fill in the blank with whatever sounds most plausible. They need to ask questions that haven't been asked before, or answer ones that haven't been answered.
Is this just some missing subroutine that we'll eventually figure out? Or is this conjecture-proving process much more elaborate than whatever existing models, no matter how scaled, can manage? I'm not sure. But the answer starts with a question.
I wonder if people could just write their blog posts in a short form: claim, argument in favor, counter-argument, consequences (optional).
Like this whole blog post could be:
Claim: Current AI is unlikely to usher in an era of dramatically accelerated scientific discovery.
Argument in favor: A genius does not come to life when you linearly extrapolate a top-10% student. Newton and Einstein were not just scaled-up good students. To create an Einstein, we need a system that can ask questions nobody else has thought of or dared to ask. One that writes 'What if everyone is wrong about this?' when all textbooks, experts, and common knowledge suggest otherwise.
Existing benchmarks don't test such skills. And existing systems are likely hopelessly far from this capability (based on the author's personal feelings).
Counter-argument: none.
Consequences: obvious.
Could we train an AI model on the corpus of physics knowledge up to the year 1905 and then see if we can adjust the prompt to get it to output the theory of relativity?
This would be an interesting experiment for other historical discoveries too. I'm now curious whether anybody has created a model with "old data" - documents and books from hundreds of years ago - and seen if it comes up with the same conclusions as researchers and scientists of the past.
Would AI have been able to predict the effectiveness of vaccines, insulin, other medical discoveries?
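A rough sketch of how setting up such a "time-capped" experiment might look, assuming you have documents with publication-year metadata; the field names, the hard 1905 cutoff, and the probe prompt are all illustrative assumptions, not a validated recipe:

```python
# Sketch: build a pre-1905 training corpus and a probe prompt for the
# "could it rediscover special relativity?" experiment. The JSONL field
# names are assumptions; real corpora rarely have clean year metadata.
import json

CUTOFF_YEAR = 1905

def filter_corpus(path_in: str, path_out: str) -> int:
    """Keep only documents published strictly before the cutoff year."""
    kept = 0
    with open(path_in) as src, open(path_out, "w") as dst:
        for line in src:
            doc = json.loads(line)  # expects {"year": int, "text": str, ...}
            if doc.get("year", 10**9) < CUTOFF_YEAR:
                dst.write(json.dumps(doc) + "\n")
                kept += 1
    return kept

# After pretraining on the filtered corpus only, probe the model with period-
# appropriate puzzles and look at the *questions* it asks, not just answers.
PROBE_PROMPT = (
    "The Michelson-Morley experiment detected no motion relative to the ether, "
    "yet Maxwell's equations fix the speed of light. What assumptions about "
    "space and time would have to change to reconcile these observations?"
)

if __name__ == "__main__":
    print(PROBE_PROMPT)
```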
But there might not be enough text.
And: There's a similar situation to why double-blind studies are necessary - the questions we pose to such a system would be contaminated by our cultural background; we might be leading the system.
And if the system is autonomous and we just wait for something true to appear, how would we know that the final system, trained on current data, has produced something worthwhile?
Take maths: producing new proofs and new theorems might not be the issue. Rather: why should we care about these results? Thousands of PhD students produce new mathematics all the time. And most of it is irrelevant.
That's the ideal, but I think today's models are too crude for that. Relativity is built on differential geometry, which was new at the time. I think inventing or even building on that is beyond today's models; there's an infinitely large space of mathematics that can be invented, and barely a gradient to guide the search. Humans don't coin mathematics by gradient descent. The most I've seen is fitting observations using existing mathematics, a technique known as symbolic regression. The E=mc^2 equation could be curve fitted like this, but it would afford no insight.
https://en.wikipedia.org/wiki/Symbolic_regression
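To illustrate the "curve fitted, but no insight" point, here is a toy symbolic-regression-style search over a handful of candidate functional forms. It happily recovers E ≈ k·m with k ≈ c², but the "discovery" is a fitted constant, not a statement about spacetime. (A minimal sketch, not any particular symbolic-regression library.)

```python
# Toy symbolic-regression sketch: choose the functional form (plus a fitted
# constant) that best explains synthetic E(m) data. It recovers E = k*m with
# k ~ 8.99e16 = c^2, but that is curve fitting, not physical insight.
import numpy as np

C = 299_792_458.0                       # speed of light, m/s
m = np.linspace(0.1, 10.0, 50)          # masses in kg
E = m * C**2                            # "observed" energies

candidates = {
    "k*m":       m,
    "k*m**2":    m**2,
    "k*sqrt(m)": np.sqrt(m),
    "k*exp(m)":  np.exp(m),
}

best_name, best_err, best_k = None, np.inf, 0.0
for name, basis in candidates.items():
    k = float(basis @ E / (basis @ basis))        # least-squares scale factor
    err = float(np.mean((E - k * basis) ** 2))
    if err < best_err:
        best_name, best_err, best_k = name, err, k

print(f"best form: {best_name}, k = {best_k:.3e} (c^2 = {C**2:.3e})")
```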
Had the same thought some time back about AI discovering the theory of relativity with only the data before 1905. It would give a definite answer about whether there is any reasoning involved in the LLM output.
Wouldn't the ability to "ask the right questions" require that AI could update its own weights, as those weights determine which questions can be asked?
The first thing you need to understand is that no current LLM-based, transformer-architected AI is going to get to AGI. The design, in essence, is not capable of that kind of creativity. In fact, no AI that has at its root statistical analysis or probabilistic correlation will get us past the glorified Google parlor trick that is the modern LLM in every form.
The solution to this problem - the architecture that will be contained in the ultimate AGI solution that emerges - is a great leap in IP, but unfortunately it is too important to blab about widely.
We saw algorithms designing circuits that no human engineer would design even before LLMs (using genetic algorithms). So out-of-the-box thinking may also be more reachable than this author thinks.
But there's a reason we don't use those algorithms. We don't need out-of-the-box thinking that's so far outside the box that it's useless.
With these kinds of circuits, the designs were so sensitive to the specific conditions the circuit was tested in (temperature, process variation, ...) that the solution couldn't be generalized for use outside of that specific experiment.
We need the kind of intelligence that can work out which assumptions can be challenged, and which we need to keep to have a viable (eventually commercially viable) solution.
Including, IIRC, at least one FPGA-based circuit that had a blob of logic not connected to anything else (i.e. it could not possibly be involved in the logical functioning of the circuit), but when it was removed the implementation stopped working. So the actual circuit wasn't a sensible design option, just a very implementation-specific local minimum.
I think the original design challenge was something like a tone discriminator circuit. I can't recall the details.
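As a toy illustration of that "sensitive to the specific test conditions" failure mode (nothing to do with the original FPGA experiments), the evolutionary search below is scored against a single noisy recording of its target signal. It ends up fitting that one recording, noise and all, and does noticeably worse on a fresh recording taken under "different lab conditions":

```python
# Toy illustration: an evolved solution overfits the exact conditions it was
# evaluated under. Fitness uses ONE noisy recording of the target; the winner
# matches that recording closely and generalizes worse to a fresh one.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 40)
target = np.sin(x)

def measure() -> np.ndarray:
    """One 'lab run': the true signal plus run-specific measurement noise."""
    return target + rng.normal(0, 0.3, x.size)

bench_run = measure()  # the only conditions the search ever sees

def error(candidate: np.ndarray, observed: np.ndarray) -> float:
    return float(np.mean((candidate - observed) ** 2))

# Evolve free-form candidates (one value per sample point) against bench_run.
pop = rng.normal(0, 1, (60, x.size))
for _ in range(300):
    ranked = sorted(pop, key=lambda ind: error(ind, bench_run))
    parents = np.array(ranked[:20])                     # keep the best 20
    children = parents[rng.integers(0, 20, 60)] + rng.normal(0, 0.05, (60, x.size))
    pop = np.vstack([parents, children])                # elitism + offspring

best = min(pop, key=lambda ind: error(ind, bench_run))
print("error on the run it was evolved against:", error(best, bench_run))
print("error on a fresh run (new 'conditions'): ", error(best, measure()))
```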
> If something was not written in a book I could not invent it unless it was a rather useless variation of a known theory. __More annoyingly, I found it very hard to challenge the status-quo__, to question what I had learned.
(__emphasis__ mine)
As if "challenging the status-quo" was the goal in the first place. You ain't gonna get any Einstein by asking people to think inside the "outside the box" box. "Status quo" isn't the enemy, and defying it isn't the path to genius; if you're measuring your own intellectual capacity by proxy of how much you question, you ain't gonna get anywhere useful. After all, questioning everything is easy, and doesn't require any particular skill.
The hard thing is to be right, despite both the status-quo and the "question the status-quo" memes.
(It also helps being in the right time and place, to have access to the results of previous work that is required to make that next increment - that's another, oft forgotten factor.)
>Just consider the crazy paradigm shift of special relativity and the guts it took to formulate a first axiom like “let’s assume the speed of light is constant in all frames of reference” defying the common sense of these days (and even of today…)
I'm not an expert on this. Wasn't this an observed phenomenon before Albert put together his theory?
The question he asked was just that this fact was not compatible with the Maxwell equations.
Weird problems with physics were everywhere before Einstein. Maxwell comes painfully close to discovering GR in some of his musings on black body radiation.
Noticing that there was a problem was not the breakthrough: trying something bizarre and counter-cultural - like assuming light speed is invariant over the observer - just to see if anything interesting drops out was the breakthrough.
Einstein's more impressive stuff was explaining that by time passing at different rates for different observers.
We can't distinguish between a truly novel response from an LLM and a hallucination.
We can get some of the way there, such as if we know what the outcome to a problem should look like, and are seeking a better function to achieve that outcome. Certainly at small scales and in environments where there are minimal consequences for failure, this could work.
But this breaks down as things get more complicated. We won't be able to test the effectiveness of 100 million potential solutions for eradicating brain tumors at once, even if we somehow manage to guarantee that every unforeseen consequence is accounted for in our exercise of specifying the goals and constraints of the problem. We simply don't have the logistics to run 100 million clinical trials in which we also know how to account for countless confounding effects (let alone obtain consent!).
The reality with people is that most of them don't come close to Einstein-level intelligence. A lot of the stuff I ask Perplexity or ChatGPT is way beyond what I could reasonably ask of the vast majority of people I know. I love my relatives. But they are kind of useless for the vast majority of stuff that bounces around in my head.
AIs are at this point a useful tool for knowledge workers. They don't replace them but enhance their productivity. For scientific work, having an LLM that is trained on essentially all of the scientific work published, ever (until the cutoff date) is probably useful.
You can now have conversations with an AI about cross-referencing your ideas with existing work. You might analyze a paper you are writing and ask it to summarize key claims, criticize those and your methodology, cross-reference claims with the literature, find counterpoints to your claims, etc. And you could probably use it to come up with interesting follow-up questions, let it formulate hypotheses and ways to verify those, and so on. Most scientific work isn't Archimedes going Eureka while taking a bath but undergraduates, postdocs, and other underpaid research staff grinding through piles and piles of existing work and filling their heads with enough information until finally something new and original pops out.
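A rough sketch of the kind of cross-referencing pass I mean; `ask_llm` is a placeholder for whichever model or API you use, and the prompt wording is just one illustrative way to ask for claims and counterpoints (everything it returns still needs to be checked by hand):

```python
# Sketch of an LLM-assisted literature pass: extract the key claims of a
# draft, then ask for weaknesses and likely counter-evidence to chase down.
# `ask_llm` is a placeholder, not a specific vendor API.
def ask_llm(prompt: str) -> str:
    # Swap in a real model call; returns a stub so the sketch runs end to end.
    return "[model output for: " + prompt[:60] + "...]"

def review_draft(paper_text: str) -> dict:
    claims = ask_llm(
        "List the key claims of the following draft as short bullet points, "
        "one per line, noting the section each claim appears in:\n\n" + paper_text
    )
    critique = ask_llm(
        "For each claim below, name the most likely methodological weakness "
        "and the kind of prior work that might contradict it. Flag anything "
        "you are unsure about instead of guessing:\n\n" + claims
    )
    return {"claims": claims, "critique": critique}

if __name__ == "__main__":
    print(review_draft("(paste the draft text here)"))
```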
I got my Ph.D. in 2003. I'm part of the first generation of researchers that was able to use Google. At the time that was a huge enabler for tracking down obscure references and authors. Getting a paper published involves an enormous amount of what I just outlined, and LLMs can assist you with that. Will it hallucinate? Absolutely. But it will also dig out valid points, references, etc. Sorting that out is still work that you need to do, but it probably saves a lot of time. Will it propose original new theories? Maybe, maybe not. But it will speed up the process of zooming in on unanswered questions.
Science isn't necessarily about coming up with answers but coming up with interesting questions. That's what Einstein did: ask interesting questions. Researchers are still trying to answer some of them and verifying some of the answers he predicted.
> At the time that was a huge enabler for tracking down obscure references and authors.
Would that still work today, in the highly commercialized and highly sanitized/censored internet? Where Google wouldn't show you those search results because they aren't profitable enough?
And how do you even train an LLM on a fair representation of human knowledge when you only find stuff that is mainstream and commercially viable?
That's called hate speech, and every AI has been aggressively lobotomized by an army of RLHFers to never do it.