Syzygies | 22 days ago

I'm a mathematician relying heavily on AI as an association engine of massive scope, to organize and expand my thoughts. One doesn't get the best results by "testing" AI.

A surfboard is also an amazing tool, but there's more to operating one than telling it which way to go.

Many people want self-driving cars so they can drink in the back seat watching movies. They'll find their jobs replaced by AI, with a poor quality of life because we're a selfish species. In contrast Niki Lauda trusted fellow Formula 1 race car driver James Hunt to race centimeters apart. Some people want AI to help them drive that well. They'll have great jobs as AI evolves.

Garry Kasparov pioneered "freestyle" chess tournaments after his defeat by Deep Blue, in which the best human players were paired with computers, coining the "centaur" model of human-machine cooperation. This is frequently cited in the finance literature, where it is recognized that AI-guided human judgement can outperform either humans or machines alone.

Any math professor knows how to help graduate students confidently complete a PhD thesis, or how to humiliate students in an oral exam. It’s a choice. To accomplish more work than one can complete alone, choose the former. This is the arc of human evolution: we develop tools to enhance our abilities. We meld with an abacus or a slide rule, and it makes us smarter. We learn to anticipate computations, like we’re playing a musical instrument in our heads. Or we pull out a calculator that makes us dumber. The role we see for our tools matters.

Programmers who actually write better code using AI know this. These HN threads are filled with despair over the poor quality of vibe coding. At the same time, Anthropic is successfully coding Claude using Claude.

nemo1618|22 days ago

Centaurs are a transient phenomenon. In chess, the era of centaur supremacy lasted only about a decade before computers alone eclipsed human+computer. The same will be true in every other discipline.

You can surf the wave, but sooner or later, the wave will come crashing down.

pegasus|22 days ago

They are transient only in those rare domains that can be fully formalized/specified. Like chess. Anything that depends on the messy world of human-world interactions will require humans in the loop for translation and verification purposes.

Centigonal|22 days ago

How transient depends on the problem space. In chess, centaurs were transient. In architecture or CAD, they have been the norm for decades.

lanyard-textile|22 days ago

Agreed. But I don't think the time scale will be similar.

Chess is relatively simple in comparison, as complex as it is.

noosphr|22 days ago

Last I heard, which was last year, human + computer still beat either by themselves. You got a link about what's changed?

mw888|22 days ago

Why do you nitpick his illustrative example and entirely ignore his substantive one about finance?

eranation|22 days ago

I'm highly worried that you are right. But what gives me hope is that people still play chess, I'd argue even more than ever. People still buy paper books and vinyl records. People still appreciate handwritten greeting cards over printed ones, and pay extra to listen to live music when the recording is free and will likely sound much better. People are willing to pay an order of magnitude more for a seat in a theater for a live play, or pay a premium for handmade products over their almost impossible to distinguish knockoffs.

wizzwizz4|22 days ago

That centaurs can outperform humans or AI systems alone is a weaker claim than "these particular AI systems have the required properties to be useful for that". Chess engines consistently produce strong lines, and can play entire games without human assistance: using one does not feel like gambling, even if occasionally you can spot a line it can't. LLMs catastrophically fail at iterated tasks unless they're closely supervised, and using LLMs does feel like gambling. I think you're overgeneralising.

There is definitely a gap in academic tooling, where an "association engine" would be very useful for a variety of fields (and for encouraging cross-pollination of ideas between fields), but I don't think LLMs are anywhere near the frontier of what can be accomplished with a given amount of computing power. I would expect simpler algorithms operating over more explicit ontologies to be much more useful. (The main issue is that people haven't made those yet, whereas people have made LLMs.) That said, there's still a lot of credit due to the unreasonable effectiveness of literature searches: it only usually takes me 10 minutes a day for a couple of days to find the appropriate jargon, at which point I gain access to more papers than I know what to do with. LLM sessions that substitute for literature review tend to take more than 20 minutes: the main advantage is that people actually engage with (addictive, gambling-like) LLMs in a way that they don't with (boring, database-like) literature searches.

I think developing the habit of "I'm at a loose end, so I'll idly type queries into my literature search engine" would produce much better outcomes than developing the habit of "I'm at a loose end, so I'll idly type queries into ChatGPT", and that's despite the state-of-the-art of literature search engines being extremely naïve, compared to what we can accomplish with modern technology.

Syzygies|22 days ago

We're in agreement. I understand how much harder it is to "think with AI"; the last year of my life has been a brutal struggle to figure this out.

I also agree that neural net LLMs are not the inevitable way to implement AI. I'm most intrigued by the theoretical underpinnings of mathematical proof assistants such as Lean 4. Computer scientists understand the word problem for strings as undecidable. The word problem for typed trees with an intrinsic notion of induction is harder, but constructing proofs is finding paths in this tree space. Just as mechanical computers failed in base ten while at the same time Boole had already developed base two logic, I see these efforts merging. Neural nets struggle to simulate recursion; for proof assistants recursion is baked in. Stare at these tree paths and one sees thought at the atomic level, begging to be incorporated into AI. For now the river runs the other way, using AI to find proofs. That river will reverse flow.
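As a minimal Lean 4 sketch of what "recursion is baked in" can mean here: declaring an inductive tree type gives structural recursion and an induction principle for free. The type `Tree`, the function `Tree.mirror`, and the theorem name are hypothetical illustrations, not taken from any library or from the comment above.

```lean
-- Hypothetical example type: a bare binary tree.
-- Declaring it as an inductive type is what supplies the recursion
-- and induction principles used below.
inductive Tree where
  | leaf : Tree
  | node : Tree → Tree → Tree

-- Structural recursion: swap the two subtrees at every node.
-- Lean accepts this because each recursive call is on a strict subterm.
def Tree.mirror : Tree → Tree
  | .leaf => .leaf
  | .node l r => .node (Tree.mirror r) (Tree.mirror l)

-- Proof by induction on the tree: mirroring twice is the identity.
-- `ihl` and `ihr` are the induction hypotheses for the two subtrees.
theorem Tree.mirror_mirror (t : Tree) :
    Tree.mirror (Tree.mirror t) = t := by
  induction t with
  | leaf => simp [Tree.mirror]
  | node l r ihl ihr => simp [Tree.mirror, ihl, ihr]
```

The proof has the same shape as the tree itself: one case per constructor, with the hypotheses for the subtrees handed over by the induction principle rather than simulated.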

jmalicki|22 days ago

We made those in the 80s. Much was learned about why probabilistic stochastic parrots are a far better model.

okintheory|22 days ago

I think you're misunderstanding the point this paper is trying to make. They're interested in trying to distinguish whether AI is capable of solving new math problems or only capable of identifying existing solutions in the literature. Distinguishing these two is difficult, because self-contained math problems that are easy enough for LLMs to address (e.g. minor Erdős problems) may have been solved already as subcomponents of other work, without this being widely known. So when an AI makes progress on such an Erdős problem, we don't know if it had a new idea, or correctly identified an existing but obscure answer. This issue has been dogging the claims of AI solving Erdős problems.

Instead, here you get questions that extremely famous mathematicians (Hairer, Spielman) are telling you (a) are solvable in <5 pages (b) do not have known solutions in the literature. This means that solutions from AI to these problems would perhaps give a clearer signal on what AI is doing, when it works on research math.

Davidzheng|22 days ago

I find it unbelievable that they can't settle this question themselves, without posting this, simply by asking the AI enough novel questions. I myself have little doubt that they can solve at least some novel questions (of course, similarity of proofs is a spectrum, so it's hard to draw the line at how original they are).

acedTrex|22 days ago

> Anthropic is successfully coding Claude using Claude.

Claude is one of the buggiest pieces of shit I have ever used. They had to BUY the creators of bun to fix the damn thing. It is not a good example of your thesis.

nubg|22 days ago

You and the GP are conflating Claude, meaning Anthropic or its flagship model Claude Opus, with Claude Code, a state-of-the-art coding assistant that admittedly has a slow and buggy React-based TUI (its output quality is still very competitive).

cadamsdotcom|22 days ago

This is beautifully written, thank you for writing it.

Typing out solutions to problems was only part of the job description because there was no other way to code. Now we have a far better way.

direwolf20|22 days ago

> At the same time, Anthropic is successfully coding Claude using Claude.

Is that why everyone keeps complaining about the quality getting worse?

Insanity|22 days ago

I think that’s more about model performance degrading due to fewer computational resources being assigned to them over time.

wasabi991011|22 days ago

> I'm a mathematician relying heavily on AI as an association engine of massive scope, to organize and expand my thoughts.

Can you share more about your architecture & process? I'm also a researcher involved in math research (though not strictly speaking a mathematician, but I digress). I've often thought about using AI on my notes, but they are messy, and even then I can't quite figure out what to ask for: prioritization, connecting ideas, lit search, etc.

I'd love to hear what you do.

makoConstruct|22 days ago

You didn't need to make this claim about driving. Coding requires robust metacognition. Driving doesn't, it can be drilled repetitively, and it also benefits from having superhuman senses and instant reaction times. It's somewhat more amenable to AI.

mlmonkey|22 days ago

Very well written. Thank you for putting down your thoughts so succinctly; I'm often at a loss for words when I try to express the same thoughts in a coherent manner.

aspenmartin|22 days ago

What a beautifully articulated take!