isotypic | 1 year ago
But how does the student, or in your case the LLM, know that it actually has the solution? For students, this happens through correction: a grader grading the homework, asking the professor at office hours, working on problems with peers who cross-check as you go. I see no reason why LLM-produced synthetic data, without this correction factor, would not devolve into a mess of incorrect, maybe even not-even-wrong style "proofs". And then how can training on this yield anything?
cevi | 1 year ago
(In principle, it should also be possible to get good enough at philosophy to avoid devolving into a mess of incoherence while reasoning about concepts like "knowledge", "consciousness", and "morality". I suspect some humans have achieved that, but it seems rather difficult to tell...)
isotypic | 1 year ago
This is simply not true - you can get a very good sense of when your argument is correct, yes. But having graded for (graduate, even!) courses, I can say that even advanced students make mistakes. It's not limited to students, either; plenty of textbooks have significant errata, and it's not as if no retraction in math has ever been issued.
These get corrected by talking with other people. If you have an LLM spew out this synthetic chain-of-reasoning data, you will probably get at least some wrong proofs, and if you blindly try to scale on that, I would expect the training to collapse.
Even tying into a proof checker seems non-trivial to me. If you work purely in the proof checker, you never say anything wrong - but the presentation in a proof-checking language is very different from a textual one, so I would anticipate the LLM having trouble leveraging knowledge from, say, textbooks in its proofs. You might also run into the AI playing a game against the compiler rather than building understanding (you see elements of this in the proofs produced by AlphaProof). And if you start mixing natural language and proof checkers, you've just kicked the verification can down the road a bit, since you need some way of ensuring the natural language actually matches the statements being shown by the proof checker.
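To make that last point concrete, here's a toy Lean 4 sketch (theorem names are my own, purely illustrative) of the mis-formalization risk: a theorem can typecheck and be trivially provable while failing to capture the intended natural-language claim, e.g. because a contradictory hypothesis slipped into the formal statement.

```lean
-- Intended claim: "every natural number is less than its successor".
-- A faithful formalization, provable by a standard core lemma:
theorem faithful (n : Nat) : n < n + 1 := Nat.lt_succ_self n

-- A botched formalization: the hypothesis `n < n` is impossible,
-- so *any* conclusion follows vacuously. The proof checker happily
-- accepts it, yet nothing about the intended claim was verified.
theorem vacuous (n : Nat) (h : n < n) : n + 1 < n :=
  absurd h (Nat.lt_irrefl n)
```

Both proofs pass the checker; only a human (or some other verifier) comparing the formal statements against the prose can tell that the second one proves nothing of interest.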
I don't think these are insurmountable challenges, but I also don't think it's as simple as the "generate synthetic data and scale harder" approach the parent comment suggests. Perhaps I'm wrong - time will tell.