(no title)
gejose|1 month ago
> In 2029, AI will not be able to watch a movie and tell you accurately what is going on (what I called the comprehension challenge in The New Yorker, in 2014). Who are the characters? What are their conflicts and motivations? etc.
> In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
> In 2029, AI will not be able to work as a competent cook in an arbitrary kitchen (extending Steve Wozniak’s cup of coffee benchmark).
> In 2029, AI will not be able to reliably construct bug-free code of more than 10,000 lines from natural language specification or by interactions with a non-expert user. [Gluing together code from existing libraries doesn’t count.]
> In 2029, AI will not be able to take arbitrary proofs from the mathematical literature written in natural language and convert them into a symbolic form suitable for symbolic verification.
Many of these have already been achieved, and it's only early 2026.
[1]https://garymarcus.substack.com/p/dear-elon-musk-here-are-fi...
merlincorey|1 month ago
My understanding of the current scorecard is that he's still technically correct, though I agree with you there is velocity heading towards some of these things being proven wrong by 2029.
For example, in the recent thread about LLMs and solving an Erdos problem I remember reading in the comments that it was confirmed there were multiple LLMs involved as well as an expert mathematician who was deciding what context to shuttle between them and helping formulate things.
Similarly, I've not yet heard of any non-expert Software Engineers creating 10,000+ lines of non-glue code that is bug-free. Even expert Engineers at Cloudflare failed to create a bug-free OAuth library with Claude at the helm, because some things are just extremely difficult to create without bugs even with experts in the loop.
bspammer|1 month ago
The second claim about novels seems obviously achieved to me. I just pasted a random obscure novel from Project Gutenberg into a file and asked Claude questions about the characters, and then asked about the motivations of a random side character. It gave a good answer; I'd recommend trying it yourself.
stingrae|1 month ago
4 is close; the interface (Claude Code) needs some work to allow nontechnical people to use it.
zozbot234|1 month ago
Can AI actually do this? This looks like a nice benchmark for complex language processing, since a complete novel takes up a whole lot of context (consider War and Peace or The Count of Monte Cristo). Of course the movie variety is even more challenging since it involves especially complex multi-modal input. You could easily extend it to making sense of a whole TV series.
idreyn|1 month ago
colechristensen|1 month ago
Yes, you just break the book down by chapters or whatever conveniently fits in the context window to produce summaries such that all of the chapter summaries can fit in one context window.
You could also do something with a multi-pass strategy where you come up with a collection of ideas on the first pass and then look back with search to refine and prove/disprove them.
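A minimal sketch of the chapter-summary approach described above. Everything here is an assumption for illustration: `summarize` is a hypothetical callable wrapping whatever LLM you use, character count stands in for token count, and chunks are split on blank lines rather than real chapter boundaries.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def summarize_book(
    text: str,
    summarize: Callable[[str], str],  # hypothetical LLM wrapper
    max_chars: int,
    max_rounds: int = 5,
) -> str:
    """Summarize each chunk, then summarize the summaries until they fit."""
    summaries = [summarize(c) for c in split_into_chunks(text, max_chars)]
    combined = "\n\n".join(summaries)
    rounds = 0
    # Re-summarize while the combined summaries exceed one "context window".
    while len(combined) > max_chars and len(summaries) > 1 and rounds < max_rounds:
        summaries = [summarize(c) for c in split_into_chunks(combined, max_chars)]
        combined = "\n\n".join(summaries)
        rounds += 1
    return summarize(combined)
```

Questions about the book would then be asked against the final summary (or against the per-chunk summaries), not the full text. The `max_rounds` cap is only there so the loop stops if the summarizer fails to shrink its input.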
Of course, for novels that existed before the model was trained, an LLM will already contain trained information about them. Having it "read" classic works like The Count of Monte Cristo and answer questions about them would be a bit of an unfair pass of the test, because models can be expected to have been trained on large volumes of existing text analysis of those books.
>reliably answer questions about plot, character, conflicts, motivations
LLMs can already do this automatically with my code in a sizable project (you know what I mean), so it seems pretty simple to get them to do it with a book.
the-grump|1 month ago
Consider also that they can generate summaries and tackle the novel piecemeal, just like a human would.
Re: movies. Get YouTube premium and ask YouTube to summarize a 2hr video for you.
postalrat|1 month ago
ls612|1 month ago
colechristensen|1 month ago
The keyword is "reliably", and what your threshold is for that. And what "bug free" means. Groups of expert humans struggle to write 10k lines of "bug free" code in the absolutist sense of perfection. Even code with formal proofs can have "bugs" if you count the specification not matching the actual needs of reality.
All but the robotics one are demonstrable in 2026 at least.
staticman2|1 month ago
> In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
Once you are "going beyond the literal text" the standard is usefulness of your insight about the novel, not whether your insight is "right" or "wrong".
thethirdone|1 month ago
I think converting arbitrary proofs from the mathematical literature is probably the most solved one. Research into IMO problems and Lean formalization work have been pretty successful.
Then, probably reading a novel and answering questions is the next most successful.
Reliably constructing 10k bug-free lines is probably the least successful. AI tends to produce more bugs than human programmers, and I have yet to meet a programmer who can reliably produce fewer than 1 bug per 10k lines.
zozbot234|1 month ago
kleene_op|1 month ago
You imperatively need to try Claude Code, because it absolutely does that.
dyauspitr|1 month ago
Just earlier today I asked it to give me a summary of a show I was watching until a particular episode in a particular season without spoiling the rest of it and it did a great job.
suddenlybananas|1 month ago
raincole|1 month ago
I'm quite sure people who made those (now laughable) predictions will tell you none of these has been achieved, because AI isn't doing this "reliably" or "bug-free."
Defending your predictions is like running an insurance company. You always win.
jgalt212|1 month ago
raincole|1 month ago
If Bill Gates made a prediction about computing, no matter what the prediction says, you can bet that the 640K memory quote would be mentioned in the comment section (even though he didn't actually say that).
GorbachevyChase|1 month ago
margalabargala|1 month ago