item 46542864


extesy | 1 month ago

Synthetic data. Like AlphaZero playing randomized games against itself, a future coding LLM would come up with new projects, or feature requests for existing projects, or common maintenance tasks for itself to execute. Its value function might include ease of maintainability, and it could run end-to-end (e2e) project simulations to make sure the result actually works.


rmunn | 1 month ago

AlphaZero playing games against itself was useful because there's an objective measure of success in a game of Go: at the end of the game, did I have more points than my opponent? So you can "reward" the moves that do well, and "punish" the moves that do poorly. And that objective measure of success can be programmed into the self-training algorithm, so that it doesn't need human input in order to tell (correctly!) whether its model is improving or getting worse. Which means you can let it run in a self-feedback loop for long enough and it will get very good at winning.
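To make the "objective measure of success" concrete, here is a minimal sketch (not AlphaZero's actual code) of how a finished self-play game can be turned into training labels with no human input. It assumes a two-player game where players alternate and player 0 moves first; the function name `outcome_targets` and the move strings are hypothetical. Each move is labeled with the final result from the perspective of the player who made it (+1 win, -1 loss, 0 draw), which is roughly the role the game outcome plays as the value target in AlphaZero-style training.

```python
from typing import List, Tuple

def outcome_targets(moves: List[str], winner: int) -> List[Tuple[str, int]]:
    """Label each move with the final game result from the mover's perspective.

    Hypothetical helper. Players alternate and player 0 moves first.
    `winner` is 0 or 1 for the winning player, or -1 for a draw.
    Returns (move, z) pairs where z is +1 (mover won), -1 (mover lost), 0 (draw).
    """
    targets = []
    for i, move in enumerate(moves):
        mover = i % 2  # even indices are player 0's moves, odd are player 1's
        if winner == -1:
            z = 0
        elif winner == mover:
            z = 1
        else:
            z = -1
        targets.append((move, z))
    return targets


print(outcome_targets(["a", "b", "c"], winner=0))
# [('a', 1), ('b', -1), ('c', 1)]
```

The key point is that `winner` comes from the game rules themselves, so the loop can label millions of games without anyone checking them.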

What's the objective measure of success that can be programmed into the LLM to self-train without human input? (Narrowing our focus to only code for this question). Is it code that runs? Code that runs without bugs? Code without security holes? And most importantly, how can you write an automated system to verify that? I don't buy that E2E project simulations would work: it can simulate the results, but what results is it looking for? How will it decide? It's the evaluation, not the simulation, that's the inescapably hard part.
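One commonly proposed automated signal for code is "fraction of tests passed." The sketch below (all names hypothetical, assuming a toy task of implementing absolute value) illustrates the evaluation problem raised above: the reward is only as good as the tests. A buggy implementation scores perfectly against a weak test suite, and nothing inside the loop can tell that the suite is weak.

```python
from typing import Callable, List, Tuple

def pass_rate(candidate: Callable[[int], int], tests: List[Tuple[int, int]]) -> float:
    """Fraction of (input, expected) pairs the candidate gets right.

    Exceptions raised by the candidate count as failures.
    Returns 0.0 for an empty test list.
    """
    passed = 0
    for arg, expected in tests:
        try:
            if candidate(arg) == expected:
                passed += 1
        except Exception:
            pass  # a crash is just a failed test
    return passed / len(tests) if tests else 0.0


def abs_correct(x: int) -> int:
    return x if x >= 0 else -x


def abs_wrong(x: int) -> int:
    return x  # bug: never negates negative inputs


weak_tests = [(0, 0), (3, 3), (7, 7)]            # no negative inputs at all
strong_tests = weak_tests + [(-2, 2), (-9, 9)]

print(pass_rate(abs_correct, weak_tests))   # 1.0
print(pass_rate(abs_wrong, weak_tests))     # 1.0, the bug is invisible
print(pass_rate(abs_wrong, strong_tests))   # 0.6
```

Unlike the game-outcome labels in Go, someone still has to decide that `strong_tests` is the right yardstick, which is exactly the "evaluation, not simulation" problem.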

Because there's no good, objective way for the LLM to evaluate the results of its training in the case of code, self-training would not work nearly as well as it did for AlphaZero, which could objectively measure its own success.

falloutx | 1 month ago

You don't need synthetic data: people are posting vibe-coded projects on GitHub every day, and those are being added to the next model's training set. I expect that in 4-5 years or so, humans just won't be able to do things that aren't in the training set. Anything novel or fun will be confined to creative agencies and the few holdouts who managed to survive.

chneu | 1 month ago

Or it'll create an alternative reality where that AI iterates itself into delusion.