(no title)
extesy
|
1 month ago
Synthetic data. Like AlphaZero playing randomized games against itself, a future coding LLM would come up with new projects, or feature requests for existing projects, or common maintenance tasks for itself to execute. Its value function might include ease of maintainability, and it could run e2e project simulations to make sure it actually works.
rmunn|1 month ago
What's the objective measure of success that can be programmed into the LLM to self-train without human input? (Narrowing our focus to only code for this question). Is it code that runs? Code that runs without bugs? Code without security holes? And most importantly, how can you write an automated system to verify that? I don't buy that E2E project simulations would work: it can simulate the results, but what results is it looking for? How will it decide? It's the evaluation, not the simulation, that's the inescapably hard part.
Because there's no good, objective way for the LLM to evaluate the results of its training in the case of code, self-training would not work nearly as well as it did for AlphaZero, which could objectively measure its own success.
falloutx|1 month ago
chneu|1 month ago