(no title)
tc4v | 1 year ago
This seems unlikely because LLMs don't produce high-quality code; they produce average code. So they don't contribute to a better dataset, they contribute to a narrower dataset centered on the average. LLMs tend to self-poison, not to self-improve. There is a good chance this has already started because of the huge amount of ChatGPT code that has been put on GitHub since 2021. Maybe it can be avoided if the LLM authors use some quality filter to discard 80% of the dataset.
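To make the "discard 80% of the dataset" idea concrete, here is a minimal sketch in Python, assuming a hypothetical quality_score heuristic (a real pipeline would use a trained quality classifier or execution-based checks, not line counts):

    # Sketch: score each code sample with a quality heuristic, keep the top 20%.
    # quality_score is a hypothetical placeholder, not a real quality model.

    def quality_score(sample: str) -> float:
        # Hypothetical heuristic: reward longer, commented code.
        lines = sample.splitlines()
        commented = sum(1 for line in lines if line.strip().startswith("#"))
        return len(lines) + 2 * commented

    def filter_top_fraction(samples: list[str], keep: float = 0.2) -> list[str]:
        # Sort by score and keep only the best `keep` fraction (here 20%).
        ranked = sorted(samples, key=quality_score, reverse=True)
        cutoff = max(1, int(len(ranked) * keep))
        return ranked[:cutoff]

    if __name__ == "__main__":
        corpus = ["print('hi')", "# add two numbers\ndef add(a, b):\n    return a + b"]
        print(filter_top_fraction(corpus))

Whether such a filter avoids the narrowing effect depends entirely on whether the scoring function rewards genuinely high-quality code rather than merely average-looking code.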