ikr678 | 19 days ago

Is there a big enough dataset of 'good' code to train on, though?

rybosworld | 19 days ago

I (and lots of people) used to think the models would run out of training data and that progress would halt.

They did run out of human-authored training data in 2024/2025 (depending on who you ask). And they still improve.

lelanthran | 19 days ago

> They did run out of human-authored training data in 2024/2025 (depending on who you ask). And they still improve.

It seemed to me that improvements from training itself (i.e., to the model) in 2025 were marginal. The biggest gains came from structuring how the conversation with the LLM goes (rough sketch below).
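
By "structuring the conversation" I mean a harness that walks the model through plan, write, critique, and revise turns instead of one-shot prompting. A minimal sketch; chat() is a hypothetical stand-in for any chat-completion client, stubbed here so the example runs as-is:

    # Hypothetical multi-turn harness; chat() stands in for a real LLM client.
    def chat(messages: list[dict]) -> str:
        return f"[model reply to: {messages[-1]['content']}]"  # stub

    def structured_code_request(task: str) -> str:
        messages = [{"role": "system", "content": "You are a careful programmer."}]
        for step in (
            f"Outline a plan for: {task}",
            "Write the code for that plan.",
            "Critique the code for bugs and edge cases.",
            "Rewrite the code, fixing every issue you found.",
        ):
            messages.append({"role": "user", "content": step})
            messages.append({"role": "assistant", "content": chat(messages)})
        return messages[-1]["content"]  # the revised final answer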

eqvinox | 19 days ago

> And they still improve.

But what asymptote are they approaching? Average code? Good code? Great code?

jmalicki | 19 days ago

They ran out of passively collected data. RLHF allows them to gather deeper, more targeted data.
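
Concretely, that targeted data is things like human preference pairs feeding a reward model. A minimal sketch, assuming PyTorch; the Bradley-Terry loss and toy tensors below are illustrative, not any lab's actual pipeline:

    # Reward-model loss over human preference pairs (Bradley-Terry).
    # Illustrative sketch assuming PyTorch, not a specific lab's pipeline.
    import torch
    import torch.nn.functional as F

    def reward_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
        # Push the reward of the human-preferred response above the rejected one.
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    # Each pair is targeted, actively collected signal: a human compared two
    # model outputs, data that passive web scraping never contained.
    r_chosen = torch.randn(8, requires_grad=True)
    r_rejected = torch.randn(8, requires_grad=True)
    reward_loss(r_chosen, r_rejected).backward()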

jmalicki | 19 days ago

There is a lot of RLHF effort around this.

co_king_3 | 19 days ago

AHEM

Let me repeat myself.

I think it goes without saying that they will be writing "good code" in short order.