top | item 46445923

(no title)

larrydag | 2 months ago

perhaps I'm missing something. Why not start the learning at a later state?

discuss

If the goal is to achieve end-to-end learning that would be cheating.

If you sat down to solve a problem you’ve never seen before you wouldn’t even know what a valid “later state” looking like.

taeric|2 months ago

Why is it cheating? We literally teach sports this way? Often times you teach sports by learning in scaled down scenarios. I see no reason this should be different.

bob1029|2 months ago

That's effectively what you get in either case. With MLM, on the first learning iteration you might only mask exactly one token per sequence. This is equivalent to starting learning at a later state. The direction of the curriculum flows toward more and more of these being masked over time, which is equivalent to starting from earlier and earlier states. Eventually, you mask 100% of the sequence and you are starting from zero.