I wrote code to repeat the wpe variable N times along the context axis during model load time.
Specifically, the code checks whether the shape the model expects for wpe is larger than the shape of the tensor in the snapshot on disk. If so, it tiles the on-disk tensor N times along the context axis until it fills the larger expected shape.
At that point, you can just set the context window to a larger value and train.
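As a rough sketch of what that load-time tiling might look like (the function name, shapes, and NumPy implementation here are illustrative assumptions, not the actual code):

```python
import numpy as np

def expand_wpe(wpe_on_disk, target_n_ctx):
    """Tile a position-embedding matrix along the context axis.

    wpe_on_disk:  [n_ctx, n_embd] tensor loaded from the snapshot.
    target_n_ctx: the (larger) context length the model expects.

    Hypothetical helper for illustration only.
    """
    n_ctx, n_embd = wpe_on_disk.shape
    if target_n_ctx <= n_ctx:
        # Snapshot already covers the requested context; truncate if needed.
        return wpe_on_disk[:target_n_ctx]
    reps = -(-target_n_ctx // n_ctx)  # ceiling division
    tiled = np.tile(wpe_on_disk, (reps, 1))
    return tiled[:target_n_ctx]

# A 1024-position wpe expanded to 2048: positions 0 and 1024
# end up with identical embedding rows.
wpe = np.random.randn(1024, 768)
big = expand_wpe(wpe, 2048)
assert big.shape == (2048, 768)
assert np.array_equal(big[0], big[1024])
```

Note that after tiling (and before fine-tuning), position i and position i + n_ctx share the exact same embedding row, which is what the question below is getting at.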
Is that essentially repeating the position embedding? I'm surprised that works, since the model should have no way to distinguish between, e.g., the 1st and the 513th token. (If I'm understanding this correctly.)
sillysaurusx|6 years ago
> Specifically, the code checks whether the model's shape is greater than the shape from the snapshot on disk. If so, it repeats the shape from the snapshot on disk N times to fill the expected greater shape.

> At that point, you can just set context window to a larger value, then train.
octbash|6 years ago