
oneseven | 2 years ago

It seems like learned positional encodings would still prevent you from fine-tuning on a larger context size, though, so maybe using ALiBi is still relevant (although I have not read that paper).
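To illustrate why ALiBi can extend to longer contexts where a learned position table cannot, here is a minimal sketch (assuming PyTorch and the per-head slope schedule described in the ALiBi paper) of the distance-proportional bias it adds to the attention logits. There are no learned position parameters, so nothing is tied to the training sequence length:

    import torch

    def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
        # Per-head slopes form a geometric sequence, e.g. 1/2, 1/4, ... for 8 heads.
        slopes = torch.tensor([2 ** (-8 * (i + 1) / num_heads) for i in range(num_heads)])
        # Signed distance j - i between key position j and query position i.
        positions = torch.arange(seq_len)
        distances = positions[None, :] - positions[:, None]   # (seq_len, seq_len)
        # Keep only the causal part; future positions get zero bias (they are masked anyway).
        distances = distances.clamp(max=0)
        # Bias added directly to attention scores: more negative the farther back the key is.
        return slopes[:, None, None] * distances[None, :, :]  # (num_heads, seq_len, seq_len)

Since the bias is just a linear function of distance, the same code works for any seq_len at inference, including lengths never seen during training.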


jimsimmons | 2 years ago

You can collapse all positions beyond a certain length into a single bucket, as T5 does with its relative position buckets.
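A minimal sketch of that bucketing idea, loosely following T5's relative-position scheme (the parameter names and exact log-spacing here are illustrative, and the real implementation also distinguishes direction for bidirectional attention): small distances get their own bucket, larger ones are placed logarithmically, and anything at or beyond max_distance is clamped into the last bucket.

    import math

    def relative_position_bucket(relative_position: int,
                                 num_buckets: int = 32,
                                 max_distance: int = 128) -> int:
        # Use absolute distance; a bidirectional model would also track the sign.
        n = abs(relative_position)
        max_exact = num_buckets // 2
        if n < max_exact:
            # Small distances each get their own exact bucket.
            return n
        # Larger distances are spaced logarithmically up to max_distance;
        # everything beyond that collapses into the final bucket.
        val = max_exact + int(
            math.log(n / max_exact) / math.log(max_distance / max_exact)
            * (num_buckets - max_exact)
        )
        return min(val, num_buckets - 1)

Because every distance past max_distance maps to the same bucket, the learned bias table stays a fixed size no matter how long the input sequence gets.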