

arugulum | 2 years ago

> the RoPE embeddings in Code Llama were designed for this.

The RoPE embeddings were not "designed" for that. The original RoPE was not designed with length extrapolation in mind. Later methods for extending RoPE to longer contexts (e.g. position interpolation) are post-hoc modifications, optionally followed by fine-tuning, applied to an otherwise entirely vanilla RoPE implementation.
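
To make the "post-hoc" point concrete, here is a minimal sketch of position interpolation layered on top of vanilla RoPE. It is not taken from any particular codebase: the function names, the `scale` parameter, and the 2048/8192 lengths are assumptions chosen for illustration. The rotation formula itself is untouched; the only change is that positions are linearly compressed before being fed into it.

```python
import math

def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for one token position under vanilla RoPE.

    scale=1.0 is plain RoPE. scale < 1.0 is the position-interpolation
    step: the position index is compressed linearly, and the RoPE
    formula itself is left unchanged.
    """
    pos = position * scale
    return [pos * base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) by the RoPE angles."""
    angles = rope_angles(position, len(vec), base, scale)
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out

# Hypothetical lengths, for illustration only.
train_len, target_len = 2048, 8192
scale = train_len / target_len  # 0.25

q = [1.0, 0.0, 0.5, -0.5]
# With interpolation, position 8000 is rotated exactly like position 2000
# in vanilla RoPE, so it falls back inside the range seen during training.
interpolated = apply_rope(q, 8000, scale=scale)
vanilla = apply_rope(q, 2000)
```

Nothing in `apply_rope` knows about long contexts; the extension lives entirely in the `scale` factor, which is why it can be bolted onto an existing model after the fact.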
