Why YaRN may currently be the best method for extending context length

1 · Norm Inui · Nov. 16, 2023, 8:44 a.m.
Summary
The “Out-of-Bound” Problem

In the YaRN paper, the authors point out a flaw in the current NTK-RoPE: because of “out-of-bound” values, the theoretical scale factor \(s\) does not accurately describe the true context extension scale. In practice, \(s\) has to be set higher than the expected scale for a given context length extension. To understand how the “out-of-bound” values influence the extension scale, we first recall how NTK-aware interpolation works. For RoPE, the \(\omega = b^{-\frac{2...
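
As a rough illustration of why the nominal scale \(s\) can overstate the real extension, here is a minimal Python sketch. It assumes the commonly used RoPE frequencies \(b^{-2i/|D|}\) and the NTK-aware base change \(b' = b \cdot s^{|D|/(|D|-2)}\); these formulas, the function names, and the example values (`dim=128`, `scale=4.0`, `base=10000.0`) are assumptions made for this sketch and are not taken from the summary above. It computes, for each rotary dimension pair, how much the frequency is actually slowed down.

```python
def rope_frequencies(dim, base=10000.0):
    """Assumed standard RoPE frequencies: base^(-2i/dim) for i = 0 .. dim/2 - 1."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


def ntk_aware_base(base, scale, dim):
    """Assumed NTK-aware base change: b' = b * s^(dim / (dim - 2))."""
    return base * scale ** (dim / (dim - 2))


def effective_scale_per_dim(dim, scale, base=10000.0):
    """Ratio original_freq / scaled_freq for each rotary dimension pair.

    A ratio of `scale` means the pair is fully interpolated; a ratio
    below `scale` means it is interpolated by less than the nominal factor.
    """
    original = rope_frequencies(dim, base)
    scaled = rope_frequencies(dim, ntk_aware_base(base, scale, dim))
    return [o / s for o, s in zip(original, scaled)]


if __name__ == "__main__":
    ratios = effective_scale_per_dim(dim=128, scale=4.0)
    # Print the highest-frequency pair, a middle pair, and the lowest-frequency pair.
    for i in (0, len(ratios) // 2, len(ratios) - 1):
        print(f"pair {i:3d}: effective scale = {ratios[i]:.3f}")
```

Under these assumptions the ratio grows from about 1 for the highest-frequency pair to about \(s\) for the lowest-frequency pair, so only the last pair is interpolated by the full factor. One reading of the “out-of-bound” remark is that the other pairs see rotation angles beyond their trained range at the extended length, which is consistent with needing a larger \(s\) in practice.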