Attention Is Off By One

Evan Miller · Sept. 9, 2022, 4:05 p.m.
The transformer architecture has a mathematical bug that has been overlooked for 6+ years. I propose fixing its outliers with two new devices, Softmax One and QuietAttention.
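
For reference, here is a minimal NumPy sketch of the idea behind Softmax One: the denominator gains a +1 term (an implicit zero logit), so the attention weights can sum to less than 1 and a head can choose to say nothing. The function names and the numerical-stability trick are mine, not taken verbatim from any released implementation.

```python
import numpy as np

def softmax(x):
    """Standard softmax: the weights are forced to sum to exactly 1."""
    e = np.exp(x - np.max(x))          # subtract the max for numerical stability
    return e / e.sum()

def softmax_one(x):
    """Softmax One (softmax_1): exp(x_i) / (1 + sum_j exp(x_j)).

    Equivalent to a softmax over [x, 0] with the extra slot discarded,
    so the weights can sum to *less* than 1 when all logits are very negative.
    """
    m = max(np.max(x), 0.0)            # include the implicit 0 logit in the max
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum())

# With strongly negative logits, standard softmax still hands out weight,
# while softmax_one lets the head stay (almost) quiet:
x = np.array([-8.0, -9.0, -10.0])
print(softmax(x).sum())      # 1.0
print(softmax_one(x).sum())  # ~0.0005
```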