DIFF.BLOG
Attention Is Off By One
Evan Miller
·
Sept. 9, 2022, 4:05 p.m.
Summary
The Transformer has a mathematical bug that has been overlooked for 6+ years. I propose fixing its outliers with two new devices, Softmax One and QuietAttention...
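The core of the proposal is a one-character change to the attention formula: Softmax One adds 1 to the softmax denominator, so an attention head can assign near-zero total weight instead of being forced to distribute exactly 1.0 across its inputs. A minimal sketch in NumPy (function names and the example scores are illustrative, not from the post):

```python
import numpy as np

def softmax(x):
    # Standard softmax: weights always sum to exactly 1, so every head
    # must attend to *something*, even when no input is relevant.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_one(x):
    # "Softmax One": exp(x_i) / (1 + sum_j exp(x_j)).
    # The extra 1 acts like an implicit zero logit, letting the head
    # "abstain" by pushing all weights toward zero.
    m = np.max(x)
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum())

scores = np.array([-4.0, -5.0, -6.0])   # hypothetical attention logits
print(softmax(scores).sum())            # sums to 1 regardless of scores
print(softmax_one(scores).sum())        # far below 1: the head abstains
```

When all logits are strongly negative, the standard softmax still normalizes to 1, while Softmax One lets the total mass shrink toward zero, which is the mechanism the post argues removes the need for extreme outlier weights.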
Read full post on www.evanmiller.org →