(WIP) Notes on "The Illustrated GPT-2"

1 · Shane Mulligan · Oct. 23, 2019, 4 p.m.
Original article
The Illustrated GPT-2 (Visualizing Transformer Language Models), Jay Alammar: Visualizing machine learning one concept at a time

Prereading
Overview of The Illustrated Transformer // Bodacious Blog

Parameters
When an article talks about the number of parameters, this is what it’s referring to.

    Parameters
    Single Transformer block
    Conv1d
      attn/c_attn
        w    768 x 2304    1769472
        b    2304          2304
      attn/c_proj
        w    768 x 768     589824
        b    768           768
      mlp/c_fc ...
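The counts listed above for attn/c_attn and attn/c_proj can be checked with a little arithmetic: a weight matrix contributes rows times columns, and a bias contributes one value per output. Below is a minimal sketch of that check. The helper name `conv1d_params` and the constant `D_MODEL` are made up for illustration and are not taken from the article or from the GPT-2 code. It also assumes that 2304 is 3 x 768 because the query, key and value projections are packed into one matrix.

```python
# Hypothetical helper for illustration only; not part of the GPT-2 codebase.
def conv1d_params(n_in: int, n_out: int) -> dict:
    """Count the parameters of a GPT-2 style Conv1d layer (a dense projection)."""
    return {
        "w": n_in * n_out,  # weight matrix of shape (n_in, n_out)
        "b": n_out,         # one bias value per output feature
    }


D_MODEL = 768  # hidden size implied by the shapes listed above

# Assumption: c_attn packs the query, key and value projections, hence 3 * 768 = 2304.
c_attn = conv1d_params(D_MODEL, 3 * D_MODEL)
# c_proj maps the attention output back to the hidden size.
c_proj = conv1d_params(D_MODEL, D_MODEL)

print("attn/c_attn", c_attn)
print("attn/c_proj", c_proj)
```

The results match the listed figures: 1769472 weights and 2304 biases for attn/c_attn, and 589824 weights and 768 biases for attn/c_proj.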