Encoder-Only vs Decoder-Only vs Encoder-Decoder Transformer

Vaclav Kosar · Oct. 29, 2023
People keep asking me what the difference is between an encoder, a decoder, and a normal transformer (with self-attention). It is a simple thing that you can master quickly.

Encoder-only (BERT)
BERT has an encoder-only architecture. The input is text and the output is a sequence of embeddings. Use cases are sequence classification (via the class token) and token classification. It uses bidirectional attention, so the model can see both forwards and backwards.

Decoder-only (GPT4)
GPT-2 has a decoder-only architecture. Input is t...
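
To make "can see forwards and backwards" concrete, here is a minimal pure-Python sketch that contrasts a bidirectional attention mask with a causal one. The causal mask is an assumption on my side: it is the usual setup for decoder-only models, but the text above does not spell it out. The function names `attention_mask` and `masked_softmax` are hypothetical helpers for illustration, not part of any library.

```python
import math


def attention_mask(seq_len, causal):
    """Return mask[i][j] == True when position i may attend to position j."""
    # Bidirectional: every position sees every other position.
    # Causal (assumed decoder-style): position i sees only positions j <= i.
    return [[(j <= i) if causal else True for j in range(seq_len)]
            for i in range(seq_len)]


def masked_softmax(scores, mask):
    """Row-wise softmax over scores; masked-out positions get weight 0."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        visible = [s for s, m in zip(score_row, mask_row) if m]
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if m else 0.0
                for s, m in zip(score_row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


seq_len = 4
# Toy attention scores: all equal, so only the mask shapes the result.
scores = [[0.0] * seq_len for _ in range(seq_len)]

bidirectional = masked_softmax(scores, attention_mask(seq_len, causal=False))
causal = masked_softmax(scores, attention_mask(seq_len, causal=True))

for name, weights in (("bidirectional", bidirectional), ("causal", causal)):
    print(name)
    for row in weights:
        print("  ", [round(w, 2) for w in row])
```

With uniform scores, the first token in the bidirectional case spreads its attention evenly over all four positions, while in the causal case it can attend only to itself. Later tokens gradually see more of the sequence.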