First Token Cutoff LLM sampling

3 · Salvatore Sanfilippo · Jan. 12, 2024, 5:32 p.m.
From a theoretical standpoint, the best reply provided by an LLM is obtained by always picking the token associated with the highest probability. This approach makes the LLM output deterministic, which is not a good property for a number of applications. For this reason, in order to balance LLMs creativity while preserving adherence to the context, different sampling algorithms have been proposed in recent years. Today one of the most used ones, more or less the default, is called top-p: it is ...