Metrics on Words

1 · Jeremy Kun · Dec. 19, 2011, 8:59 p.m.
Summary
We are about to begin a series in which we analyze large corpora of English words. In particular, we will use a probabilistic analysis of Google’s n-grams to solve various tasks such as spelling correction, word segmentation, on-line typing prediction, and decoding substitution ciphers. This will hopefully take us on a wonderful journey through elementary probability, dynamic programming algorithms, and optimization. As usual, the code implemented in this post is available from this blog’s GitHub pa...