Metrics on Words

Jeremy Kun · Dec. 19, 2011, 8:59 p.m.

Summary

We are about to begin a series in which we analyze large corpora of English words. In particular, we will use a probabilistic analysis of Google’s ngrams to solve tasks such as spelling correction, word segmentation, on-line typing prediction, and decoding substitution ciphers. This will hopefully take us on a wonderful journey through elementary probability, dynamic programming algorithms, and optimization. As usual, the code implemented in this post is available from this blog’s GitHub pa...

Read full post on www.jeremykun.com →
