Fast CSV processing with SIMD

Christopher Wellons · Dec. 4, 2021, 1:42 a.m.
I recently learned of csvquote, a tool that encodes troublesome CSV characters so that Unix tools can process them correctly. It reverses the encoding at the end of the pipeline, recovering the original input. The original implementation handles CSV quotes using the straightforward, naive method. However, there’s a better approach that is not only simpler, but around 3x faster on modern hardware. What’s more, there’s yet another approach using SIMD intrinsics, plus some bit twiddling tricks, whi...
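To make the encode/decode round trip described above concrete, here is a minimal byte-at-a-time sketch in C. It is only an illustration of the general idea, not csvquote's actual code nor either of the faster approaches. The function names `csv_encode` and `csv_decode` are hypothetical, and the substitution bytes (the ASCII unit and record separators, 0x1f and 0x1e) are an assumption about a reasonable choice of non-printing stand-ins.

```c
#include <stddef.h>

/* Assumed stand-in bytes for commas and newlines inside quoted fields. */
#define ENC_COMMA   0x1f  /* ASCII unit separator   */
#define ENC_NEWLINE 0x1e  /* ASCII record separator */

/* Encode buf in place. *in_quotes carries the quote state across calls,
 * so the input can be fed in arbitrary chunks. An escaped quote ("")
 * toggles the state twice, leaving it unchanged. */
static void csv_encode(unsigned char *buf, size_t len, int *in_quotes)
{
    for (size_t i = 0; i < len; i++) {
        switch (buf[i]) {
        case '"':
            *in_quotes = !*in_quotes;
            break;
        case ',':
            if (*in_quotes) buf[i] = ENC_COMMA;
            break;
        case '\n':
            if (*in_quotes) buf[i] = ENC_NEWLINE;
            break;
        }
    }
}

/* Reverse the encoding in place. No quote tracking is needed, since
 * the stand-in bytes only ever replace quoted commas and newlines. */
static void csv_decode(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == ENC_COMMA)        buf[i] = ',';
        else if (buf[i] == ENC_NEWLINE) buf[i] = '\n';
    }
}
```

After encoding, every remaining comma and newline is a real field or record delimiter, so line-oriented tools can split on them safely. The sketch assumes the input never contains the stand-in bytes itself.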