Benchmarking division and libdivide on Apple M1 and Intel AVX512

May 12, 2021, 6:45 p.m.
libdivide is fish’s library for speeding up integer division by runtime constants. It works by replacing divide instructions with cheaper multiplications and shifts.

libdivide supports SIMD, and recently gained support for AVX-512 and ARM NEON. Let’s benchmark on an Intel Xeon and Apple’s M1.

Test Setup

The microbenchmark is a sum-of-quotients function, like:

    uint32_t sumq(const uint32_t *vals, size_t count, uint32_t d) {
        uint32_t sum = 0;
        for (size_t i=0; i < count; i++)
            sum += ...
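To make the "multiplications and shifts" idea concrete, here is a minimal sketch of a sum-of-quotients loop that divides by a runtime constant without a divide instruction in the loop body. It is not libdivide's algorithm or API: it swaps in a simpler, well-known variant that precomputes M = ceil(2^64 / d) once and then takes the high 64 bits of M * n as the quotient. The names `magic_u32`, `fastdiv_u32`, and `sumq_fast` are hypothetical, and the sketch assumes an unsigned 32-bit divisor with d >= 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of libdivide): precompute the 64-bit
   multiplier M = ceil(2^64 / d). Assumes d >= 2; for d == 1 the
   computation below would wrap around to 0. */
static uint64_t magic_u32(uint32_t d) {
    assert(d >= 2);
    return UINT64_MAX / d + 1;
}

/* Compute n / d as floor(M * n / 2^64) using only 64-bit multiplies,
   an add, and shifts. M is split into 32-bit halves so the 96-bit
   product never has to be held in a single register. */
static uint32_t fastdiv_u32(uint32_t n, uint64_t M) {
    uint64_t hi = (M >> 32) * n;          /* upper half of M times n */
    uint64_t lo = (M & 0xFFFFFFFFu) * n;  /* lower half of M times n */
    return (uint32_t)((hi + (lo >> 32)) >> 32);
}

/* Sum of quotients with the divide hoisted out of the loop: the only
   real division happens once, inside magic_u32. */
static uint32_t sumq_fast(const uint32_t *vals, size_t count, uint32_t d) {
    uint64_t M = magic_u32(d);
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += fastdiv_u32(vals[i], M);
    return sum;
}
```

The precomputation costs one hardware divide, so the approach only pays off when the same divisor is reused across many numerators, which is the case a sum-of-quotients benchmark exercises. A production library also has to handle d == 1, signed and 64-bit types, and SIMD lanes, none of which this sketch attempts.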