Howard Chu writes:
I didn't really spend a lot of time comparing the two functions' speed. But even with the memory access bottleneck, I would guess that on a loaded system with many threads running, the algorithm with fewer instructions is the better choice. Have you measured the throughput when multiple threads are executing?
Good point. So far I have only run a quick single-threaded test program.