Howard Chu wrote:
What's also interesting is that for Hoard, Umem, and Tcmalloc, the multi-threaded query times are consistently about 2x slower than the single-threaded case. The 2x slowdown makes sense since it's only a dual-core CPU and it's doing 4x as much work. This kinda says that the cost of malloc is overshadowed by the overhead of thread scheduling.
Is it possible that the block stride in the addresses returned by malloc() is affecting cache performance in the glibc case? If blocks used by different threads are close enough to share a cache line, I think the cores could end up thrashing that line between them (false sharing).