Howard Chu writes:
The obvious fix is to adopt the same strategies that tcmalloc uses.
(And unfortunately we can't simply rely on tcmalloc always being available,
or always being stable in a given environment.)
Good, though I'd like to see these slapd re-implementations of system
features (like malloc) #ifdeffed with a fallback to the system feature.
Then one can compile with -D<revert to system feature> either when the
system one is as good as or better than slapd's, or to simplify debugging.
Configure can guess about it too, e.g. it can detect tcmalloc.
The new entry_free() plus tcmalloc may be better than plain tcmalloc,
I don't know. It retains the global mutex though, which presumably is
or someday will be a pessimization compared to _some_ malloc out there.
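A rough sketch of the kind of #ifdef fallback I mean. The macro
USE_SYSTEM_ALLOC and the obj_* names are made up for illustration (they are
not existing slapd or configure symbols), and the "custom" branch is just a
trivial single-threaded free list standing in for whatever slapd does:

```c
#include <stdlib.h>

#define OBJ_SIZE 64            /* illustrative fixed object size */

#ifdef USE_SYSTEM_ALLOC        /* hypothetical switch: cc -DUSE_SYSTEM_ALLOC */

/* Revert to the system allocator; nothing is cached. */
static void *obj_alloc(void) { return malloc(OBJ_SIZE); }
static void obj_free(void *p) { free(p); }
static int obj_cached(void) { return 0; }

#else

/* Stand-in for a custom allocator: recycle freed objects on a free list.
 * No locking here, so this branch is only a single-threaded sketch. */
union obj {
    union obj *next;
    char bytes[OBJ_SIZE];
};

static union obj *free_list;
static int n_cached;

static void *obj_alloc(void)
{
    union obj *o = free_list;
    if (o != NULL) {           /* reuse a cached object */
        free_list = o->next;
        n_cached--;
        return o;
    }
    return malloc(sizeof(union obj));
}

static void obj_free(void *p)
{
    union obj *o = p;
    if (o == NULL)
        return;
    o->next = free_list;       /* keep it for the next obj_alloc() */
    free_list = o;
    n_cached++;
}

static int obj_cached(void) { return n_cached; }

#endif
```

Callers only ever see obj_alloc()/obj_free(), so switching between the two
builds needs no source changes, and a debugging build with the -D switch
lets tools that track malloc/free see every object.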
I.e., use per-thread cached free
lists. We maintain some small number of free objects per thread; this
per-thread free list can be used without locking. When the number of free
objects on a given thread exceeds a particular threshold
...or there is no thread key for the mutex (e.g. when the current
thread is not from the thread pool)...
Might be convenient to let slapd register init-thread and cleanup-thread
functions in the thread pool. These could create/destroy these mutexes,
and maybe some other per-thread slapd variables too.
(Preferably the init function would be able to fail and cause the pool
thread to die, but that'd mess up the pool logic, which assumes that once
a thread has been created it will be able to handle submitted tasks.
Except slapd often doesn't check for malloc/mutex_init success anyway,
so demanding success would be no worse than what slapd does now.)
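Roughly what I have in mind for the hooks, as a sketch only. None of these
names (pool_register_hooks, pool_thread_body, the demo hooks) exist in the
thread pool today; they are invented here, and the real pool would call the
body from its own worker function rather than directly:

```c
#include <pthread.h>
#include <stdlib.h>

typedef int  (*thread_init_fn)(void **ctx);    /* returns 0 on success */
typedef void (*thread_cleanup_fn)(void *ctx);

#define MAX_HOOKS 8

static struct {
    thread_init_fn init;
    thread_cleanup_fn cleanup;
} hooks[MAX_HOOKS];
static int n_hooks;

/* Register a pair of hooks; to be done before pool threads are started. */
static int pool_register_hooks(thread_init_fn init, thread_cleanup_fn cleanup)
{
    if (n_hooks == MAX_HOOKS)
        return -1;
    hooks[n_hooks].init = init;
    hooks[n_hooks].cleanup = cleanup;
    n_hooks++;
    return 0;
}

/* What a pool thread would run: init hooks in order, then the work,
 * then cleanup hooks in reverse order. If an init hook fails, the work
 * is skipped and only the hooks that already succeeded are cleaned up. */
static int pool_thread_body(void (*work)(void **ctxs))
{
    void *ctxs[MAX_HOOKS] = { 0 };
    int i, rc = 0;

    for (i = 0; i < n_hooks; i++) {
        if (hooks[i].init(&ctxs[i]) != 0) {
            rc = -1;
            break;
        }
    }
    if (rc == 0)
        work(ctxs);
    while (i-- > 0) {
        if (hooks[i].cleanup != NULL)
            hooks[i].cleanup(ctxs[i]);
    }
    return rc;
}

/* Demo hooks: a per-thread mutex, plus an init hook that always fails. */
static int live_mutexes, tasks_run;

static int mutex_init_hook(void **ctx)
{
    pthread_mutex_t *m = malloc(sizeof *m);
    if (m == NULL)
        return -1;
    if (pthread_mutex_init(m, NULL) != 0) {
        free(m);
        return -1;
    }
    *ctx = m;
    live_mutexes++;
    return 0;
}

static void mutex_cleanup_hook(void *ctx)
{
    pthread_mutex_destroy(ctx);
    free(ctx);
    live_mutexes--;
}

static int failing_init_hook(void **ctx)
{
    (void)ctx;
    return -1;
}

static void demo_work(void **ctxs)
{
    pthread_mutex_lock(ctxs[0]);
    tasks_run++;
    pthread_mutex_unlock(ctxs[0]);
}
```

The return value of pool_thread_body() is where the pool would have to
decide what a failed init means, which is exactly the part that conflicts
with the current assumption that a created thread can always take tasks.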
then we obtain the
global lock to return some number of objects to the global list.
In practice this threshold can be very small - any given thread typically
needs no more than 4 entries at a time. (ModDN is the worst case at 3 entries
locked at once. LDAP TXNs would distort this figure but not in any critical
fashion.) For attributes the typical usage is much more variable, but any
number we pick will be an improvement over the current code.
Add a few more for overlays, in particular syncrepl. Otherwise even a
single overlay doing entry_dup() reduces performance.
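For reference, the per-thread scheme described above could look something
like this. It is only a sketch under my own assumptions: the names, the use
of C11 _Thread_local instead of a thread key, and the LOCAL_MAX/FLUSH_COUNT
values are all invented, and the case of a thread without per-thread state
is not handled:

```c
#include <pthread.h>
#include <stdlib.h>

#define LOCAL_MAX   4   /* per-thread threshold; raise it for overlays */
#define FLUSH_COUNT 2   /* objects handed back at once; <= LOCAL_MAX + 1 */

typedef struct obj {
    struct obj *next;   /* a real entry/attribute payload would follow */
} obj;

/* Global list, shared by all threads and protected by the mutex. */
static obj *global_list;
static int global_count;
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread list, used without any locking. */
static _Thread_local obj *local_list;
static _Thread_local int local_count;

static obj *cache_alloc(void)
{
    obj *o = local_list;
    if (o != NULL) {                    /* fast path: no lock taken */
        local_list = o->next;
        local_count--;
        return o;
    }
    pthread_mutex_lock(&global_mutex);  /* slow path: try the global list */
    o = global_list;
    if (o != NULL) {
        global_list = o->next;
        global_count--;
    }
    pthread_mutex_unlock(&global_mutex);
    return o != NULL ? o : malloc(sizeof *o);
}

static void cache_free(obj *o)
{
    obj *head, *tail;
    int i;

    if (o == NULL)
        return;
    o->next = local_list;               /* push onto the per-thread list */
    local_list = o;
    if (++local_count <= LOCAL_MAX)
        return;                         /* under the threshold: done */

    /* Over the threshold: detach FLUSH_COUNT objects without the lock,
     * then splice them onto the global list while holding it. */
    head = tail = local_list;
    for (i = 1; i < FLUSH_COUNT; i++)
        tail = tail->next;
    local_list = tail->next;
    local_count -= FLUSH_COUNT;

    pthread_mutex_lock(&global_mutex);
    tail->next = global_list;
    global_list = head;
    global_count += FLUSH_COUNT;
    pthread_mutex_unlock(&global_mutex);
}
```

With these values a thread that frees five objects in a row keeps three
locally and returns two to the global list, taking the global mutex once.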
--
Hallvard