We introduced entry_alloc/entry_free and attr_alloc/attr_free to avoid the severe heap fragmentation problems we were encountering with glibc malloc. However, the current implementation is pretty suboptimal: it uses a global mutex to protect the entry and attr free lists. This scales very poorly on multiprocessor machines.
The obvious fix is to adopt the same strategy that tcmalloc uses, i.e., per-thread cached free lists. (Unfortunately we can't simply rely on tcmalloc always being available, or always being stable in a given environment.) We maintain some small number of free objects per thread; this per-thread free list can be used without locking. When the number of free objects on a given thread exceeds a particular threshold, we obtain the global lock and return some number of objects to the global list.
In practice this threshold can be very small - any given thread typically needs no more than 4 entries at a time. (ModDN is the worst case at 3 entries locked at once. LDAP TXNs would distort this figure but not in any critical fashion.) For attributes the typical usage is much more variable, but any number we pick will be an improvement over the current code.
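For illustration only, the per-thread scheme described above might look roughly like the following. This is a minimal sketch, not the actual slapd code: the names obj_t, obj_alloc, obj_free, CACHE_MAX and CACHE_KEEP are all hypothetical, the threshold of 4 is simply the entry figure mentioned above, and C11 _Thread_local plus raw pthreads are assumed in place of whatever thread-key mechanism the real implementation would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical tuning values, not taken from the real code. */
#define CACHE_MAX  4   /* flush once a thread holds more than this many */
#define CACHE_KEEP 2   /* objects left on the thread after a flush */

typedef struct obj {
    struct obj *next;  /* free-list link; the payload would follow */
} obj_t;

/* Global free list, protected by the mutex. */
static obj_t *global_list;
static int global_count;
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread cache, touched without any locking. */
static _Thread_local obj_t *local_list;
static _Thread_local int local_count;

obj_t *obj_alloc(void)
{
    obj_t *o = local_list;

    if (o) {                       /* fast path: no lock taken */
        local_list = o->next;
        local_count--;
        return o;
    }

    /* Slow path: try the global list, then fall back to malloc. */
    pthread_mutex_lock(&global_mutex);
    o = global_list;
    if (o) {
        global_list = o->next;
        global_count--;
    }
    pthread_mutex_unlock(&global_mutex);

    if (!o)
        o = malloc(sizeof(*o));
    return o;
}

void obj_free(obj_t *o)
{
    obj_t *keep_tail, *head, *tail;
    int i, n = 1;

    assert(o != NULL);
    o->next = local_list;          /* push onto the thread's cache */
    local_list = o;
    if (++local_count <= CACHE_MAX)
        return;                    /* under threshold: no lock taken */

    /* Over threshold: keep the first CACHE_KEEP objects, detach the rest. */
    keep_tail = local_list;
    for (i = 1; i < CACHE_KEEP; i++)
        keep_tail = keep_tail->next;
    head = keep_tail->next;
    keep_tail->next = NULL;
    for (tail = head; tail->next; tail = tail->next)
        n++;
    local_count = CACHE_KEEP;

    /* One lock acquisition returns the whole surplus chain. */
    pthread_mutex_lock(&global_mutex);
    tail->next = global_list;
    global_list = head;
    global_count += n;
    pthread_mutex_unlock(&global_mutex);
}
```

The point of the sketch is that the global mutex is only taken when a thread's cache is empty or overflows, so the common alloc/free pair never contends. Objects still cached by a thread when it exits would need to be handed back to the global list; that cleanup is omitted here.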