For a future version (2.x, or maybe 3.x, but hopefully sooner):
The original idea behind sl_malloc / op->o_tmpalloc was to have per-operation memory allocation that never needed an explicit free(): the memory would simply be discarded or reset when the operation finished. This idea has been subverted over time, and the code is now littered with ch_free/tmpfree calls, which is exactly what sl_malloc was supposed to eliminate.
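A minimal sketch of the original sl_malloc idea, assuming a simple fixed-size bump arena: each allocation is carved off sequentially, nothing is freed individually, and the whole region is reset in one step when the operation finishes. The names op_arena, arena_alloc() and arena_reset() are hypothetical; this is not the actual sl_malloc implementation, which is more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-operation arena: allocations are carved off sequentially
 * and are never freed individually. */
#define ARENA_SIZE 4096

typedef struct op_arena {
    _Alignas(max_align_t) unsigned char buf[ARENA_SIZE];
    size_t used;                 /* bytes handed out so far */
} op_arena;

/* Round n up so every returned block is suitably aligned for any type. */
static size_t align_up(size_t n)
{
    size_t a = _Alignof(max_align_t);
    return (n + a - 1) / a * a;
}

/* Allocate n bytes; returns NULL when the arena is exhausted
 * (a real allocator would fall back to the global heap instead). */
void *arena_alloc(op_arena *a, size_t n)
{
    assert(a->used <= ARENA_SIZE);
    if (n > ARENA_SIZE)
        return NULL;
    size_t need = align_up(n);
    if (need > ARENA_SIZE - a->used)
        return NULL;
    void *p = a->buf + a->used;
    a->used += need;
    return p;
}

/* Called once when the operation finishes: every allocation is discarded
 * at once, so no individual free() calls are needed. */
void arena_reset(op_arena *a)
{
    a->used = 0;
}
```

Because arena_reset() only rewinds a counter, discarding an operation's memory costs the same no matter how many objects were allocated from it.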
There was one key problem with the original sl_malloc idea: it only accounted for two types of memory, but in practice we really have three. These are global memory, whose allocations must persist beyond the life of a single operation; per-operation memory; and actual scratch/temporary memory. In a future version I'd like to add an opalloc() function for the per-operation memory.
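The three memory classes could be sketched as follows, assuming each operation owns two bump arenas: one for opalloc() memory that lives until the operation completes, and one for scratch memory. The Operation struct here is a hypothetical stand-in, not OpenLDAP's real one, and global_alloc() simply wraps malloc() to play the role of ch_malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump arena backed by one malloc()ed block. */
typedef struct arena {
    char *base;
    size_t size;
    size_t used;
} arena;

static void *arena_alloc(arena *a, size_t n)
{
    size_t al = _Alignof(max_align_t);
    assert(a->used <= a->size);
    if (a->base == NULL || n > a->size)
        return NULL;
    n = (n + al - 1) / al * al;
    if (n > a->size - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* Hypothetical operation owning two pools with different lifetimes. */
typedef struct Operation {
    arena op_mem;   /* per-operation: ACL cache, callback structures */
    arena tmp_mem;  /* scratch: normalization, entry construction */
} Operation;

/* Global memory (the ch_malloc role): must outlive any single operation
 * and is freed explicitly, e.g. when a config entry is deleted. */
void *global_alloc(size_t n) { return malloc(n); }

/* Hypothetical opalloc(): valid until the operation completes. */
void *opalloc(Operation *op, size_t n) { return arena_alloc(&op->op_mem, n); }

/* Scratch memory, the o_tmpalloc role. */
void *tmpalloc(Operation *op, size_t n) { return arena_alloc(&op->tmp_mem, n); }

/* Returns 0 on success; call op_finish() in either case. */
int op_init(Operation *op, size_t op_size, size_t tmp_size)
{
    op->op_mem = (arena){ malloc(op_size), op_size, 0 };
    op->tmp_mem = (arena){ malloc(tmp_size), tmp_size, 0 };
    return (op->op_mem.base && op->tmp_mem.base) ? 0 : -1;
}

/* End of operation: both pools are released wholesale,
 * with no per-object free() calls. */
void op_finish(Operation *op)
{
    free(op->op_mem.base);
    free(op->tmp_mem.base);
    op->op_mem = op->tmp_mem = (arena){ NULL, 0, 0 };
}
```

Only the global allocation needs its own free(); everything drawn from the two per-operation pools disappears in op_finish().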
Rationale: most global allocations occur at startup time, while processing the config. Generally this memory never needed explicit freeing because it only went away at shutdown, but now that we have runtime config with delete support we need to handle that too. The other obvious case is per-connection state, such as the state established by a Bind op. Back when we still used BerkeleyDB backends, the backends' various caches also needed global memory. All of these would be allocated using ch_malloc.
The primary user of per-operation memory is the per-operation ACL cache. The other sensible use would be for all per-op callback structures. Overhauling overlays to use only opalloc() for these (instead of the stack, which they frequently use now) would allow many overlays to work correctly with asynchronous backends, since a stack-allocated callback is gone as soon as the overlay function returns, while an asynchronous backend may still invoke it later.
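To illustrate the callback problem, here is a sketch assuming a hypothetical asynchronous backend that records the callback and delivers results after the overlay's function has already returned. The op_callback struct, overlay_search() and backend_deliver() are hypothetical simplifications, not the real slap_callback API; the point is only that the callback lives in per-operation memory rather than on the overlay's stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified callback; not the real slap_callback. */
typedef struct op_callback {
    int (*cb_response)(struct op_callback *cb);
    int count;                       /* overlay-private state */
} op_callback;

/* Hypothetical operation with a tiny per-operation arena. */
typedef struct Operation {
    _Alignas(max_align_t) unsigned char mem[256];
    size_t used;
    op_callback *pending;            /* callback the backend will invoke later */
} Operation;

static void *opalloc(Operation *op, size_t n)
{
    size_t al = _Alignof(max_align_t);
    if (n > sizeof op->mem)
        return NULL;
    n = (n + al - 1) / al * al;
    if (n > sizeof op->mem - op->used)
        return NULL;
    void *p = op->mem + op->used;
    op->used += n;
    return p;
}

static int count_response(op_callback *cb)
{
    return ++cb->count;
}

/* Overlay entry point. With an asynchronous backend it returns as soon as the
 * op is queued, so a stack-allocated op_callback would already be gone when
 * results arrive; one from opalloc() stays valid until the op finishes. */
int overlay_search(Operation *op)
{
    op_callback *cb = opalloc(op, sizeof *cb);
    if (cb == NULL)
        return -1;
    cb->cb_response = count_response;
    cb->count = 0;
    op->pending = cb;                /* the backend queues the op and returns */
    return 0;
}

/* Later, possibly on another thread: the backend delivers results. */
int backend_deliver(Operation *op, int nentries)
{
    int rc = 0;
    assert(op->pending != NULL);
    for (int i = 0; i < nentries; i++)
        rc = op->pending->cb_response(op->pending);
    return rc;
}
```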
Scratch memory remains the most frequently used, typically for DN/attribute normalization, entry construction, etc. For LDAP operations that only affect a single entry, which is every operation besides Search, there usually wouldn't be much difference in lifetime between opalloc and tmpalloc memory. But for Search, the normal pattern would be to do an sl_mark() before constructing a search response, send the response, then do an sl_release() before constructing the next response, and so on. That way scratch usage stays bounded by the size of a single response instead of growing with the number of entries returned.
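The Search pattern might look like this, assuming a scratch arena whose mark is simply the current offset. scratch_mark() and scratch_release() are hypothetical stand-ins modelled on sl_mark()/sl_release(), not their real signatures, and send_entry() just formats a DN where a real server would build and encode an entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical scratch arena; a "mark" is just the current offset. */
typedef struct scratch {
    _Alignas(max_align_t) unsigned char buf[1024];
    size_t used;
    size_t high_water;               /* peak usage, to show memory being reused */
} scratch;

static void *scratch_alloc(scratch *s, size_t n)
{
    size_t al = _Alignof(max_align_t);
    if (n > sizeof s->buf)
        return NULL;
    n = (n + al - 1) / al * al;
    if (n > sizeof s->buf - s->used)
        return NULL;
    void *p = s->buf + s->used;
    s->used += n;
    if (s->used > s->high_water)
        s->high_water = s->used;
    return p;
}

/* Stand-ins modelled on sl_mark()/sl_release(). */
static size_t scratch_mark(const scratch *s) { return s->used; }

static void scratch_release(scratch *s, size_t mark)
{
    assert(mark <= s->used);
    s->used = mark;                  /* drops everything allocated since mark */
}

/* Builds one response in scratch memory (here just a DN, standing in for
 * normalization and entry construction) and copies it to out. */
static int send_entry(scratch *s, int id, char *out, size_t outlen)
{
    char *dn = scratch_alloc(s, 64);
    if (dn == NULL)
        return -1;
    snprintf(dn, 64, "cn=entry%d,dc=example,dc=com", id);
    return snprintf(out, outlen, "%s", dn);
}

/* Search loop: mark before each response, release after sending it.
 * Returns the peak scratch usage. */
size_t search(scratch *s, int nentries)
{
    char out[128];
    for (int i = 0; i < nentries; i++) {
        size_t mark = scratch_mark(s);
        int rc = send_entry(s, i, out, sizeof out);
        scratch_release(s, mark);
        if (rc < 0)
            break;
    }
    return s->high_water;
}
```

With 1000 matching entries, peak scratch usage stays at the size of one response (64 bytes here); without the mark/release pair, the same loop would exhaust the 1 KiB arena after 16 entries.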
###
Another item to overhaul would be the use of op->o_bd->bd_info for invoking backend/overlay functions. Currently we create an entire dummy copy of the original op->o_bd so we can override bd_info as we walk through the overlays. That has forced a few other ridiculous things (like bd_self to point back to the real backend structure). We should have just added a new op->o_bdinfo pointer to the Operation struct and left the backend structure alone. This would eliminate some pointless memory copying and speed up overlay processing overall.
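A sketch of the o_bdinfo idea, using stripped-down stand-ins for the real structures: BackendInfo, BackendDB and Operation here carry only the fields needed for the example, and bi_layer_next, o_bdinfo and call_next() are hypothetical. Each layer advances op->o_bdinfo to the next layer and restores it afterwards, so op->o_bd keeps pointing at the one shared database and nothing is copied.

```c
#include <assert.h>
#include <stddef.h>

struct Operation;

/* Stripped-down stand-ins; not the real slap.h definitions. */
typedef struct BackendInfo {
    const char *bi_type;
    int (*bi_op_search)(struct Operation *op);
    struct BackendInfo *bi_layer_next;   /* hypothetical: next layer down */
} BackendInfo;

typedef struct BackendDB {
    BackendInfo *bd_info;                /* top of the stack; never changed per op */
    const char *bd_suffix;
} BackendDB;

typedef struct Operation {
    BackendDB *o_bd;                     /* the real, shared database */
    BackendInfo *o_bdinfo;               /* hypothetical: layer currently running */
    int layers_run;
} Operation;

/* Hand the op to the next layer by moving o_bdinfo, then restore it.
 * op->o_bd is never touched, so no dummy BackendDB copy (and no bd_self
 * back-pointer) is needed. */
static int call_next(Operation *op)
{
    BackendInfo *self = op->o_bdinfo;
    assert(self->bi_layer_next != NULL);
    op->o_bdinfo = self->bi_layer_next;
    int rc = op->o_bdinfo->bi_op_search(op);
    op->o_bdinfo = self;
    return rc;
}

static int overlay_search(Operation *op)
{
    op->layers_run++;
    return call_next(op);
}

static int backend_search(Operation *op)
{
    op->layers_run++;
    return 0;
}
```

Since op->o_bd always points at the real BackendDB, any layer can reach the database directly instead of following a back-pointer from a temporary copy.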