Full_Name: Hallvard B Furuseth Version: HEAD OS: Linux x86_64 URL: Submission from: (NULL) (129.240.6.233) Submitted by: hallvard
Here are a number of bugs and other issues with slapd/sl_malloc.c, and one for ber_memcalloc. Some for RE24, likely some for RE25, but I don't know where to cut off for RE24? (Waiting for someone to say...)
* It claims to align on 2*sizeof(int) boundaries, but it looks like stack:realloc and pool:mem_create can un-align to (2n+1)*sizeof(int) boundaries on 32-bit hosts, where sizeof(ber_len_t) == sizeof(int).
A slap_sl_realloc() fix for the stack implementation is to replace size += pad + sizeof( ber_len_t ); size &= ~pad; with size = ((size + pad) & ~pad) + sizeof( ber_len_t ); however see the optimizations and tweaks below.
For the pool implementation, maybe slap_sl_mem_create() can just set so->so_ptr = (char *) sh->sh_base + sizeof(ber_len_t) % (pad+1); Unless that messes up its expectations of how much space is left, I haven't checked. Should mem_create size be increased accordingly?
Unfortunately sl_malloc.c exports its internals, so I don't know what kind of fix would affect it:
* Why is there a slap_sl_mem_detach() when there is no way to reattach it? As far as I can tell all the caller can do is to inspect/clean up the returned slab_heap itself.
* slap_sl_mem_create() switching implementation stack<->pool on an old memctx slab can crash or leak, it runs the wrong cleanup code.
Fix: Remove code duplication, rearrange. The pool cleanup duplicates slap_sl_mem_destroy(), except the final frees. Let destroy(key=NULL,) mean don't free and call that. Also the create/destroy pool branches duplicate the stack branches.
* However, will the pool (i.e. non-stack) implementation ever be used? If not, a simpler fix to some of this is to delete it. OTOH if it someday may be used, hopefully some of these cleanups will be leaving it a bit more correct, easier to read, and faster. Though I haven't gone to the drastic step of studying it to see how it works.
Or we could #ifdef it out by default. It's unclear to me why this is a run-time argument to the create function instead of instead of a -D<compiler flag>, why would the caller care?
#define SLAP_SLAB_MODE <undef or 1:stack, 0:pool: -1:support both>. #define SLAP_SLAB_ISSTACK(stack) <stack if both, else SLAP_SLAB_MODE>
* Regarding code duplication, the #ifdef SLAP_NO_SL_MALLOC code duplicates the !ctx code. Make that if(<enum No_sl_malloc> && !ctx) and the #ifdefs can be killed.
* Also we can throw the #ifdef NO_THREADS/#else/#endif code out to two macros #define GET_MEMCTX()/SET_MEMCTX().
* The "enough room?" test (ptr + size < endptr) is invalid, can e.g. wrap around, when there is not enough room. Use (size < endptr-ptr).
* Some failures assert(0)/exit() like ch_malloc.c, but without giving a Debug() message first.
* The slap_sl_malloc() pool implementation can return NULL on failure at comment "FIXME: missing return; guessing...". For consistency it should exit on failure like the rest of sl_malloc.c and ch_malloc.c.
* Realloc(last block) when the following free space is not big enough, does not move sh_last and reclaim free space properly.
* slap_sl_calloc()/ber_memcalloc() do not catch <count>*<size> overflow: ber_len_t total = n * size; if (!n && total/n == size) <allocate...>;
That can be optimized a bit, if anyone cares:
#define LIM_SQRT(t) /* some value < sqrt(max value of unsigned type t) */ \ ((0UL|(t)-1) >>31>>31 > 1 ? ((t)1 <<32) - 1 : \ (0UL|(t)-1) >>31 ? 65535U : (0UL|(t)-1) >>15 ? 255U : 15U) ... /* sqrt test = slight optimization: often avoids the division */ if ((n | size) <= LIM_SQRT(ber_len_t) || !n || total/n == size) ...
* slap_sl_<malloc/realloc> increase size before proceeding, they should reduce it before falling back to ch_malloc and for Debug messages.
Space optimizations/cleanup:
* To save space on 64-bit hosts: We can store store sizes (head, tail) in 4-byte typedef ber_uint_t slap_sl_size_t instead of a ber_len_t.
After fixing the misalignment issue above, however.
* Also the stack implementation does not need to allocate the tail. It can store the "block is freed" bit in the head of the _next_ block, and only store a tail in free blocks (i.e. if that bit is set).
I think that does add a bit of code, though it is convenient for the "reclaim freed blocks at sh_last" code in slap_sl_free().
* Ignoring last point, the "reclaim" code can in any case be simplified: /* mark it free */ p[-1] = size |= 1; /* reclaim free space off tail */ if (sh->sh_last == p) { do { p = (slap_sllen_t*) ((char*) p - size + 1) - 1; size = p[-1]; } while (size & 1); sh->sh_last = p; }
* Realloc can check if previous or next block is free before giving up and doing malloc/copy/free. That does multiply its code size though. Dunno if it's a good idea. Needs testing/timing, I guess.
* Aligning with (size & ~pad) is wrong for non-two's complement signed pad, it should be (size & -(pad + 1)) aka (size & -Alignment).
All right, so I'm nitpicking. I just dislike unportable code which isn't even simpler than the portable variant.
* pad and order_start can be compile-time enums instead of computed at run time (if any compilers do not optimize it away already): enum { Align = 2*sizeof(int), Align_log2 = 1 + (Align>2) + (Align>4) + (Align>8) + (Align>16), pad = Align - 1, /* but drop that, it's just as easy to use Align */ order_start = Align_log2 /* which also can be dropped */ }; ... assert(Align == 1 << Align_log2); /* Adding head+tail preserves alignment */ assert(2*sizeof(ber_len_t) % Align == 0);
* The pool code can be cleaned up by basing 'order' on Align-sized chunks instead of byte-sized chunks, i.e. subtract order_start from sh->sh_maxorder, order/i/j variables. Also subtract 1 from mem_create:order, so that 'order' means the same as 'order' elsewhere. (Straightforward replacement, no need to invoke the brain and understand the code:-)
* The pool implementation sometimes does 1<<foo where it should do (ber_len_t)1<<foo or 1U<<foo.