(ITS#6437) sl_malloc issues - openldap-bugs

1 Jan 2010


      Full_Name: Hallvard B Furuseth
Version: HEAD
OS: Linux x86_64
URL: 
Submission from: (NULL) (129.240.6.233)
Submitted by: hallvard
Here are a number of bugs and other issues with slapd/sl_malloc.c, and
one for ber_memcalloc.  Some for RE24, likely some for RE25, but I don't
know where to cut off for RE24?  (Waiting for someone to say...)
* It claims to align on 2*sizeof(int) boundaries, but it looks like
  stack:realloc and pool:mem_create can un-align to (2n+1)*sizeof(int)
  boundaries on 32-bit hosts, where sizeof(ber_len_t) == sizeof(int).
A slap_sl_realloc() fix for the stack implementation is to replace
    size += pad + sizeof( ber_len_t );
    size &= ~pad;
  with
    size = ((size + pad) & ~pad) + sizeof( ber_len_t );
  however see the optimizations and tweaks below.
For the pool implementation, maybe slap_sl_mem_create() can just set
    so->so_ptr = (char *) sh->sh_base + sizeof(ber_len_t) % (pad+1);
  Unless that messes up its expectations of how much space is left, I
  haven't checked.  Should mem_create size be increased accordingly?
Unfortunately sl_malloc.c exports its internals, so I don't know what
  kind of fix would affect it:
* Why is there a slap_sl_mem_detach() when there is no way to reattach
  it?  As far as I can tell all the caller can do is to inspect/clean up
  the returned slab_heap itself.
* slap_sl_mem_create() switching implementation stack<->pool on an old
  memctx slab can crash or leak, it runs the wrong cleanup code.
Fix: Remove code duplication, rearrange.  The pool cleanup duplicates
  slap_sl_mem_destroy(), except the final frees.  Let destroy(key=NULL,)
  mean don't free and call that.  Also the create/destroy pool branches
  duplicate the stack branches.
* However, will the pool (i.e. non-stack) implementation ever be used?
  If not, a simpler fix to some of this is to delete it.  OTOH if it
  someday may be used, hopefully some of these cleanups will be leaving
  it a bit more correct, easier to read, and faster.  Though I haven't
  gone to the drastic step of studying it to see how it works.
Or we could #ifdef it out by default.  It's unclear to me why this is
  a run-time argument to the create function instead of instead of a
  -D<compiler flag>, why would the caller care?
#define SLAP_SLAB_MODE <undef or 1:stack, 0:pool: -1:support both>.
  #define SLAP_SLAB_ISSTACK(stack) <stack if both, else SLAP_SLAB_MODE>
* Regarding code duplication, the #ifdef SLAP_NO_SL_MALLOC code
  duplicates the !ctx code.  Make that if(<enum No_sl_malloc> && !ctx)
  and the #ifdefs can be killed.
* Also we can throw the #ifdef NO_THREADS/#else/#endif code out to two
  macros #define GET_MEMCTX()/SET_MEMCTX().
* The "enough room?" test (ptr + size < endptr) is invalid, can e.g.
  wrap around, when there is not enough room.  Use (size < endptr-ptr).
* Some failures assert(0)/exit() like ch_malloc.c, but without giving a
  Debug() message first.
* The slap_sl_malloc() pool implementation can return NULL on failure
  at comment "FIXME: missing return; guessing...".  For consistency it
  should exit on failure like the rest of sl_malloc.c and ch_malloc.c.
* Realloc(last block) when the following free space is not big enough,
  does not move sh_last and reclaim free space properly.
* slap_sl_calloc()/ber_memcalloc() do not catch <count>*<size> overflow:
    ber_len_t total = n * size;
    if (!n && total/n == size) <allocate...>;
That can be optimized a bit, if anyone cares:
#define LIM_SQRT(t) /* some value < sqrt(max value of unsigned type t) */ \
    ((0UL|(t)-1) >>31>>31 > 1 ? ((t)1 <<32) - 1 : \
     (0UL|(t)-1) >>31 ? 65535U : (0UL|(t)-1) >>15 ? 255U : 15U)
  ...
  /* sqrt test = slight optimization: often avoids the division */
  if ((n | size) <= LIM_SQRT(ber_len_t) || !n || total/n == size) ...
* slap_sl_<malloc/realloc> increase size before proceeding, they should
  reduce it before falling back to ch_malloc and for Debug messages.
Space optimizations/cleanup:
* To save space on 64-bit hosts: We can store store sizes (head, tail)
  in 4-byte typedef ber_uint_t slap_sl_size_t instead of a ber_len_t.
After fixing the misalignment issue above, however.
* Also the stack implementation does not need to allocate the tail.
  It can store the "block is freed" bit in the head of the _next_ block,
  and only store a tail in free blocks (i.e. if that bit is set).
I think that does add a bit of code, though it is convenient for the
  "reclaim freed blocks at sh_last" code in slap_sl_free().
* Ignoring last point, the "reclaim" code can in any case be simplified:
        /* mark it free */
        p[-1] = size |= 1;
        /* reclaim free space off tail */
        if (sh->sh_last == p) {
            do {
                p = (slap_sllen_t*) ((char*) p - size + 1) - 1;
                size = p[-1];
            } while (size & 1);
            sh->sh_last = p;
        }
* Realloc can check if previous or next block is free before giving up
  and doing malloc/copy/free.  That does multiply its code size though.
  Dunno if it's a good idea.  Needs testing/timing, I guess.
* Aligning with (size & ~pad) is wrong for non-two's complement signed
  pad, it should be (size & -(pad + 1)) aka (size & -Alignment).
All right, so I'm nitpicking.  I just dislike unportable code which
  isn't even simpler than the portable variant.
* pad and order_start can be compile-time enums instead of computed at
  run time (if any compilers do not optimize it away already):
  enum {
    Align = 2*sizeof(int),
    Align_log2 = 1 + (Align>2) + (Align>4) + (Align>8) + (Align>16),
    pad = Align - 1, /* but drop that, it's just as easy to use Align */
    order_start = Align_log2 /* which also can be dropped */
  };
  ...
  assert(Align == 1 << Align_log2);
  /* Adding head+tail preserves alignment */
  assert(2*sizeof(ber_len_t) % Align == 0);
* The pool code can be cleaned up by basing 'order' on Align-sized
  chunks instead of byte-sized chunks, i.e. subtract order_start from
  sh->sh_maxorder, order/i/j variables.  Also subtract 1 from
  mem_create:order, so that 'order' means the same as 'order' elsewhere.
  (Straightforward replacement, no need to invoke the brain and
  understand the code:-)
* The pool implementation sometimes does 1<<foo where it should do
  (ber_len_t)1<<foo or 1U<<foo.