(ITS#7037) slapd crashes with syncrepl when replica cleared out DB - openldap-bugs

7 Sep 2011


Full_Name: Claus Assmann
Version: 2.4.26
OS: Red Hat Enterprise Linux Server release 5.5
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (63.211.143.38)
While trying to determine how syncrepl behaves in various error
conditions, I encountered a crash of slapd.  A MASTER is set up to
use push replication to a REPLICA as follows:
database        ldap
hidden          on
suffix          ""
rootdn          "cn=slapd-ldap"
uri             ldap://REPLICA/
lastmod         on
restrict        all
sync_use_subentry       true
acl-bind        bindmethod=simple
                binddn="cn=Monitor"
                credentials=password
syncrepl        rid=001
                provider=ldapi://%2Fvar%2Frun%2Fldapi
                binddn="cn=Manager"
                bindmethod=simple
                credentials=passwd
                searchbase=""
                type=refreshAndPersist
                retry="5 5 60 +"
To reproduce:
On REPLICA: stop slapd, clear out the DB directory, start
slapd.
On MASTER:
add an entry which has to be "synced" to REPLICA;
run ldapsearch to look up that entry on MASTER.
slapd dumps core; backtrace:
#0  0x00002b7e51344265 in raise () from /lib64/libc.so.6
#1  0x00002b7e51345d10 in abort () from /lib64/libc.so.6
#2  0x00002b7e5133d6e6 in __assert_fail () from /lib64/libc.so.6
#3  0x0000000000549546 in ldap_add_ext (ld=0xbba1bc0, dn=0x0, attrs=0xbeb7ed0,
    sctrls=0x0, cctrls=0x0, msgidp=0x435224c4) at add.c:126
#4  0x00000000004dbe6d in ldap_back_add (op=0x43523150, rs=0x43522770)
    at add.c:102
#5  0x0000000000481152 in overlay_op_walk (op=0x43523150, rs=0x43522770,
    which=op_add, oi=0xb9eb6c0, on=0x0) at backover.c:671
#6  0x00000000004816a7 in over_op_func (op=0x43523150, rs=0x43522770,
    which=op_add) at backover.c:723
#7  0x0000000000473706 in syncrepl_add_glue_ancestors (op=0x43523150,
    e=0x2b7e56628fb8) at syncrepl.c:3149
#8  0x000000000047384e in syncrepl_add_glue (op=0x43523150, e=0x2b7e56628fb8)
    at syncrepl.c:3193
#9  0x0000000000474795 in syncrepl_entry (si=0xb9eb1c0, op=0x43523150,
    entry=0x0, modlist=0x43523ca0, syncstate=<value optimized out>,
    syncUUID=<value optimized out>, syncCSN=0xbeb86e0) at syncrepl.c:2448
#10 0x000000000047c39b in do_syncrep2 (ctx=<value optimized out>,
    arg=<value optimized out>) at syncrepl.c:982
#11 do_syncrepl (ctx=<value optimized out>, arg=<value optimized out>)
    at syncrepl.c:1489
#12 0x000000000041eee3 in connection_read_thread (ctx=0x43523da0,
    argv=<value optimized out>) at connection.c:1276
#13 0x00000000005412dc in ldap_int_thread_pool_wrapper (xpool=0xb8fa1c0)
    at tpool.c:685
#14 0x00002b7e510ff73d in start_thread () from /lib64/libpthread.so.0
#15 0x00002b7e513e84bd in clone () from /lib64/libc.so.6
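Frame #3 shows ldap_add_ext being called with dn=0x0, and frame #2 is an assertion failure, which suggests the abort is triggered by an assertion on the DN argument. The following is only a minimal sketch of the kind of guard that would turn a NULL DN into an error return instead of an abort. The names add_request, checked_add, ADD_OK and ADD_BAD_DN are hypothetical and are not part of OpenLDAP; this is not the actual slapd code path or a proposed patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for an add request; not an OpenLDAP type. */
struct add_request {
    const char *dn;   /* NULL models the dn=0x0 seen in frame #3 */
};

/* Hypothetical result codes for this sketch only. */
enum { ADD_OK = 0, ADD_BAD_DN = 1 };

/*
 * Reject a request whose DN pointer is NULL before it reaches a
 * lower layer that asserts on it. An empty string ("") is still
 * accepted here, since only the NULL pointer is being guarded.
 */
static int checked_add(const struct add_request *req)
{
    if (req == NULL || req->dn == NULL) {
        return ADD_BAD_DN;
    }
    /* A real caller would forward the request to the add routine here. */
    return ADD_OK;
}
```

With such a check in the caller, a request lacking a DN would be reported as an error to the sync code rather than terminating the process.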
Note: this also happens with 2.4.23. On a VM instance, it causes
a different error (see mail on list: "killed after 120 seconds")