Full_Name: Mark Cave-Ayland
Version: 2.4.8cvs-RE24-2008-04-15
OS: RHEL4, x86
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (217.207.197.142)

Hi there,
To resolve issues experienced with syncrepl/glue on an existing openldap-2.4.8 deployment (ITS#5430), we have been running a CVS checkout of the OpenLDAP RE24 branch, taken on 2008-04-15, on one of our test systems.
Unfortunately, we are still seeing random segfaults occurring roughly once a day which appear to point towards the syncprov overlay once again. At the moment, we are having difficulty reproducing the fault under test conditions, but if openldap is left running long enough then it is possible to obtain a core dump.
The issue occurs on a server, pelican, which is configured with the syncprov overlay and a number of subordinates for different parts of the tree. The relevant log snippet follows:
Apr 28 12:18:32 pelican slapd[7688]: do_syncrep2: cookie=rid=142,csn=20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: syncrepl_entry: rid=142 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_MODIFY)
Apr 28 12:18:32 pelican slapd[7688]: syncrepl_entry: rid=142 be_search (0)
Apr 28 12:18:32 pelican slapd[7688]: syncrepl_entry: rid=142 uid=richf,ou=V,ou=W,ou=X,dc=Y,dc=Z
Apr 28 12:18:32 pelican slapd[7688]: slap_queue_csn: queing 0x9ff7560 20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: syncprov_sendresp: cookie=rid=146,csn=20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: syncprov_sendresp: cookie=rid=134,csn=20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: slap_graduate_commit_csn: removing 0xa12ee70 20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: syncrepl_entry: rid=142 be_modify (0)
Apr 28 12:18:32 pelican slapd[7688]: slap_queue_csn: queing 0x9ff7560 20080428111855.697316Z#000000#000#000000
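For anyone wanting to compare the cookies across a longer log, the following is a minimal sketch in Python. It assumes the "cookie=rid=...,csn=..." layout seen in the snippet above; the function name parse_cookies and the regular expression are purely illustrative and are not part of OpenLDAP.

```python
import re

# Assumed layout, taken from the log lines above:
#   "<function>: cookie=rid=<number>,csn=<CSN or nothing>"
COOKIE_RE = re.compile(r"(?P<func>\w+): cookie=rid=(?P<rid>\d+),csn=(?P<csn>\S*)")


def parse_cookies(log_text):
    """Return a list of (function, rid, csn) tuples, one per cookie found."""
    return [
        (m.group("func"), int(m.group("rid")), m.group("csn"))
        for m in COOKIE_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "Apr 28 12:18:32 pelican slapd[7688]: syncprov_sendresp: "
        "cookie=rid=146,csn=20080428111855.697316Z#000000#000#000000\n"
    )
    for func, rid, csn in parse_cookies(sample):
        # An empty csn field is printed as "<empty>" so it stands out.
        print(func, rid, csn or "<empty>")
```

Feeding a whole syslog file to parse_cookies makes it easy to spot which rid values are sent with a missing or unexpected CSN.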
The backtrace obtained from the core file looks like this:
Loaded symbols for /usr/lib/sasl2/libdigestmd5.so.2
Reading symbols from /usr/lib/openldap/syncprov-2.4.so.2...Reading symbols from /usr/lib/debug/usr/lib/openldap/syncprov-2.4.so.2.0.4.debug...done.
done.
Loaded symbols for /usr/lib/openldap/syncprov-2.4.so.2
#0  0x080e6638 in overlay_entry_get_ov (op=0x7e3eefd0, dn=0x7e3eeeb0, oc=0x0, ad=0x0, rw=0, e=0x7e3eedfc, on=0x808bdf8) at ../../../servers/slapd/backover.c:355
355             rc = on->on_bi.bi_entry_get_rw( op, dn,
(gdb) bt
#0  0x080e6638 in overlay_entry_get_ov (op=0x7e3eefd0, dn=0x7e3eeeb0, oc=0x0, ad=0x0, rw=0, e=0x7e3eedfc, on=0x808bdf8) at ../../../servers/slapd/backover.c:355
#1  0x00b187ac in syncprov_qtask (ctx=0x7e3ef2a0, arg=0xa02f708) at ../../../../servers/slapd/overlays/syncprov.c:871
#2  0x0817a277 in ldap_int_thread_pool_wrapper (xpool=0x9db94d0) at ../../../libraries/libldap_r/tpool.c:663
#3  0x00acb371 in start_thread () from /lib/tls/libpthread.so.0
#4  0x00944ffe in clone () from /lib/tls/libc.so.6
(gdb)
The server pelican is configured using both the syncprov & glue overlays, while the subordinate for ou=V,ou=W,ou=X,dc=Y,dc=Z is a simple syncrepl declaration of type refreshAndPersist.
Looking at the log snippet above, I can see in the "syncprov_sendresp" lines that the cookie appears to be empty. This looks similar to ITS#5432, although that issue is reported as fixed by a commit on 21st March (and hence the fix should be included in our CVS checkout). Further information can be provided on request.
Many thanks,
Mark.