Full_Name: Mark Cave-Ayland
Version: 2.4.8cvs-RE24-2008-04-15
OS: RHEL4, x86
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (217.207.197.142)
Hi there,
In order to resolve issues experienced with syncrepl/glue on an existing
openldap-2.4.8 deployment (ITS#5430), we have been using a CVS checkout of
openldap RE24 branch taken from 2008-04-15 on one of our test systems.
Unfortunately, we are still seeing random segfaults occurring roughly once a day
which appear to point towards the syncprov overlay once again. At the moment, we
are having difficulty reproducing the fault under test conditions, but if
openldap is left running long enough then it is possible to obtain a core dump.
The issue is occurring on a server, pelican, which is configured with the
syncprov overlay and a number of subordinate databases for different parts of the tree.
The relevant log snippet follows:
Apr 28 12:18:32 pelican slapd[7688]: do_syncrep2: cookie=rid=142,csn=20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: syncrepl_entry: rid=142 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_MODIFY)
Apr 28 12:18:32 pelican slapd[7688]: syncrepl_entry: rid=142 be_search (0)
Apr 28 12:18:32 pelican slapd[7688]: syncrepl_entry: rid=142 uid=richf,ou=V,ou=W,ou=X,dc=Y,dc=Z
Apr 28 12:18:32 pelican slapd[7688]: slap_queue_csn: queing 0x9ff7560 20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: syncprov_sendresp: cookie=rid=146,csn=20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: syncprov_sendresp: cookie=rid=134,csn=20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: slap_graduate_commit_csn: removing 0xa12ee70 20080428111855.697316Z#000000#000#000000
Apr 28 12:18:32 pelican slapd[7688]: syncrepl_entry: rid=142 be_modify (0)
Apr 28 12:18:32 pelican slapd[7688]: slap_queue_csn: queing 0x9ff7560 20080428111855.697316Z#000000#000#000000
The backtrace obtained from the core file looks like this:
Loaded symbols for /usr/lib/sasl2/libdigestmd5.so.2
Reading symbols from /usr/lib/openldap/syncprov-2.4.so.2...Reading symbols from
/usr/lib/debug/usr/lib/openldap/syncprov-2.4.so.2.0.4.debug...done.
done.
Loaded symbols for /usr/lib/openldap/syncprov-2.4.so.2
#0 0x080e6638 in overlay_entry_get_ov (op=0x7e3eefd0, dn=0x7e3eeeb0, oc=0x0, ad=0x0, rw=0, e=0x7e3eedfc, on=0x808bdf8) at ../../../servers/slapd/backover.c:355
355 rc = on->on_bi.bi_entry_get_rw( op, dn,
(gdb) bt
#0 0x080e6638 in overlay_entry_get_ov (op=0x7e3eefd0, dn=0x7e3eeeb0, oc=0x0, ad=0x0, rw=0, e=0x7e3eedfc, on=0x808bdf8) at ../../../servers/slapd/backover.c:355
#1 0x00b187ac in syncprov_qtask (ctx=0x7e3ef2a0, arg=0xa02f708) at ../../../../servers/slapd/overlays/syncprov.c:871
#2 0x0817a277 in ldap_int_thread_pool_wrapper (xpool=0x9db94d0) at ../../../libraries/libldap_r/tpool.c:663
#3 0x00acb371 in start_thread () from /lib/tls/libpthread.so.0
#4 0x00944ffe in clone () from /lib/tls/libc.so.6
(gdb)
The server pelican is configured using both the syncprov & glue overlays, while
the subordinate for ou=V,ou=W,ou=X,dc=Y,dc=Z is a simple syncrepl declaration of
type refreshAndPersist.
Looking at the log snippet above, I can see in the "syncprov_sendresp" lines
that the cookie appears to be empty. This looks similar to ITS#5432,
although that issue was reportedly fixed by a commit on 21st March (and hence
the fix should be included in our CVS checkout). Further information can be
provided on request.
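
For anyone checking the logged cookies mechanically, here is a minimal sketch that splits a cookie string of the form shown in the log ("rid=NNN,csn=...") and reports whether a CSN is present. The helper name parse_logged_cookie is hypothetical and not part of OpenLDAP; it only handles the textual form seen in the log lines above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of OpenLDAP: split a logged sync cookie
 * of the form "rid=NNN,csn=..." into its two fields.
 * Returns 1 if both a rid and a non-empty csn were found, 0 otherwise. */
static int parse_logged_cookie(const char *cookie, int *rid,
                               char *csn, size_t csnlen)
{
    const char *p;

    assert(cookie != NULL && rid != NULL && csn != NULL && csnlen > 0);
    csn[0] = '\0';

    /* The rid must come first, as it does in the log lines. */
    if (sscanf(cookie, "rid=%d", rid) != 1)
        return 0;

    /* Look for the csn field; treat a missing or empty one as "no CSN". */
    p = strstr(cookie, ",csn=");
    if (p == NULL || p[5] == '\0')
        return 0;

    strncpy(csn, p + 5, csnlen - 1);
    csn[csnlen - 1] = '\0';
    return 1;
}
```

Feeding it the cookie text from a syncprov_sendresp line gives the rid and CSN separately, which makes it easier to compare what each consumer was sent.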
Many thanks,
Mark.
rein(a)tollevik.no wrote:
> On Wed, 9 Apr 2008, h.b.furuseth(a)usit.uio.no wrote:
>
>> Does this help? From my fiddling with ITS#5340 (REP_ENTRY_MODIFIABLE).
>> I do not understand syncprov's handling of REP_ENTRY_MUSTRELEASE though.
>> (For one thing it seems to assume that REP_ENTRY_MUSTRELEASE is set if
>> and only if rs.sr_entry->e_private != NULL. Which is possibly true with
>> back-bdb but seems a shaky assumption in general.)
>
> Almost, except that it triggers an abort. Using be_entry_release_r() was
> not correct, as it ended up calling entry_free(). It works if I change
> it to use overlay_entry_release_ov() instead, as in the alternative patch
> at the end.
>
> I have put a test script that shows this deadlock on:
>
> ftp://ftp.openldap.org/incoming/test053-syncprov-glue
>
> It deadlocks without this patch; with it, both this test and the rest of
> the test suite succeed :-) I haven't tested it in production though.
This last patch is now committed to HEAD. I am ignoring the first patch; it is
unsafe because it unlocks the si_csn_rwlock too soon. (si_ctxcsn is being used
directly in the modify request and must be held consistent until that completes.)
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
rein(a)basefarm.no wrote:
> Full_Name: Rein Tollevik
> Version: CVS head
> OS:
> URL:
> Submission from: (NULL) (81.93.160.250)
>
>
> We have seen occasional segfaults in syncprov_qtask() where it was
> passed a syncops pointer containing garbage in its arg. It looks as if
> this could happen if syncprov_free_syncop is called to free an abandoned
> operation. I hope the patch at the end fixes this; it makes sure to
> remove the syncops->s_qtask (if any) from the runqueue before freeing
> the syncops itself.
I think this could also cause ITS#5452.
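
The failure mode described in the quoted report can be sketched with a toy run queue. Everything below (runqueue, task, sync_op, rq_submit, rq_remove, sync_op_free) is a hypothetical, single-threaded illustration rather than the slapd runqueue API; real code would also need to hold the queue's mutex around these steps.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy run queue; every name here is hypothetical, not the slapd API. */
struct task {
    struct task *next;
    void *arg;              /* points at the owning sync_op */
};

struct runqueue {
    struct task *head;
};

struct sync_op {
    struct task *qtask;     /* pending task, or NULL */
};

/* Unlink a task from the queue if still pending; returns 1 if removed. */
static int rq_remove(struct runqueue *rq, struct task *t)
{
    struct task **pp;

    for (pp = &rq->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            return 1;
        }
    }
    return 0;
}

/* Queue a task for the operation; returns 0 on success, -1 on failure. */
static int rq_submit(struct runqueue *rq, struct sync_op *so)
{
    struct task *t = malloc(sizeof *t);

    if (t == NULL)
        return -1;
    t->arg = so;
    t->next = rq->head;
    rq->head = t;
    so->qtask = t;
    return 0;
}

/* Free the operation only after its pending task is off the queue, so no
 * worker can later run a task whose arg points at freed memory. */
static void sync_op_free(struct runqueue *rq, struct sync_op *so)
{
    assert(rq != NULL && so != NULL);
    if (so->qtask != NULL) {
        rq_remove(rq, so->qtask);
        free(so->qtask);
        so->qtask = NULL;
    }
    free(so);
}
```

Skipping the rq_remove() step would leave a queued task whose arg still points at the freed sync_op, which matches the "garbage in its arg" symptom.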
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Pierangelo Masarati <ando(a)sys-net.it> wrote:
> > While I was here, I added myself to the acknowledgements in the man page.
> Applied to HEAD (with minor changes); please test.
A quick test suggests it works as intended. Any chance to have it in an
upcoming 2.4 release?
--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu(a)netbsd.org
Emmanuel Dreyfus wrote:
> Sorry. The new patch was uploaded in FTP:
> manu-20080426.patch
>
> While I was here, I added myself to the acknowledgements in the man page.
Applied to HEAD (with minor changes); please test.
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
---------------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Email: pierangelo.masarati(a)sys-net.it
---------------------------------------
Pierangelo Masarati <ando(a)sys-net.it> wrote:
> I'm considering your patch for inclusion in HEAD code; I see that in
> your patch you didn't follow the IPR notice as indicated in the
> contributing guidelines. Please resubmit the patch adding at the top
> the notices illustrated in
> <http://www.openldap.org/devel/contributing.html#notice>:
>
> - notice of origin
> - rights statement
Sorry. The new patch was uploaded in FTP:
manu-20080426.patch
While I was here, I added myself to the acknowledgements in the man page.
--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu(a)netbsd.org
manu(a)netbsd.org wrote:
> URL: ftp://ftp.openldap.org/incoming/manu-20070412.patch
> This patch adds attribute remapping capability to slapo-dynlist.
Emmanuel,
I'm considering your patch for inclusion in HEAD code; I see that in
your patch you didn't follow the IPR notice as indicated in the
contributing guidelines. Please resubmit the patch adding at the top
the notices illustrated in
<http://www.openldap.org/devel/contributing.html#notice>:
- notice of origin
- rights statement
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
---------------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Email: pierangelo.masarati(a)sys-net.it
---------------------------------------
The ITS is for tracking issues in OpenLDAP software. Nothing in this report
indicates any bug in OpenLDAP software. Most likely you've found a bug in MIT
Kerberos. I suggest you contact the MIT folks. This ITS will be closed.
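
For context, the "conflicting types" error quoted in the report below is what a C compiler emits when two headers declare the same function with incompatible prototypes. The sketch below reproduces the mechanism with toy names (toy_initialize, TOY_LDAP, HAVE_TOY_INITIALIZE are all hypothetical). Whether kdb_ldap.h guards its own declaration with a configure macro in this way is an assumption; the report shows only that kdb_ldap.h:315 and ldap.h:1346 both declare ldap_initialize.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the LDAP library header; toy_* names are hypothetical. */
typedef struct toy_ldap TOY_LDAP;
int toy_initialize(TOY_LDAP **ldp, const char *url);

/* Pretend the build system detected the function. Without this define the
 * fallback below is compiled too, and because "char *" and "const char *"
 * are incompatible parameter types, the compiler reports
 * "conflicting types for 'toy_initialize'". */
#define HAVE_TOY_INITIALIZE 1

#ifndef HAVE_TOY_INITIALIZE
/* Fallback prototype intended for platforms whose headers lack one. */
int toy_initialize(TOY_LDAP **ldp, char *url);
#endif

/* Minimal definition so the sketch links. */
int toy_initialize(TOY_LDAP **ldp, const char *url)
{
    assert(ldp != NULL);
    *ldp = NULL;
    return url != NULL ? 0 : -1;
}
```

If a build hits this kind of error, the config.log of the package carrying the fallback declaration is the place to check whether its detection of the function succeeded.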
brian.peters(a)analog.com wrote:
> Full_Name: Brian Peters
> Version: 2.3.39
> OS: SPARC Solaris 10
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (137.71.23.54)
>
>
> I am attempting to compile Samba with Active Directory support. This entails
> using LDAP and Kerberos during setup. The native versions on Solaris 10 are
> insufficient because their header files lack ldap_initialize. I am using the most
> recent versions of both OpenLDAP (2.3.39) and Kerberos (krb5-1.6.3). The error
> manifests itself when compiling Kerberos. Before going any further, here are
> the build details.
>
> OpenLDAP was configured using the following:
>
> ./configure --prefix=/cadvault/gbocadsrv2/www/pkgs/3rdparty/openldap/openldap-2.3.39
> --without-bdb --disable-bdb --enable-null --disable-slapd --enable-shared
> --with-shared=yes
>
> CFLAGS -D_AVL_H
>
> Kerberos using:
>
> ./configure --prefix=/cadvault/gbocadsrv2/www/pkgs/3rdparty/kerberos/krb5-1.6.3
> --with-ldap --enable-shared
>
>
> All important environment variables have been set specifically for each
> configuration (CPPFLAGS,LDFLAGS,PATH,LD_LIBRARY_PATH,CFLAGS).
>
> I have confirmed in the config.log files that each is pulling the correct files
> during the attempted compile. The issue appears when running gmake. Below is the
> important portion it returns.
>
> ...
> In file included from kdb_ldap.c:37:
> kdb_ldap.h:315: error: conflicting types for 'ldap_initialize'
> /cadvault/gbocadsrv2/www/pkgs/3rdparty/openldap/openldap-2.3.39/include/ldap.h:1346:
> error: previous declaration of 'ldap_initialize' was here
> kdb_ldap.h:315: error: conflicting types for 'ldap_initialize'
> /cadvault/gbocadsrv2/www/pkgs/3rdparty/openldap/openldap-2.3.39/include/ldap.h:1346:
> error: previous declaration of 'ldap_initialize' was here
> kdb_ldap.c:491: warning: missing braces around initializer
> kdb_ldap.c:491: warning: (near initialization for
> `kldap_init_fn__once.once.o.__pthread_once_pad')
> kdb_ldap.c:500: warning: no previous prototype for 'kldap_ensure_initialized'
> gmake[2]: *** [kdb_ldap.so] Error 1
> gmake[2]: Leaving directory
> `/cadvault/gbocadsrv2/www/src/3rdparty/kerberos/krb5-1.6.3/src/plugins/kdb/ldap/libkdb_ldap'
> gmake[1]: *** [all-recurse] Error 1
> gmake[1]: Leaving directory
> `/cadvault/gbocadsrv2/www/src/3rdparty/kerberos/krb5-1.6.3/src/plugins/kdb/ldap'
> gmake: *** [all-recurse] Error 1
>
>
> At this point I am at a bit of a loss on how to proceed. Any advice on a fix or
> versions of each to roll back to would be greatly appreciated.
>
>
>
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/