Full_Name: Quanah Gibson-Mount
Version: 2.3.37
OS:
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (71.202.148.128)
After getting repeated DB corruptions, I finally tracked down the issue to this:
If DB_CONFIG has changed since the last time slapd was started, and slapindex -q
is run, the database ends up corrupt.
For example, I did the following with a perfectly happy slapd:
ldap stop
Killing slapd with pid 9857 done.
cd /opt/zimbra/openldap-data
touch DB_CONFIG
/opt/zimbra/openldap/sbin/slapindex -b '' -q -f /opt/zimbra/conf/slapd.conf
bdb_db_open: DB_CONFIG for suffix has changed.
Performing database recovery to activate new settings.
bdb(): DB_RECOVER and DB_RECOVER_FATAL require DB_TXN_INIT in DB_ENV->open
bdb(): PANIC: Invalid argument
bdb_db_open: Database cannot be recovered, err -30978. Restore from backup!
backend_startup_one: bi_db_open failed! (-30978)
slap_startup failed
One of the things this does is break sync replication, as it apparently clears
out the glue entry. :/
--Quanah
Full_Name: Quanah Gibson-Mount
Version: 2.3.37
OS:
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (71.202.148.128)
I noticed that when using the empty suffix, and DB_CONFIG is updated, this gets
logged:
bdb_db_open: DB_CONFIG for suffix has changed.
Note the extra space and lack of suffix.
ando(a)sys-net.it wrote:
> I've prepared a trivial client that calls ldap_sasl_bind_s(), waits while I
> shut down the server, calls ldap_search_ext_s() (which fails with
> LDAP_SERVER_DOWN), and then calls ldap_unbind_ext(). Prior to patching, I
> always got SIGPIPE.
Let me add that the above patch does not work when using ldapi://. In
that case, the SIGPIPE occurs when first flushing a request on the
broken connection (in sb_stream_write()), while on regular INET sockets
the flush succeeds and, prior to patching, the SIGPIPE was raised much
later, at unbind.
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
---------------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Email: pierangelo.masarati(a)sys-net.it
---------------------------------------
ando(a)sys-net.it wrote:
> ando(a)sys-net.it wrote:
>> I've seen something similar in recent code. I was just tracking it
>> down, so you basically saved me the effort of opening a ticket :). What
>> I found so far is that when ldap_unbind(3) is called (which is required
>> to release resources after the connection broke), the client library
>> tries to send an LDAPUnbind request to the server, even though it just
>> got an LDAP_SERVER_DOWN (-1). The behavior seems to be more frequent
>> when the connection breaks while using ldapi://, and I couldn't spot the
>> difference up to now; I'm just mentioning it in case it rings any bells.
>
>
> What happens is that when try_read1msg() finds out the connection is
> broken, it sets ld_errno to LDAP_SERVER_DOWN, but leaves
> lc->lconn_status set to LDAP_CONNST_CONNECTED, so a subsequent call to
> ldap_unbind_ext causes ldap_free_connection() to try sending the
> LDAPUnbind anyway. Either we also check that ld->ld_errno is not
> LDAP_SERVER_DOWN (provided no one resets it in the meantime), or clear
> lc->lconn_status as soon as we find out the connection is broken. I'd
> go for the second, but probably someone else is more familiar than me
> with the library's internals.
This is the fix I propose:
==========================================
diff -u -r1.154 result.c
--- libraries/libldap/result.c 14 Jun 2007 20:35:41 -0000 1.154
+++ libraries/libldap/result.c 7 Sep 2007 20:57:52 -0000
@@ -558,6 +558,14 @@
 		if ( sock_errno() == EAGAIN ) return LDAP_MSG_X_KEEP_LOOKING;
 #endif
 		ld->ld_errno = LDAP_SERVER_DOWN;
+#ifdef LDAP_R_COMPILE
+		ldap_pvt_thread_mutex_lock( &ld->ld_req_mutex );
+#endif
+		ldap_free_connection( ld, lc, 1, 0 );
+#ifdef LDAP_R_COMPILE
+		ldap_pvt_thread_mutex_unlock( &ld->ld_req_mutex );
+#endif
+		lc = *lcp = NULL;
 		return -1;
 
 	default:
==========================================
I've prepared a trivial client that calls ldap_sasl_bind_s(), waits while I
shut down the server, calls ldap_search_ext_s() (which fails with
LDAP_SERVER_DOWN), and then calls ldap_unbind_ext(). Prior to patching, I
always got SIGPIPE.
p.
ando(a)sys-net.it wrote:
> I've seen something similar in recent code. I was just tracking it
> down, so you basically saved me the effort of opening a ticket :). What
> I found so far is that when ldap_unbind(3) is called (which is required
> to release resources after the connection broke), the client library
> tries to send an LDAPUnbind request to the server, even though it just
> got an LDAP_SERVER_DOWN (-1). The behavior seems to be more frequent
> when the connection breaks while using ldapi://, and I couldn't spot the
> difference up to now; I'm just mentioning it in case it rings any bells.
What happens is that when try_read1msg() finds out the connection is
broken, it sets ld_errno to LDAP_SERVER_DOWN, but leaves
lc->lconn_status set to LDAP_CONNST_CONNECTED, so a subsequent call to
ldap_unbind_ext causes ldap_free_connection() to try sending the
LDAPUnbind anyway. Either we also check that ld->ld_errno is not
LDAP_SERVER_DOWN (provided no one resets it in the meantime), or clear
lc->lconn_status as soon as we find out the connection is broken. I'd
go for the second, but probably someone else is more familiar than me
with the library's internals.
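To make the second option concrete, here is a toy model of the state handling, not libldap code. All types, constants and function names are hypothetical stand-ins for ld_errno, lc->lconn_status, try_read1msg() and the unbind path. It only counts how many times an unbind request would be written after a read failure, with and without clearing the connection status.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the libldap definitions. */
enum toy_status { TOY_CONN_DOWN, TOY_CONN_CONNECTED };
enum { TOY_SUCCESS = 0, TOY_SERVER_DOWN = -1 };

struct toy_conn {
    enum toy_status status;
    int unbind_writes;      /* writes that would hit the broken socket */
};

struct toy_ld {
    int last_error;         /* plays the role of ld_errno */
    struct toy_conn *conn;
};

/* Models the read path noticing that the connection is broken. */
void toy_read_failed(struct toy_ld *ld, int clear_status)
{
    assert(ld != NULL && ld->conn != NULL);
    ld->last_error = TOY_SERVER_DOWN;
    if (clear_status)
        ld->conn->status = TOY_CONN_DOWN;   /* the second option above */
}

/* Models unbind: a request is only written if the status says connected. */
void toy_unbind(struct toy_ld *ld)
{
    assert(ld != NULL && ld->conn != NULL);
    if (ld->conn->status == TOY_CONN_CONNECTED)
        ld->conn->unbind_writes++;          /* this write would raise SIGPIPE */
    ld->conn->status = TOY_CONN_DOWN;
}

/* Runs the failure-then-unbind sequence and reports the write count. */
int toy_unbind_writes_after_failure(int clear_status)
{
    struct toy_conn c = { TOY_CONN_CONNECTED, 0 };
    struct toy_ld ld = { TOY_SUCCESS, &c };

    toy_read_failed(&ld, clear_status);
    toy_unbind(&ld);
    return c.unbind_writes;
}
```

With clear_status set to 0 the model attempts one write on the dead connection; with 1 it attempts none, which is the behaviour argued for above.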
p.
I've seen something similar in recent code. I was just tracking it
down, so you basically saved me the effort of opening a ticket :). What
I found so far is that when ldap_unbind(3) is called (which is required
to release resources after the connection broke), the client library
tries to send an LDAPUnbind request to the server, even though it just
got an LDAP_SERVER_DOWN (-1). The behavior seems to be more frequent
when the connection breaks while using ldapi://, and I couldn't spot the
difference up to now; I'm just mentioning it in case it rings any bells.
I think not trapping SIGPIPE is correct, since this should definitely be
delegated to the application. But the library itself shouldn't trigger
spurious SIGPIPEs by trying to use a connection it knows is broken.
I'll keep digging. p.
Full_Name: Arlene Berry
Version: 2.3.38
OS: Solaris
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (64.221.115.100)
I am using OpenLDAP 2.3.38 with Cyrus SASL and MIT Kerberos to communicate with
Active Directory on Windows Server 2003 R2. I have been primarily working on
Solaris 8 and 9 but we've also seen signs of this problem on Solaris 10, AIX,
and HPUX. Initially I was using OpenLDAP 2.3.19 which also had the problem.
The sequence of events is that I bind to Active Directory, do a search, and
successfully retrieve the results. Then I let my program and the LDAP
connection sit idle for at least 15 minutes so that the LDAP connection will
time out. Next I have the program do another ldap_search_ext which returns 0
and then ldap_result which returns -1. At this point the program does an
ldap_unbind. What the program is supposed to do next is create a new LDAP
connection and try the search again. Sometimes it does and sometimes I see an
unhandled SIGPIPE error which causes the program to halt. When run under a
debugger with the LDAP debug level set to 65535 this is what it shows:
ldap_unbind
ldap_free_request (origid 8, msgid 8)
ldap_free_connection 1 1
ldap_send_unbind
t@1 (l@1) signal PIPE (Broken Pipe) in _libc_write at 0xff21e15c
0xff21e15c: _libc_write+0x000c: bgeu _libc_write+0x40 ! 0xff21e190
Current function is sb_stream_write
521 return write( sbiod->sbiod_sb->sb_fd, buf, len );
If I tell the debugger to continue or if I set SIGPIPE to SIG_IGN at the
beginning of the program, it appears to continue successfully.
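For reference, the SIG_IGN workaround mentioned above can be sketched as follows. This is a minimal illustration assuming a POSIX system; the helper names are hypothetical and a plain pipe stands in for the timed-out LDAP connection. Once SIGPIPE is ignored, the write no longer terminates the process and instead fails with EPIPE, which the caller can handle.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Ignore SIGPIPE for the whole process; returns 0 on success, -1 on error. */
int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/*
 * Hypothetical helper: write to a pipe whose read end is already closed.
 * Returns the errno of the failed write, 0 if the write succeeded,
 * or -1 if the pipe could not be created.
 */
int write_to_broken_pipe(void)
{
    int fds[2];
    int err = 0;

    if (pipe(fds) != 0)
        return -1;

    close(fds[0]);                     /* nobody will ever read */
    if (write(fds[1], "x", 1) < 0)     /* would raise SIGPIPE if not ignored */
        err = errno;

    close(fds[1]);
    return err;
}
```

Calling ignore_sigpipe() once at startup and then write_to_broken_pipe() yields EPIPE instead of killing the process. Note that this changes the disposition for every descriptor in the process, so it is a decision for the application rather than for a library.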
Anne Moore wrote:
> No, it's not that easy. Not just untar, configure, etc. Remember, RHEL 4.0
> doesn't support the latest OpenLdap version. So, "tar, configure, etc" would
> not do any good. ;-) Wish it did!!
I routinely **develop** OpenLDAP on RHEL 4.0; what, in your opinion,
prevents it from **supporting** the latest OpenLDAP? All you need is a
compiler and a few tools available with any Linux distribution.
p.
--On Friday, September 07, 2007 12:54 PM +0000 ando(a)sys-net.it wrote:
> diabeticithink(a)yahoo.com wrote:
>> Thank you for the help. I cannot upgrade to a later version of OpenLdap
>> because RHEL 4.0 doesn't support it. Bummer, eh?
>
> Well, there is nothing the OpenLDAP project can do for you, then, other
> than providing more than 3 years of development since 2.2.13 (it was
> released June 2004), which RedHat in the first place, and basically you
> as well, seem to be willing to ignore (it's just one click away: untar,
> configure, make, make install).
>
> Or, you should ask RedHat to either upgrade OpenLDAP in RHEL 4.0, or fix
> your specific bug, if any (I still suspect a misconfiguration, possibly
> unrelated to OpenLDAP software itself, given that OpenLDAP software does
> not generate that specific message). As distributors, they are
> responsible for providing you up-to-date software when available (how
> responsive they are might depend on the type of maintenance contract you
> have with them).
Just to note, this has been discussed many times on the software list.
RedHat's packages should only be viewed as providing access to the LDAP
client API, and *not* as packages to be used for running a production LDAP
service. They are not maintained to the level necessary for production
usage.
I'd strongly advise using the resources already made available elsewhere
for running OpenLDAP as a production service. For example, Symas Corp.
(employer of Howard Chu, OpenLDAP's Chief Architect) provides free
pre-compiled and tested packages of OpenLDAP at <http://www.symas.com>, and
Buchan Milne, who is heavily involved with the OpenLDAP project, also
provides pre-compiled packages of OpenLDAP at
<http://staff.telkomsa.net/packages/>. You are best off utilizing either
of these resources if you want to run OpenLDAP as a production service on
your systems.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration