--On Thursday, September 09, 2010 10:43:06 PM -0700 Howard Chu hyc@symas.com wrote:
whm@stanford.edu wrote:
--On Friday, September 03, 2010 01:23:17 AM -0700 Bill MacAllister whm@stanford.edu wrote:
The problem with the database was only coincidental. Restoring the database got the failing replica past the problem replication event.
In the replica pool of 6 servers we have seen the problem on three of the servers. Thinking about this more, it is unlikely to be a slave problem, since the slaves have been in use for about 6 weeks and we did not see the problem. Only when we changed the master to 2.4.23 did we see the problem. I have captured a master debug log of the problem event. It is at http://www.stanford.edu/~whm/files/master-debug.txt.
Bill
Please try with this patch:
Index: sasl.c
RCS file: /repo/OpenLDAP/pkg/ldap/libraries/libldap/sasl.c,v
retrieving revision 1.79
diff -u -r1.79 sasl.c
--- sasl.c	13 Apr 2010 20:17:56 -0000	1.79
+++ sasl.c	10 Sep 2010 05:42:22 -0000
@@ -733,8 +733,9 @@
 		return ret;
 	} else if ( p->buf_out.buf_ptr != p->buf_out.buf_end ) {
 		/* partial write? pretend nothing got written */
 		p->flags |= LDAP_PVT_SASL_PARTIAL_WRITE;
-		len2 = 0;
+		sock_errset(EAGAIN);
+		len2 = -1;
 	}
 
 	/* return number of bytes encoded, not written, to ensure
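For readers following the patch: the change reports a partial write as -1 with the socket error set to EAGAIN, instead of returning 0. The sketch below illustrates that general pattern only. It is not OpenLDAP code; toy_sink, toy_writer, toy_flush and toy_send_all are hypothetical names invented for the illustration, and the assumption is that a caller treats -1/EAGAIN as "call again later".

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a non-blocking socket (not OpenLDAP):
 * it accepts at most `capacity` bytes per write call. */
struct toy_sink {
    char   data[64];
    size_t used;
    size_t capacity;
};

static size_t toy_sink_write(struct toy_sink *s, const char *buf, size_t len)
{
    size_t n = len < s->capacity ? len : s->capacity;
    if (n > sizeof(s->data) - s->used)
        n = sizeof(s->data) - s->used;
    memcpy(s->data + s->used, buf, n);
    s->used += n;
    return n;
}

/* Hypothetical buffered writer: remembers the unsent tail between calls. */
struct toy_writer {
    const char *ptr;   /* next unsent byte */
    const char *end;   /* one past the last pending byte */
};

/* Returns 0 once everything pending has been written.
 * On a partial write, returns -1 with errno set to EAGAIN so the
 * caller knows to retry rather than treating the call as finished. */
static int toy_flush(struct toy_writer *w, struct toy_sink *s)
{
    w->ptr += toy_sink_write(s, w->ptr, (size_t)(w->end - w->ptr));
    if (w->ptr != w->end) {
        errno = EAGAIN;
        return -1;
    }
    return 0;
}

/* Caller side: retry while the writer reports EAGAIN.
 * Returns the number of attempts used, or -1 if the (arbitrary)
 * retry limit of 16 is hit or a different error is reported. */
static int toy_send_all(struct toy_sink *s, const char *msg, size_t len)
{
    struct toy_writer w = { msg, msg + len };
    int attempts;

    for (attempts = 1; attempts <= 16; attempts++) {
        if (toy_flush(&w, s) == 0)
            return attempts;
        if (errno != EAGAIN)
            return -1;
    }
    return -1;
}
```

With a sink that takes 4 bytes per call, sending a 10-byte message through toy_send_all takes three attempts, and the first two end in -1/EAGAIN.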
Howard,
The patched packages were installed last night on the production OpenLDAP master with two of the replicas in the failing state. Once the patched slapd was started, the two problem replicas quickly caught up and everything looks good now.
Thanks again for your help.
Bill