whm@stanford.edu wrote:
--On Friday, September 03, 2010 01:23:17 AM -0700 Bill MacAllister <whm@stanford.edu> wrote:
The problem with the database was only coincidental. Restoring the database got the failing replica past the problem replication event.
In the replica pool of 6 servers we have seen the problem on three of the servers. Thinking about this more, it is unlikely to be a slave problem: the slaves have been in use for about 6 weeks, and we did not see the problem until we changed the master to 2.4.23. I have captured a master debug log of the problem event. It is at http://www.stanford.edu/~whm/files/master-debug.txt.
Bill
Please try with this patch:
Index: sasl.c
===================================================================
RCS file: /repo/OpenLDAP/pkg/ldap/libraries/libldap/sasl.c,v
retrieving revision 1.79
diff -u -r1.79 sasl.c
--- sasl.c	13 Apr 2010 20:17:56 -0000	1.79
+++ sasl.c	10 Sep 2010 05:42:22 -0000
@@ -733,8 +733,9 @@
 		return ret;
 	} else if ( p->buf_out.buf_ptr != p->buf_out.buf_end ) {
 		/* partial write? pretend nothing got written */
-		len2 = 0;
 		p->flags |= LDAP_PVT_SASL_PARTIAL_WRITE;
+		sock_errset(EAGAIN);
+		len2 = -1;
 	}
 
 	/* return number of bytes encoded, not written, to ensure
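The patch changes what the SASL write path reports when only part of the encoded buffer reaches the socket. Instead of returning 0, it sets EAGAIN and returns -1, which is the usual non-blocking "try again later" signal. The sketch below is a hypothetical, self-contained model of that convention and is not OpenLDAP code. The names `transport`, `layer`, `layer_write` and `send_with_retry` are invented, "encoding" is a plain copy, and setting errno directly is assumed to be what sock_errset(EAGAIN) amounts to on POSIX builds.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical transport that accepts at most `cap` bytes per call,
 * standing in for a non-blocking socket with a nearly full send buffer. */
typedef struct {
    char   sink[64];
    size_t used, cap;
} transport;

static size_t transport_write(transport *t, const char *buf, size_t len)
{
    size_t room = sizeof t->sink - t->used;
    size_t n = len < t->cap ? len : t->cap;
    if (n > room)
        n = room;
    memcpy(t->sink + t->used, buf, n);
    t->used += n;
    return n;
}

/* Hypothetical encoding layer with a pending output buffer,
 * loosely modelled on the buf_out handling touched by the patch. */
typedef struct {
    char   out[64];
    size_t ptr, end;   /* bytes [ptr, end) are encoded but not yet sent */
    int    encodes;    /* how many times new data was encoded */
} layer;

/* Returns len once the encoded data is fully flushed. On a partial flush it
 * returns -1 with errno = EAGAIN (assumed POSIX equivalent of
 * sock_errset(EAGAIN)), so the caller retries instead of seeing a 0-byte
 * "success". The caller must retry with the same buffer; pending bytes are
 * flushed, never re-encoded. */
static ssize_t layer_write(layer *l, transport *t, const char *buf, size_t len)
{
    if (l->ptr == l->end) {          /* nothing pending: encode new data */
        if (len > sizeof l->out) {
            errno = EMSGSIZE;
            return -1;
        }
        memcpy(l->out, buf, len);    /* "encoding" is a plain copy here */
        l->ptr = 0;
        l->end = len;
        l->encodes++;
    }
    l->ptr += transport_write(t, l->out + l->ptr, l->end - l->ptr);
    if (l->ptr != l->end) {
        errno = EAGAIN;              /* partial write: ask the caller to retry */
        return -1;
    }
    return (ssize_t) len;
}

/* Caller side: retry while the layer reports EAGAIN. A real caller would
 * wait for the socket to become writable (e.g. poll() for POLLOUT) before
 * retrying; here the retries are only counted. */
static ssize_t send_with_retry(layer *l, transport *t,
                               const char *buf, size_t len, int *retries)
{
    ssize_t r;
    *retries = 0;
    while ((r = layer_write(l, t, buf, len)) < 0 && errno == EAGAIN)
        (*retries)++;
    return r;
}
```

With cap = 4 and the 11-byte message "hello, ldap", send_with_retry sees EAGAIN twice before the third call flushes the remaining 3 bytes and returns 11. The data is encoded only once, which mirrors the patch's comment about returning bytes encoded rather than bytes written.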