kacarstensen@csupomona.edu wrote:
I've investigated this issue a little bit more since my initial bug report.
I'm not sure if connection_write is supposed to validate that a stream is active before or after calling slapd_clr_write, but it seems like the assertion wouldn't be an issue if that validation were performed before calling slapd_clr_write. To test this thought, I rebuilt openldap 2.4.28 with the following patch:
--- openldap-2.4.28/servers/slapd/connection.c	2011-11-25 10:52:29.000000000 -0800
+++ openldap-2.4.28-new/servers/slapd/connection.c	2012-01-12 13:35:45.000000000 -0800
@@ -1893,8 +1893,6 @@
 
 	assert( connections != NULL );
 
-	slapd_clr_write( s, 0 );
-
 	c = connection_get( s );
 	if( c == NULL ) {
 		Debug( LDAP_DEBUG_ANY,
@@ -1903,6 +1901,8 @@
 		return -1;
 	}
 
+	slapd_clr_write( s, 0 );
+
 #ifdef HAVE_TLS
 	if ( c->c_is_tls && c->c_needs_tls_accept ) {
 		connection_return( c );
and tried to reproduce the problem under the same circumstances as reported in my initial bug report. The master slapd tolerated the misconfigured replicas for 5 days without crashing; before, it would crash reliably within a half hour or so. I didn't notice any regressions due to the patch, though the master slapd wasn't exposed to a typical workload during the experiment.
Any thoughts on this patch?
Sounds OK to me, committed to git master. Thanks.
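For anyone reading this thread later, the reordering in the patch can be illustrated with a small standalone C sketch. It is not slapd code: toy_socket, toy_clr_write, toy_connection_get and toy_connection_write are hypothetical stand-ins, and the sketch assumes that clearing write interest on an inactive socket is what trips the assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SOCKETS 8

/* Hypothetical stand-in for per-socket state; not slapd's real structures. */
typedef struct {
    bool active;        /* socket is still registered with the listener */
    bool write_pending; /* listener is waiting for the socket to be writable */
} toy_socket;

static toy_socket sockets[MAX_SOCKETS];

/* Assumption: clearing write interest on an inactive socket asserts. */
static void toy_clr_write(int s) {
    assert(sockets[s].active);
    sockets[s].write_pending = false;
}

/* Returns the socket's entry if it is in range and still active, else NULL. */
static toy_socket *toy_connection_get(int s) {
    if (s < 0 || s >= MAX_SOCKETS || !sockets[s].active) {
        return NULL;
    }
    return &sockets[s];
}

/* Patched ordering: validate the connection first, clear write state after. */
static int toy_connection_write(int s) {
    toy_socket *c = toy_connection_get(s);
    if (c == NULL) {
        return -1; /* stale socket: return before touching write state */
    }
    toy_clr_write(s);
    return 0;
}
```

With this ordering, calling toy_connection_write on a socket that has already gone away returns -1 instead of reaching the assertion. With the original ordering (clear first, then look up), the same call would abort.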