I've investigated this issue a little bit more since my initial bug
report.
I'm not sure if connection_write is supposed to validate that a stream
is active before or after calling slapd_clr_write, but it seems like the
assertion wouldn't be an issue if that validation were performed before
calling slapd_clr_write. To test this thought, I rebuilt OpenLDAP 2.4.28
with the following patch:

--- openldap-2.4.28/servers/slapd/connection.c	2011-11-25 10:52:29.000000000 -0800
+++ openldap-2.4.28-new/servers/slapd/connection.c	2012-01-12 13:35:45.000000000 -0800
@@ -1893,8 +1893,6 @@
 
 	assert( connections != NULL );
 
-	slapd_clr_write( s, 0 );
-
 	c = connection_get( s );
 	if( c == NULL ) {
 		Debug( LDAP_DEBUG_ANY,
@@ -1903,6 +1901,8 @@
 		return -1;
 	}
 
+	slapd_clr_write( s, 0 );
+
 #ifdef HAVE_TLS
 	if ( c->c_is_tls && c->c_needs_tls_accept ) {
 		connection_return( c );

and tried to reproduce the problem under the same circumstances as
reported in my initial bug report. The master slapd tolerated the
misconfigured replicas for 5 days without crashing; before, it would
crash reliably within a half hour or so. I didn't notice any regressions
due to the patch, though the master slapd wasn't exposed to a typical
workload during the experiment.
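
For anyone who wants to see the ordering argument in isolation, here is a toy model of it in C. It is only a sketch and not slapd code: every toy_* name is hypothetical, and it assumes (I have not confirmed this against daemon.c) that the assertion fires when the clear-write helper is called for a socket that is no longer active. The write handler looks the connection up first and returns -1 if there is none, so the asserting helper is only reached for a live socket.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_FD 8

/* Hypothetical stand-ins for slapd's connection table and socket state. */
static bool toy_conn_valid[TOY_MAX_FD];  /* a usable connection exists for this fd */
static bool toy_sock_active[TOY_MAX_FD]; /* fd is still registered with the event loop */
static bool toy_want_write[TOY_MAX_FD];  /* write interest is currently set */

/* Register a socket and request write notification for it. */
void toy_open(int s)
{
    toy_conn_valid[s] = true;
    toy_sock_active[s] = true;
    toy_want_write[s] = true;
}

/* Tear the socket down, as if the peer had gone away. */
void toy_close(int s)
{
    toy_conn_valid[s] = false;
    toy_sock_active[s] = false;
    toy_want_write[s] = false;
}

/* Models a clear-write helper that asserts the socket is still active
 * (assumed behaviour, see the note above). */
static void toy_clr_write(int s)
{
    assert(toy_sock_active[s]);
    toy_want_write[s] = false;
}

/* Models the connection lookup: fails for unknown or closed sockets. */
static bool toy_connection_get(int s)
{
    return s >= 0 && s < TOY_MAX_FD && toy_conn_valid[s];
}

/* Report whether write interest is still set for a socket. */
bool toy_wants_write(int s)
{
    return toy_want_write[s];
}

/* Patched ordering: validate first, clear the write flag afterwards. */
int toy_connection_write(int s)
{
    if (!toy_connection_get(s)) {
        return -1; /* no connection: never reach the asserting helper */
    }
    toy_clr_write(s);
    return 0;
}
```

With the two steps in the pre-patch order (clear first, then look up), calling toy_connection_write on a socket that toy_close has already torn down would trip the assert instead of returning -1.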
Any thoughts on this patch?

Sounds OK to me, committed to git master. Thanks.
--
-- Howard Chu
CTO, Symas Corp.
http://www.symas.com
Director, Highland Sun
http://highlandsun.com/hyc/
Chief Architect, OpenLDAP
http://www.openldap.org/project/