Hi there,
I have an active-active LDAP cluster with N-way multi-master configuration. The slapd version is: 2.4.23-7.2 (Debian Squeeze)
After a modify operation in the cn=config tree, we get and unresponsible server (all nodes in the cluster)
This is the log (256 loglevel) for the node who gets the original MOD operation. Note the time cut between 12:23:55 and 12:42:54 The deferring operation message was repeated before that cut, and today It's the first time I see it. [...] Apr 26 12:23:55 ldap2 slapd[19556]: conn=7352 op=0 RESULT tag=97 err=0 text= Apr 26 12:23:55 ldap2 slapd[19556]: conn=7352 op=2 UNBIND Apr 26 12:23:55 ldap2 slapd[19556]: conn=7352 fd=204 closed Apr 26 12:23:55 ldap2 slapd[19556]: conn=1399 op=27 MOD dn="olcDatabase={1}hdb,cn=config" Apr 26 12:23:55 ldap2 slapd[19556]: conn=1399 op=27 MOD attr=olcAccess *** Apr 26 12:42:54 ldap2 slapd[19556]: do_syncrepl: rid=001 rc -5 retrying (4 retries left) Apr 26 12:42:54 ldap2 slapd[19556]: do_syncrepl: rid=006 rc -5 retrying (4 retries left) Apr 26 12:42:54 ldap2 slapd[19556]: do_syncrepl: rid=003 rc -5 retrying (4 retries left) Apr 26 12:42:54 ldap2 slapd[19556]: conn=1399 op=27 RESULT tag=103 err=0 text= Apr 26 12:42:54 ldap2 slapd[19556]: conn=7353 fd=135 ACCEPT from IP=[xxx:xx:xx:xx::xx]:58394 (IP=[::]:636) Apr 26 12:42:54 ldap2 slapd[19556]: conn=7354 fd=136 ACCEPT from IP=xx.xx.x.xx:58617 (IP=0.0.0.0:636) Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=5551 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876 deferring operation: pending operations Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876 deferring operation: pending operations Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876 deferring operation: pending operations Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876 deferring operation: pending operations Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876 deferring operation: pending operations Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: conn=1399 fd=48 closed (connection lost) Apr 26 12:42:54 ldap2 slapd[19556]: conn=1451 fd=50 closed (connection lost) [...] Apr 26 12:42:54 ldap2 slapd[19556]: conn=7224 op=2 UNBIND Apr 26 12:42:54 ldap2 slapd[19556]: conn=7224 fd=134 closed Apr 26 12:42:54 ldap2 slapd[19556]: conn=6456 op=6 RESULT tag=97 err=0 text= Apr 26 12:42:54 ldap2 slapd[19556]: conn=5551 op=8 ABANDON msg=8 Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035 deferring operation: pending operations Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035 deferring operation: pending operations Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035 deferring operation: pending operations Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: conn=5551 op=10 ABANDON msg=10 Apr 26 12:42:54 ldap2 slapd[19556]: conn=3876 op=13 ABANDON msg=13 Apr 26 12:42:54 ldap2 slapd[19556]: conn=1242 op=24 ABANDON msg=24 Apr 26 12:42:54 ldap2 slapd[19556]: conn=1242 op=26 ABANDON msg=26 Apr 26 12:42:54 ldap2 slapd[19556]: conn=1242 op=28 ABANDON msg=28 Apr 26 12:42:54 ldap2 slapd[19556]: conn=1242 op=30 ABANDON msg=30 Apr 26 12:42:54 ldap2 slapd[19556]: conn=1242 op=31 UNBIND Apr 26 12:42:54 ldap2 slapd[19556]: conn=1242 fd=38 closed Apr 26 12:42:54 ldap2 slapd[19556]: conn=5967 op=7 ABANDON msg=7 Apr 26 12:42:54 ldap2 slapd[19556]: conn=5967 op=8 UNBIND Apr 26 12:42:54 ldap2 slapd[19556]: conn=5967 fd=118 closed Apr 26 12:42:54 ldap2 slapd[19556]: conn=2405 op=12 ABANDON msg=12 Apr 26 12:42:54 ldap2 slapd[19556]: conn=2405 op=13 UNBIND Apr 26 12:42:54 ldap2 slapd[19556]: conn=2405 fd=44 closed Apr 26 12:42:54 ldap2 slapd[19556]: conn=2181 op=24 ABANDON msg=24 Apr 26 12:42:54 ldap2 slapd[19556]: conn=2181 op=26 ABANDON msg=26 Apr 26 12:42:54 ldap2 slapd[19556]: conn=2181 op=28 ABANDON msg=28 Apr 26 12:42:54 ldap2 slapd[19556]: conn=2181 op=30 ABANDON msg=30 Apr 26 12:42:54 ldap2 slapd[19556]: conn=5551 op=6 ABANDON msg=6 Apr 26 12:42:54 ldap2 slapd[19556]: conn=2181 op=31 UNBIND Apr 26 12:42:54 ldap2 slapd[19556]: conn=5924 op=5 ABANDON msg=5 Apr 26 12:42:54 ldap2 slapd[19556]: conn=5924 op=7 ABANDON msg=7 Apr 26 12:42:54 ldap2 slapd[19556]: conn=5924 op=9 ABANDON msg=9 Apr 26 12:42:54 ldap2 slapd[19556]: conn=5924 op=10 UNBIND Apr 26 12:42:54 ldap2 slapd[19556]: conn=5924 fd=115 closed Apr 26 12:42:54 ldap2 slapd[19556]: conn=1389 op=9 ABANDON msg=9 Apr 26 12:42:54 ldap2 slapd[19556]: conn=1167 op=15 UNBIND Apr 26 12:42:54 ldap2 slapd[19556]: conn=1167 fd=37 closed Apr 26 12:42:54 ldap2 slapd[19556]: conn=1903 op=10 ABANDON msg=10 Apr 26 12:42:54 ldap2 slapd[19556]: conn=1903 op=11 UNBIND Apr 26 12:42:54 ldap2 slapd[19556]: conn=1903 fd=58 closed Apr 26 12:42:54 ldap2 slapd[19556]: conn=4400 op=3 ABANDON msg=3 Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631 deferring operation: pending operations Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631 deferring operation: pending operations Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631 deferring operation: pending operations Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631 deferring operation: too many executing Apr 26 12:42:54 ldap2 slapd[19556]: conn=4400 op=5 UNBIND [...]
In other node, also a cut in the log
[...] Apr 26 12:22:40 ldap3 slapd[7650]: conn=8292 op=1 SRCH base="dc=xxx,dc=es" scope=2 deref=0 filter="(objectClass=*)" Apr 26 12:22:40 ldap3 slapd[7650]: conn=8292 op=1 SRCH attr=* + Apr 26 12:22:40 ldap3 slapd[7650]: conn=8292 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text= Apr 26 12:22:40 ldap3 slapd[7650]: conn=8292 op=2 UNBIND Apr 26 12:22:40 ldap3 slapd[7650]: conn=8292 fd=158 closed Apr 26 12:22:41 ldap3 slapd[7650]: conn=8293 fd=158 ACCEPT from IP=192.168.21.21:44703 (IP=0.0.0.0:636) Apr 26 12:22:41 ldap3 slapd[7650]: conn=8294 fd=163 ACCEPT from IP=192.168.21.22:58503 (IP=0.0.0.0:636) Apr 26 12:22:41 ldap3 slapd[7650]: conn=8295 fd=164 ACCEPT from IP=192.168.21.22:58506 (IP=0.0.0.0:636) Apr 26 12:22:41 ldap3 slapd[7650]: conn=8293 fd=158 TLS established tls_ssf=128 ssf=128 Apr 26 12:22:41 ldap3 slapd[7650]: conn=8294 fd=163 TLS established tls_ssf=128 ssf=128 Apr 26 12:22:41 ldap3 slapd[7650]: conn=8295 fd=164 TLS established tls_ssf=128 ssf=128 Apr 26 12:22:42 ldap3 slapd[7650]: slap_client_connect: URI=ldaps://ldap1.xxx.xx/ DN="cn=admin,cn=config" ldap_sasl_bind_s failed (-5) *** Apr 26 12:42:54 ldap3 slapd[7650]: do_syncrepl: rid=001 rc -5 retrying (4 retries left) Apr 26 12:42:54 ldap3 slapd[7650]: null_callback : error code 0x35 Apr 26 12:42:54 ldap3 slapd[7650]: null_callback : error code 0x35 Apr 26 12:42:54 ldap3 slapd[7650]: null_callback : error code 0x35 Apr 26 12:42:54 ldap3 slapd[7650]: conn=8297 fd=165 ACCEPT from IP=[xxx:xxx:xxx:xx::5]:42973 (IP=[::]:636) Apr 26 12:42:54 ldap3 slapd[7650]: conn=8296 fd=160 ACCEPT from IP=192.168.21.22:58509 (IP=0.0.0.0:636) Apr 26 12:42:54 ldap3 slapd[7650]: conn=8293 op=1 UNBIND Apr 26 12:42:54 ldap3 slapd[7650]: conn=8293 op=0 BIND dn="cn=admin,cn=config" method=128 [...]
In other node, also a cut after the replication operation, the my service restart:
[...] Apr 26 12:22:37 ldap1 slapd[5945]: conn=8377 fd=159 closed (connection lost) Apr 26 12:22:37 ldap1 slapd[5945]: conn=8373 fd=156 closed (connection lost) Apr 26 12:22:37 ldap1 slapd[5945]: conn=8374 fd=157 closed (connection lost) Apr 26 12:22:37 ldap1 slapd[5945]: conn=8375 fd=158 closed (connection lost) Apr 26 12:22:41 ldap1 slapd[5945]: conn=8378 fd=158 ACCEPT from IP=192.168.21.23:57377 (IP=0.0.0.0:636) Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 fd=159 ACCEPT from IP=192.168.21.23:57379 (IP=0.0.0.0:636) Apr 26 12:22:41 ldap1 slapd[5945]: conn=8380 fd=160 ACCEPT from IP=192.168.21.22:33628 (IP=0.0.0.0:636) Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 fd=161 ACCEPT from IP=192.168.21.22:33629 (IP=0.0.0.0:636) Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 fd=159 TLS established tls_ssf=128 ssf=128 Apr 26 12:22:41 ldap1 slapd[5945]: conn=8380 fd=160 TLS established tls_ssf=128 ssf=128 Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 op=0 BIND dn="cn=admin,dc=xxx,dc=es" method=128 Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 op=0 BIND dn="cn=admin,dc=xxx,dc=es" mech=SIMPLE ssf=0 Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 op=0 RESULT tag=97 err=0 text= Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 op=1 SRCH base="dc=xxx,dc=es" scope=2 deref=0 filter="(objectClass=*)" Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 op=1 SRCH attr=* + Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 fd=161 TLS established tls_ssf=128 ssf=128 Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 op=0 BIND dn="cn=admin,dc=xxx,dc=es" method=128 Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 op=0 BIND dn="cn=admin,dc=xxx,dc=es" mech=SIMPLE ssf=0 Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 op=0 RESULT tag=97 err=0 text= Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 op=1 SRCH base="dc=xxx,dc=es" scope=2 deref=0 filter="(objectClass=*)" Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 op=1 SRCH attr=* + Apr 26 12:22:41 ldap1 slapd[5945]: conn=8378 fd=158 TLS established tls_ssf=128 ssf=128 Apr 26 12:22:42 ldap1 slapd[5945]: conn=8379 op=1 SEARCH RESULT tag=101 err=0 nentries=1 text= Apr 26 12:22:42 ldap1 slapd[5945]: conn=8381 op=1 SEARCH RESULT tag=101 err=0 nentries=1 text= Apr 26 12:22:42 ldap1 slapd[5945]: slap_client_connect: URI=ldaps://ldap3.xxx.xx/ DN="cn=admin,cn=config" ldap_sasl_bind_s failed (-5) *** Apr 26 12:42:54 ldap1 slapd[5945]: daemon: shutdown requested and initiated. Apr 26 12:42:54 ldap1 slapd[5945]: conn=1009 fd=13 closed (slapd shutdown) Apr 26 12:42:54 ldap1 slapd[5945]: conn=1013 fd=20 closed (slapd shutdown) Apr 26 12:42:54 ldap1 slapd[5945]: conn=1014 fd=21 closed (slapd shutdown) [...]
Any idea?
Best regards.