On Fri, May 02, 2014 at 07:22:10PM -0700, Quanah Gibson-Mount wrote:
I would suggest (if you haven't) enabling sync replication logging (loglevel sync) in addition to whatever other loglevels you have.
Ok, so I reloaded my dev environment with a fresh snapshot of production data, and started it back up. After a couple hours, all of the servers started producing this entry in the accesslog db:
dn: reqStart=20140506065034.000024Z,cn=accesslog objectClass: auditModify structuralObjectClass: auditModify reqStart: 20140506065034.000024Z reqEnd: 20140506065034.000028Z reqType: modify reqSession: 3 reqAuthzID: cn=ldaproot,dc=csupomona,dc=edu reqDN: cn=slapd-monitor,ou=user,ou=service,dc=csupomona,dc=edu reqResult: 0 reqMod: entryCSN:= 20140506065034.590711Z#000000#003#000000 reqMod: modifiersName:= cn=ldaproot,dc=csupomona,dc=edu reqMod: modifyTimestamp:= 20140506065034Z reqEntryUUID: c16729b0-ce1e-4f9d-9123-e0b8f115133c entryUUID: f1ff441b-ed2a-4623-a7b6-55931f2f28de creatorsName: cn=accesslog createTimestamp: 20140506065034Z entryCSN: 20140506065034.590711Z#000000#003#000000 modifiersName: cn=accesslog modifyTimestamp: 20140506065034Z
The reqStart, reqEnd and entryUUID attributes changed, but the reqMod, entryCSN, and modifyTimestamp are all the same on all entries on all servers.
This continued for hours on end, until the servers ran out of memory and the slapd process was killed:
dn: reqStart=20140506102707.000012Z,cn=accesslog objectClass: auditModify structuralObjectClass: auditModify reqStart: 20140506102707.000012Z reqEnd: 20140506102709.000000Z reqType: modify reqSession: 2 reqAuthzID: cn=ldaproot,dc=csupomona,dc=edu reqDN: cn=slapd-monitor,ou=user,ou=service,dc=csupomona,dc=edu reqResult: 0 reqMod: entryCSN:= 20140506065034.590711Z#000000#003#000000 reqMod: modifiersName:= cn=ldaproot,dc=csupomona,dc=edu reqMod: modifyTimestamp:= 20140506065034Z reqEntryUUID: c16729b0-ce1e-4f9d-9123-e0b8f115133c entryUUID: 9c332ed5-f39c-40d1-a8c1-1feb936c1d81 creatorsName: cn=accesslog createTimestamp: 20140506065034Z entryCSN: 20140506065034.590711Z#000000#003#000000 modifiersName: cn=accesslog modifyTimestamp: 20140506065034Z
On the server with id 3, the source of this CSN, logs show:
May 5 23:50:34 pip-dev slapd[31584]: conn=8558 op=0 BIND dn="cn=slapd-monitor,ou=user,ou=service,dc=csupomona,dc=edu" method=128 May 5 23:50:34 pip-dev slapd[31584]: slap_queue_csn: queing 0x7feb09ffbbd0 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: slap_queue_csn: queing 0x7feafc002af0 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: slap_graduate_commit_csn: removing 0x7feafc30dd20 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: slap_graduate_commit_csn: removing 0x7feafc321610 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: conn=8558 op=0 RESULT tag=97 err=49 text= May 5 23:50:34 pip-dev slapd[31584]: syncprov_sendresp: to=002, cookie=rid=003,sid=003,csn=20140506065034.590711Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: syncprov_sendresp: to=001, cookie=rid=003,sid=003,csn=20140506065034.590711Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: syncprov_sendresp: cookie=rid=000,sid=003,csn=20140506065034.590711Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: conn=1006 op=245464 ABANDON msg=245464 May 5 23:50:34 pip-dev slapd[31584]: conn=1006 op=245465 SRCH base="dc=csupomona,dc=edu" scope=2 deref=0 filter="(objectClass=*)" May 5 23:50:34 pip-dev slapd[31584]: conn=1006 op=245465 SRCH attr=* + May 5 23:50:34 pip-dev slapd[31584]: srs csn 20140506065033.649766Z#000000#000#000000 May 5 23:50:34 pip-dev slapd[31584]: log csn 20140506044033.088487Z#000000#000#000000 May 5 23:50:34 pip-dev slapd[31584]: cmp -512, too old May 5 23:50:34 pip-dev slapd[31584]: log csn 20140506044532.528640Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: cmp -512, too old May 5 23:50:34 pip-dev slapd[31584]: log csn 20140506044532.602250Z#000000#001#000000 [...] May 5 23:50:34 pip-dev slapd[31584]: log csn 20140506065034.351114Z#000000#001#000000 May 5 23:50:34 pip-dev slapd[31584]: cmp 0, too old May 5 23:50:34 pip-dev slapd[31584]: log csn 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: syncprov_search_response: cookie=rid=003,sid=003,csn=20140506065033.649766Z#000000#000#000000;20140506065034.351114Z#000000#001#000000;20140506065033.164941Z#000000#002#000000;20140506065034.590711Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: conn=1006 op=245465 SEARCH RESULT tag=101 err=0 nentries=1 text= May 5 23:50:34 pip-dev slapd[31584]: conn=8558 fd=19 closed (connection lost) May 5 23:50:34 pip-dev slapd[31584]: do_syncrep2: rid=002 cookie=rid=000,sid=001,csn=20140506065034.590711Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: do_syncrep2: rid=002 CSN too old, ignoring 20140506065034.590711Z#000000#003#000000 (reqStart=20140506065034.000024Z,cn=accesslog) [...] May 5 23:50:34 pip-dev slapd[31584]: slap_queue_csn: queing 0x7feb2011477b 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: slap_queue_csn: queing 0x7feb2010d880 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: do_syncrep2: rid=003 cookie=rid=003,sid=002,csn=20140506065034.675448Z#000000#000#000000 May 5 23:50:34 pip-dev slapd[31584]: do_syncrep2: rid=003 CSN too old, ignoring 20140506065034.675448Z#000000#000#000000 (reqStart=20140506065034.000026Z,cn=accesslog) May 5 23:50:34 pip-dev slapd[31584]: slap_graduate_commit_csn: removing 0x7feb201203b0 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 pip-dev slapd[31584]: slap_graduate_commit_csn: removing 0x7feb20126830 20140506065034.590711Z#000000#003#000000
On another server, the first instance of this CSN shows up:
May 5 23:50:34 fosse-dev slapd[7791]: do_syncrep2: rid=003 cookie=rid=000,sid=003,csn=20140506065034.590711Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: slap_queue_csn: queing 0x7fa744104f7d 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: slap_queue_csn: queing 0x7fa74413d590 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: syncprov_matchops: skipping original sid 003 May 5 23:50:34 fosse-dev slapd[7791]: slap_graduate_commit_csn: removing 0x7fa7441369d0 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: syncprov_sendresp: to=001, cookie=rid=001,csn=20140506065034.590711Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: slap_graduate_commit_csn: removing 0x7fa744136770 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: syncrepl_message_to_op: rid=003 be_modify cn=slapd-monitor,ou=user,ou=service,dc=csupomona,dc=edu (0) May 5 23:50:34 fosse-dev slapd[7791]: slap_queue_csn: queing 0x7fa744134100 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: syncprov_sendresp: to=002, cookie=rid=001,csn=20140506065034.590711Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: conn=1021 op=2 ABANDON msg=2 May 5 23:50:34 fosse-dev slapd[7791]: conn=1021 op=3 SRCH base="dc=csupomona,dc=edu" scope=2 deref=0 filter="(objectClass=*)" May 5 23:50:34 fosse-dev slapd[7791]: conn=1021 op=3 SRCH attr=* + May 5 23:50:34 fosse-dev slapd[7791]: srs csn 20140506065033.649766Z#000000#000#000000 May 5 23:50:34 fosse-dev slapd[7791]: log csn 20140506044032.985641Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: cmp -512, too old May 5 23:50:34 fosse-dev slapd[7791]: log csn 20140506044033.088487Z#000000#000#000000 May 5 23:50:34 fosse-dev slapd[7791]: cmp -512, too old May 5 23:50:34 fosse-dev slapd[7791]: log csn 20140506044532.528640Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: cmp -512, too old [...] May 5 23:50:34 fosse-dev slapd[7791]: slap_graduate_commit_csn: removing 0x7fa744136770 20140506065034.590711 Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: cmp -256, too old May 5 23:50:34 fosse-dev slapd[7791]: log csn 20140506055533.107369Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: cmp -256, too old [...] May 5 23:50:34 fosse-dev slapd[7791]: log csn 20140506062531.905240Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: cmp -3, too old May 5 23:50:34 fosse-dev slapd[7791]: do_syncrep2: rid=002 cookie=rid=002,sid=002,csn=20140506065034.590711Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: log csn 20140506062532.031640Z#000000#001#000000 May 5 23:50:34 fosse-dev slapd[7791]: cmp -3, too old [...] May 5 23:50:34 fosse-dev slapd[7791]: log csn 20140506065034.590711Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: syncprov_search_response: cookie=rid=001,csn=20140506065033.649766Z#000000#000#000000;20140506065034.351114Z#000000#001#000000;20140506065033.164941Z#000000#002#000000;20140506065034.590711Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: conn=1021 op=3 SEARCH RESULT tag=101 err=0 nentries=1 text= May 5 23:50:34 fosse-dev slapd[7791]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20140506065034.590711Z#000000#003#000000 May 5 23:50:34 fosse-dev slapd[7791]: do_syncrep2: rid=001 CSN too old, ignoring 20140506065034.590711Z#000000#003#000000 (reqStart=20140506065034.000024Z,cn=accesslog)
And then they all just go on forever, queing the CSN and removing it.
Any thoughts on what's blowing up?
Thanks...