On Wed, Feb 15, 2017 at 12:22:29PM -0800, Quanah Gibson-Mount wrote:
I would suggest filing an ITS with the full backtrace info, so I can track it.
Ok, will do.
It could be useful to have the entry data from the accesslog as well for the failed replication op, as we can see the failed entry DN in the output of your backtrace.
That would be in the accesslog on the server that crashed? Hmm, the server that crashed is the master, and all updates were going to it. Am I confused, or did the update that caused the crash come in via syncrepl though, and hence originate from a different server? So the accesslog entry you want would be from that server, not the server that crashed? But given no other servers should have been receiving updates, how would an update have been received via replication? Or is this another issue like the memberOf problem where updates are being improperly replicated?
Does the operation complete successfully after slapd is restarted?
As far as I can tell; at least the server doesn't crash. I don't get any errors at the application level in my logs with that specific message from the backtrace.
Hmm, looking at the logs that correspond with one of the crashes:
Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=805 MOD dn="uid=vntruong,ou=user,dc=csupomona,dc=edu" Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=805 MOD attr=eduPersonAffiliation eduPersonPrimaryAffiliation csupomonaEduPersonAffiliation Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=805 RESULT tag=103 err=0 text=
This operation appears to succeed? Then there's this:
Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=806 MOD dn="uid=vntruong,ou=user,dc=csupomona,dc=edu" Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=806 MOD attr=csupomonaEduPersonExpiration
Then nothing, the server crashed. In my application, it's the csupomonaEduPersonExpiration modification that fails.
Here's the entry from the accesslog for the eduPersonAffiliation:
dn: reqStart=20170214120013.000001Z,cn=accesslog objectClass: auditModify structuralObjectClass: auditModify reqStart: 20170214120013.000001Z reqEnd: 20170214120013.000002Z reqType: modify reqSession: 37859 reqAuthzID: cn=idmgmt,ou=user,ou=service,dc=csupomona,dc=edu reqDN: uid=vntruong,ou=user,dc=csupomona,dc=edu reqResult: 0 reqMod: eduPersonAffiliation:= reqMod: eduPersonPrimaryAffiliation:= reqMod: csupomonaEduPersonAffiliation:= reqMod: entryCSN:= 20170214120013.628665Z#000000#000#000000 reqMod: modifiersName:= cn=idmgmt,ou=user,ou=service,dc=csupomona,dc=edu reqMod: modifyTimestamp:= 20170214120013Z reqEntryUUID: bd5ba51c-7a1b-4bdb-8bf3-fe90552f5909 entryCSN: 20170214120013.628665Z#000000#000#000000 entryUUID: 4737c48c-e46e-45a4-ba4b-2eb61540b27b creatorsName: cn=accesslog createTimestamp: 20170214120013Z modifiersName: cn=accesslog modifyTimestamp: 20170214120013Z
Then the next entry isn't until:
dn: reqStart=20170214184643.000008Z,cn=accesslog
when I restarted the server. I guess I am confused; the entryCSN has serverID 0, the ID of this server, so this isn't a replicated op, it's an op from this server. So why does the backtrace show the change coming in via syncrepl? It seems like it's getting applied twice. The change is deleting the attribute, so the second time it's getting applied you would get a no such attribute error...