--On Wednesday, February 15, 2017 12:15 PM -0800 "Paul B. Henson" henson@acm.org wrote:
On Wed, Feb 15, 2017 at 09:04:48AM -0800, Quanah Gibson-Mount wrote:
Hm, so there is http://www.openldap.org/its/index.cgi/?findid=8413, but not sure that's the same issue.
I do have a 4-way MMR setup, but that seems to be the only simularity. It's crashing in mods.c, not search.c, and from a segfault when actually dereferencing a null pointer, rather than from an assertion failure checking for one. Well, I suppose the other simularity is that it happens during replication.
Fortunately it's not that frequent of an occurance, and our load balancer pops the load over to the failover master fairly quickly when it happens, we usually only lose one transaction that I need to go clean up. It typically happens during the heavy update load of our daily identity management syncronization batch job.
I would suggest filing an ITS with the full backtrace info, so I can track it. :) It could be useful to have the entry data from the accesslog as well for the failed replication op, as we can see the failed entry DN in the output of your backtrace. For example:
reqStart=20161225113103.000003Z,cn=accesslog
In some ways, although it's hard to be certain, it almost looks like it is complaining that the end operation couldn't be completed (I.e., the account being modified didn't have the attribute/value pair in the entry that the delete was being requested for).
Does the operation complete successfully after slapd is restarted?
--Quanah
--
Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com
On Wed, Feb 15, 2017 at 12:22:29PM -0800, Quanah Gibson-Mount wrote:
I would suggest filing an ITS with the full backtrace info, so I can track it.
Ok, will do.
It could be useful to have the entry data from the accesslog as well for the failed replication op, as we can see the failed entry DN in the output of your backtrace.
That would be in the accesslog on the server that crashed? Hmm, the server that crashed is the master, and all updates were going to it. Am I confused, or did the update that caused the crash come in via syncrepl though, and hence originate from a different server? So the accesslog entry you want would be from that server, not the server that crashed? But given no other servers should have been receiving updates, how would an update have been received via replication? Or is this another issue like the memberOf problem where updates are being improperly replicated?
Does the operation complete successfully after slapd is restarted?
As far as I can tell; at least the server doesn't crash. I don't get any errors at the application level in my logs with that specific message from the backtrace.
Hmm, looking at the logs that correspond with one of the crashes:
Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=805 MOD dn="uid=vntruong,ou=user,dc=csupomona,dc=edu" Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=805 MOD attr=eduPersonAffiliation eduPersonPrimaryAffiliation csupomonaEduPersonAffiliation Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=805 RESULT tag=103 err=0 text=
This operation appears to succeed? Then there's this:
Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=806 MOD dn="uid=vntruong,ou=user,dc=csupomona,dc=edu" Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=806 MOD attr=csupomonaEduPersonExpiration
Then nothing, the server crashed. In my application, it's the csupomonaEduPersonExpiration modification that fails.
Here's the entry from the accesslog for the eduPersonAffiliation:
dn: reqStart=20170214120013.000001Z,cn=accesslog objectClass: auditModify structuralObjectClass: auditModify reqStart: 20170214120013.000001Z reqEnd: 20170214120013.000002Z reqType: modify reqSession: 37859 reqAuthzID: cn=idmgmt,ou=user,ou=service,dc=csupomona,dc=edu reqDN: uid=vntruong,ou=user,dc=csupomona,dc=edu reqResult: 0 reqMod: eduPersonAffiliation:= reqMod: eduPersonPrimaryAffiliation:= reqMod: csupomonaEduPersonAffiliation:= reqMod: entryCSN:= 20170214120013.628665Z#000000#000#000000 reqMod: modifiersName:= cn=idmgmt,ou=user,ou=service,dc=csupomona,dc=edu reqMod: modifyTimestamp:= 20170214120013Z reqEntryUUID: bd5ba51c-7a1b-4bdb-8bf3-fe90552f5909 entryCSN: 20170214120013.628665Z#000000#000#000000 entryUUID: 4737c48c-e46e-45a4-ba4b-2eb61540b27b creatorsName: cn=accesslog createTimestamp: 20170214120013Z modifiersName: cn=accesslog modifyTimestamp: 20170214120013Z
Then the next entry isn't until:
dn: reqStart=20170214184643.000008Z,cn=accesslog
when I restarted the server. I guess I am confused; the entryCSN has serverID 0, the ID of this server, so this isn't a replicated op, it's an op from this server. So why does the backtrace show the change coming in via syncrepl? It seems like it's getting applied twice. The change is deleting the attribute, so the second time it's getting applied you would get a no such attribute error...
openldap-technical@openldap.org