Our LDAP infrastructure is currently running 2.4.35, and consists of two read/write masters configured in mirror mode behind the load balancer, with three additional read-only slaves using syncrepl. We recently decided to add the memberof overlay to our configuration, due to an application that did not support querying the groups for members.
I updated our configuration to load the module, and add the overlay, and proceeded to rip through all of our groups removing and then re-adding the members in order to populate the memberOf attribute on the user objects.
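A minimal sketch of the remove-and-re-add pass described above, written as a generator of LDIF modify records that could be fed to a tool such as ldapmodify. The function name readd_members_ldif and the attribute name "member" are assumptions for illustration; the original message does not say which membership attribute or tooling was used. The sketch also assumes the DNs are plain ASCII and need no base64 encoding.

```python
def readd_members_ldif(group_dn, member_dns, attr="member"):
    """Build one LDIF modify record that deletes and then re-adds members.

    Both changes go into a single modify operation, so the group is never
    left without members between the two steps. The attribute name
    ("member") is an assumption; substitute whatever the groups use.
    Values are written as-is, so DNs must be plain ASCII (no base64).
    """
    if not member_dns:
        raise ValueError("no members to re-add")
    values = [f"{attr}: {dn}" for dn in member_dns]
    lines = [
        f"dn: {group_dn}",
        "changetype: modify",
        f"delete: {attr}",
        *values,
        "-",
        f"add: {attr}",
        *values,
        "-",
        "",  # blank line terminates the LDIF record
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(readd_members_ldif(
        "uid=classes,ou=group,dc=csupomona,dc=edu",
        ["uid=rfu,ou=user,dc=csupomona,dc=edu"],
    ))
```

Whether the overlay treats a combined delete and add the same way as two separate operations is not stated in this thread, so this is best tried in a test environment first.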
While doing so, there were errors logged on all of the servers:
Oct 10 04:26:09 fosse slapd[9944]: conn=75373 op=184748: memberof_value_modify DN="uid=tdnguyen1,ou=user,dc=csupomona,dc=edu" delete memberOf="uid=classes,ou=group,dc=csupomona,dc=edu" failed err=16
This was expected, as the memberOf attribute did not exist in our current directory. However, what was unexpected was that the slapd processes started to mysteriously die while I was trying to repopulate the groups. No log messages, or any other indication of the failure, just attribute delete errors:
Oct 10 04:29:39 filmore slapd[25526]: conn=-1 op=0: memberof_value_modify DN="uid=rfu,ou=user,dc=csupomona,dc=edu" delete memberOf="uid=mhr31806,ou=group,dc=csupomona,dc=edu" failed err=16
Oct 10 04:29:39 filmore slapd[25526]: conn=-1 op=0: memberof_value_modify DN="uid=rfu,ou=user,dc=csupomona,dc=edu" delete memberOf="uid=mhr_classes,ou=group,dc=csupomona,dc=edu" failed err=16
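For anyone trying to correlate these messages with the crashes, here is a small sketch that pulls the fields out of memberof_value_modify failure lines and counts them per host. The regular expression is derived only from the sample lines in this message, and the function names are hypothetical. LDAP result code 16 is noSuchAttribute, which matches the explanation that memberOf did not yet exist on the user entries.

```python
import re
from collections import Counter

# Pattern based solely on the sample syslog lines quoted in this thread.
LINE_RE = re.compile(
    r'(?P<host>\S+) slapd\[(?P<pid>\d+)\]: '
    r'conn=(?P<conn>-?\d+) op=(?P<op>\d+): '
    r'memberof_value_modify DN="(?P<target>[^"]*)" '
    r'(?P<action>\w+) memberOf="(?P<group>[^"]*)" '
    r'failed err=(?P<err>\d+)'
)


def parse_failures(text):
    """Return one dict per memberof_value_modify failure found in text.

    finditer is used rather than line-by-line matching, so records that
    were run together on one line are still found.
    """
    return [m.groupdict() for m in LINE_RE.finditer(text)]


def failures_per_host(text):
    """Count failures by the hostname field of each log record."""
    return Counter(f["host"] for f in parse_failures(text))


if __name__ == "__main__":
    sample = (
        'Oct 10 04:29:39 filmore slapd[25526]: conn=-1 op=0: '
        'memberof_value_modify DN="uid=rfu,ou=user,dc=csupomona,dc=edu" '
        'delete memberOf="uid=mhr31806,ou=group,dc=csupomona,dc=edu" '
        'failed err=16'
    )
    for failure in parse_failures(sample):
        print(failure["host"], failure["action"], failure["err"])
```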
Then the process was gone. It was definitely related to mass group updates: the servers would run for hours with no problems under general use, but as soon as I started churning group members, bam, one or two of them would go away.
I ended up backing out the modification, dumping the database, removing all of the memberOf attributes, and reloading it. I will try to duplicate this in a test environment with debugging enabled and see if I can get a better idea of what's going on, but I was just curious whether anyone had seen anything like this or knew of any underlying issues with the memberof overlay.
Thanks much.
--On Friday, October 11, 2013 1:04 PM -0700 "Paul B. Henson" henson@acm.org wrote:
This was expected, as the memberOf attribute did not exist in our current directory. However, what was unexpected was that the slapd processes started to mysteriously die while I was trying to repopulate the groups. No log messages, or any other indication of the failure, just attribute delete errors:
Enable core files: http://wiki.zimbra.com/wiki/Enabling_Core_Files
I'd also note http://www.openldap.org/its/index.cgi/?findid=7710
--Quanah
--
Quanah Gibson-Mount Architect - Server Zimbra Software, LLC -------------------- Zimbra :: the leader in open source messaging and collaboration
From: Quanah Gibson-Mount [mailto:quanah@zimbra.com] Sent: Friday, October 11, 2013 1:25 PM
Enable core files: http://wiki.zimbra.com/wiki/Enabling_Core_Files
Thanks for the link, I will do so when I get the test environment up.
I'd also note http://www.openldap.org/its/index.cgi/?findid=7710
I saw the contextCSN issue float by on the list, but I didn't run into that problem, or at least my monitoring system that verifies replication consistency never complained about it. I see there is also some mention of segmentation faults, I will take a further look at that.
Thanks much.
Paul B. Henson wrote:
From: Quanah Gibson-Mount [mailto:quanah@zimbra.com]
I'd also note http://www.openldap.org/its/index.cgi/?findid=7710
I saw the contextCSN issue float by on the list, but I didn't run into that problem, or at least my monitoring system that verifies replication consistency never complained about it.
If you enable slapo-memberof on all your replicas you will see it.
Ciao, Michael.
On Sat, Oct 12, 2013 at 10:45:30AM +0200, Michael Ströder wrote:
If you enable slapo-memberof on all your replicas you will see it.
I did have it enabled on everything for about a day and a half without noticing it. But it looks like the fix for that inconsistency will hopefully come along with the fix for the crashes.
Thanks...
Could you please try to reproduce this with OpenLDAP from git repo?
It contains a fix for ITS#7710:
http://www.openldap.org/its/index.cgi?findid=7710
RE snapshot link in case you don't want to use command-line git:
http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=snapshot;h=refs/he...
Ciao, Michael.
Best regards,
Michael Ströder
-- Michael Ströder Klauprechtstr. 11 Dipl.-Inform. D-76137 Karlsruhe, Germany Tel.: +49 721 8304316 Mobil: +49 170 2391920 E-Mail: michael@stroeder.com http://www.stroeder.com
From: Michael Ströder [mailto:michael@stroeder.com] Sent: Friday, October 11, 2013 1:47 PM
Could you please try to reproduce this with OpenLDAP from git repo?
It contains a fix for ITS#7710:
Once I make sure I can reliably reproduce it in a dev environment, I'll give the latest checkout a try and see if it goes away. Thanks.
openldap-technical@openldap.org