Full_Name: Mark Cairney Version: 2.4.46 OS: Centos 7 URL: Submission from: (NULL) (129.215.149.98)
In a MMR setup of >=3 servers (4 in this case) you can cause a segfault if you issue the same modification to 2 separate servers within your replication interval. It seems like it's the last server to receive the change falls over when it receives the change from one of the other servers during replication:
2018-08-10T16:01:39.461367+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1004 op=3 BIND authcid="mcairney@EASE.ED.AC.UK" authzid="mcairney@EASE.ED.AC.UK" 2018-08-10T16:01:39.461843+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1004 op=3 BIND dn="uid=mcairney,ou=people,ou=central,dc=authorise-dev,dc=ed,dc=ac,dc=uk" mech=GSSAPI sasl_ssf=256 ssf=256 2018-08-10T16:01:39.462205+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1004 op=3 RESULT tag=97 err=0 text= 2018-08-10T16:01:39.462858+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1004 op=4 MOD dn="cn=marksgroup,ou=ug,ou=iti,ou=is,dc=authorise-dev,dc=ed,dc=ac,dc=uk" 2018-08-10T16:01:39.463268+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1004 op=4 MOD attr=member 2018-08-10T16:01:39.463703+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: slap_queue_csn: queueing 0x46a1a00 20180810150139.461113Z#000000#00c#000000 2018-08-10T16:01:39.464026+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: slap_queue_csn: queueing 0x46a0540 20180810150139.461113Z#000000#00c#000000 2018-08-10T16:01:39.464554+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: slap_graduate_commit_csn: removing 0x46a0540 20180810150139.461113Z#000000#00c#000000 2018-08-10T16:01:39.464922+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1004 op=4 RESULT tag=103 err=0 text= 2018-08-10T16:01:39.465271+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: slap_graduate_commit_csn: removing 0x46a1a00 20180810150139.461113Z#000000#00c#000000 2018-08-10T16:01:39.465647+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1004 op=5 UNBIND 2018-08-10T16:01:39.465962+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1004 fd=12 closed 2018-08-10T16:01:59.042478+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1005 fd=12 ACCEPT from IP=129.215.17.49:59638 (IP=129.215.17.52:636) 2018-08-10T16:01:59.045948+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1005 fd=12 TLS established tls_ssf=256 ssf=256 2018-08-10T16:01:59.046828+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1005 op=0 BIND dn="cn=manager,dc=authorise-dev,dc=ed,dc=ac,dc=uk" method=128 2018-08-10T16:01:59.052303+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1005 op=0 BIND dn="cn=manager,dc=authorise-dev,dc=ed,dc=ac,dc=uk" mech=SIMPLE ssf=0 2018-08-10T16:01:59.052924+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1005 op=0 RESULT tag=97 err=0 text= 2018-08-10T16:01:59.053419+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1005 op=1 SRCH base="cn=accesslog" scope=2 deref=0 filter="(&(objectClass=auditWriteObject)(reqResult=0))" 2018-08-10T16:01:59.053912+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1005 op=1 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN 2018-08-10T16:01:59.054366+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: Entry reqStart=20180810144019.000000Z,cn=accesslog CSN 20180810144019.929847Z#000000#00c#000000 older or equal to ctx 20180810144019.929847Z#000000#00c#000000 2018-08-10T16:01:59.054902+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: Entry reqStart=20180810144922.000002Z,cn=accesslog CSN 20180810144822.605802Z#000000#009#000000 older or equal to ctx 20180810150125.942413Z#000000#009#000000 2018-08-10T16:01:59.055330+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: syncprov_search_response: cookie=rid=004,sid=00c,csn=20120217162731.749366Z#000000#000#000000;20180701051000.119854Z#000000#003#000000;20180711011500.114848Z#000000#004#000000;20180710075708.402194Z#000000#005#000000;20180703112408.905121Z#000000#006#000000;20180810144822.605802Z#000000#009#000000;20180711161057.835548Z#000000#00b#000000;20180810150139.461113Z#000000#00c#000000 2018-08-10T16:01:59.055753+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1005 op=1 SEARCH RESULT tag=101 err=0 nentries=1 text= 2018-08-10T16:01:59.056308+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1005 op=2 ABANDON msg=2 2018-08-10T16:01:59.056735+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1005 op=3 SRCH base="dc=authorise-dev,dc=ed,dc=ac,dc=uk" scope=2 deref=0 filter="(objectClass=*)" 2018-08-10T16:01:59.057148+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: conn=1005 op=3 SRCH attr=* + 2018-08-10T16:02:06.066075+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT 2018-08-10T16:02:06.075530+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: do_syncrep2: rid=001 cookie=rid=001,sid=009,csn=20180810150125.942413Z#000000#009#000000 2018-08-10T16:02:06.076055+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: slap_queue_csn: queueing 0x46a0380 20180810150125.942413Z#000000#009#000000 2018-08-10T16:02:06.076275+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]: syncrepl_message_to_op: rid=001 tid f8504700
The following gets logged in /var/log/messages:
2018-08-10T15:22:25.582679+01:00 spruce.authorise-dev.is.ed.ac.uk su: (to ldap) root on pts/0 2018-08-10T15:59:50.381730+01:00 spruce.authorise-dev.is.ed.ac.uk su: (to ldap) root on pts/0 2018-08-10T16:02:06.076563+01:00 spruce.authorise-dev.is.ed.ac.uk kernel: slapd[11289]: segfault at 1c ip 000000000051c6d4 sp 00007fcbf85022b0 error 4 in slapd[400000+21e000]
Even worse is that because the change is stuck in it's accesslog if you try and restart slapd it just falls over again:
2018-08-10T16:12:06.020868+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11451]: @(#) $OpenLDAP: slapd 2.4.46 (Mar 27 2018 09:23:46) $#012#011mockbuild@harmon.is.ed.ac.uk:/builddir/build/BUILD/openldap-2.4.46/servers/slapd 2018-08-10T16:12:06.052521+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11453]: slapd starting 2018-08-10T16:12:06.067909+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11453]: do_syncrep2: rid=002 cookie=rid=002,sid=00b,csn=20180810150125.942413Z#000000#009#000000 2018-08-10T16:12:06.068382+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11453]: slap_queue_csn: queueing 0x4252a40 20180810150125.942413Z#000000#009#000000 2018-08-10T16:12:06.068652+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11453]: syncrepl_message_to_op: rid=002 tid ce43f700 2018-08-10T16:12:06.068871+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11453]: do_syncrep2: rid=001 cookie=rid=001,sid=009,csn=20180810150125.942413Z#000000#009#000000
Finally you can bring the server back up if you clear out the accesslog and restart again.
The ldif to trigger it was simply: dn: cn=marksgroup,ou=ug,ou=iti,ou=is,dc=authorise-dev,dc=ed,dc=ac,dc=uk changetype: modify add: member member: uid=mcairney,ou=people,ou=central,dc=authorise-dev,dc=ed,dc=ac,dc=uk -
Applied using ldapmodify -H ldaps://<server>.authorise-dev.is.ed.ac.uk:636 -f file.ldif Deleting a member also triggers the same behaviour
Changing user attributes e.g. mail in the same manner also triggers the issue.
Adding/deleting an object doesn't cause a segfault but simply triggers a resync on each server.