openldap-bugs September 2016

openldap-bugs@openldap.org

17 participants
49 discussions

(ITS#8493) Under heavy modrdn load, masters desync
by quanah＠openldap.org 02 Sep '16

02 Sep '16

Full_Name: Quanah Gibson-Mount Version: 2.4.44+ITS8432 OS: Linux 2.6 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (75.111.52.177) Trying to reproduce another ITS, I discovered a new bug. When doing MODRDN ops on one master, the other master keeps going out of sync. Specifically: Sep 3 01:12:17 zre-ldap002 slapd[29206]: syncrepl_message_to_op: rid=100 be_modrdn uid=user.924,ou=people,dc=zre-ldap002,dc=eng,dc=zimbra,dc=com (32) Sep 3 01:12:17 zre-ldap002 slapd[29206]: do_syncrep2: rid=100 delta-sync lost sync on (reqStart=20160903051215.747829Z,cn=accesslog), switching to REFRESH

1 0

RE: (ITS#8492) signal 11 in ldap_new_connection
by quanah＠zimbra.com 02 Sep '16

02 Sep '16

--On Friday, September 02, 2016 7:35 AM +0000 htjain(a)sandvine.com wrote: > After some more debugging looks like ld->ldc->ldc_options.ldo_defludp > varia= ble is causing this issue, its null when its passed to > ldap_new_connection. How does it populated? Please confirm your encounter the issue with a current release. 2.4.40 and 2.4.41 both had significant issues. --Quanah -- Quanah Gibson-Mount

1 0

RE: (ITS#8492) signal 11 in ldap_new_connection
by htjain＠sandvine.com 01 Sep '16

01 Sep '16

After some more debugging looks like ld->ldc->ldc_options.ldo_defludp varia= ble is causing this issue, its null when its passed to ldap_new_connection. How does it populated? (gdb) p ld->ldc->ldc_options $13 =3D {ldo_valid =3D 255, ldo_debug =3D 0, ldo_version =3D 3, ldo_deref = =3D 0, ldo_timelimit =3D 0, ldo_sizelimit =3D 0, ldo_tm_api =3D {tv_sec =3D= -1, tv_usec =3D 0}, ldo_tm_net =3D {tv_sec =3D 5, tv_usec =3D 0}, ldo_defl= udp =3D 0x0, ldo_defport =3D 389, ldo_defbase =3D 0x0, ldo_defbinddn =3D 0x0, ldo_keepalive_idle =3D 0, ldo= _keepalive_probes =3D 0, ldo_keepalive_interval =3D 0, ldo_refhoplimit =3D = 5, ldo_sctrls =3D 0x0, ldo_cctrls =3D 0x0, ldo_rebind_proc =3D 0x0, ldo_reb= ind_params =3D 0x0, ldo_nextref_proc =3D 0x0, ldo_nextref_params =3D 0x0, ldo_urllist_proc = =3D 0x0, ldo_urllist_params =3D 0x0, ldo_conn_cbs =3D 0x0, ldo_booleans =3D= 34, ldo_peer =3D 0x0, ldo_cldapdn =3D 0x0, ldo_is_udp =3D 0, ldo_tls_ctx = =3D 0x0, ldo_tls_connect_cb =3D 0x0, ldo_tls_connect_arg =3D 0x0, ldo_tls_info =3D= {lt_certfile =3D 0x0, lt_keyfile =3D 0x0, lt_dhfile =3D 0x0, lt_cacertfile= =3D 0x0, lt_cacertdir =3D 0x0, lt_ciphersuite =3D 0x0, lt_crlfile =3D 0x0,= lt_randfile =3D 0x0, lt_protocol_min =3D 0}, ldo_tls_mode =3D 0, ldo_tls_require_cert =3D 2,= ldo_tls_impl =3D 0, ldo_tls_crlcheck =3D 0, ldo_def_sasl_mech =3D 0x0, ldo= _def_sasl_realm =3D 0x0, ldo_def_sasl_authcid =3D 0x0, ldo_def_sasl_authzid= =3D 0x0, ldo_sasl_secprops =3D {min_ssf =3D 0, max_ssf =3D 2147483647, maxbufsize = =3D 16777215, security_flags =3D 17, property_names =3D 0x0, property_value= s =3D 0x0}} (gdb) p ld->ldc->ldc_options.ldo_defludp $14 =3D (LDAPURLDesc *) 0x0 -----Original Message----- From: openldap-its(a)OpenLDAP.org [mailto:openldap-its@OpenLDAP.org]=20 Sent: Friday, September 2, 2016 11:50 AM To: Hemant Jain Subject: Re: (ITS#8492) signal 11 in ldap_new_connection *** THIS IS AN AUTOMATICALLY GENERATED REPLY *** Thanks for your report to the OpenLDAP Issue Tracking System. Your report = has been assigned the tracking number ITS#8492. One of our support engineers will look at your report in due course. Note that this may take some time because our support engineers are volunte= ers. They only work on OpenLDAP when they have spare time. If you need to provide additional information in regards to your issue repo= rt, you may do so by replying to this message. Note that any mail sent to = openldap-its(a)openldap.org with (ITS#8492) in the subject will automatically= be attached to the issue report. mailto:openldap-its@openldap.org?subject=3D(ITS#8492) You may follow the progress of this report by loading the following URL in = a web browser: http://www.OpenLDAP.org/its/index.cgi?findid=3D8492 Please remember to retain your issue tracking number (ITS#8492) on any furt= her messages you send to us regarding this report. If you don't then you'l= l just waste our time and yours because we won't be able to properly track = the report. Please note that the Issue Tracking System is not intended to be used to se= ek help in the proper use of OpenLDAP Software. Such requests will be closed. OpenLDAP Software is user supported. http://www.OpenLDAP.org/support/ -------------- Copyright 1998-2007 The OpenLDAP Foundation, All Rights Reserved.

1 0

(ITS#8492) signal 11 in ldap_new_connection
by htjain＠sandvine.com 01 Sep '16

01 Sep '16

Full_Name: hemant jain Version: openldap-2.4.40-8 OS: RHEL 7.2 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (123.63.202.166) Hi, We are facing a strange core dump in ldap_new_connection function while trying to bind a stopped Openldap server. following is the stack trace: #0 0x00007f9336898f53 in ldap_new_connection (ld=ld@entry=0x7f931f43cd60, srvlist=srvlist@entry=0x7f931f48b298, use_ldsb=use_ldsb@entry=1, connect=connect@entry=1, bind=bind@entry=0x0, m_req=m_req@entry=0, m_res=m_res@entry=0) at request.c:481 #1 0x00007f933688530f in ldap_open_defconn (ld=ld@entry=0x7f931f43cd60) at open.c:41 #2 0x00007f933689a248 in ldap_send_initial_request (ld=ld@entry=0x7f931f43cd60, msgtype=msgtype@entry=96, dn=dn@entry=0x7f931f827718 "cn=Manager,dc=test,dc=com", ber=ber@entry=0x7f931f457150, msgid=msgid@entry=1) at request.c:130 #3 0x00007f933688f178 in ldap_sasl_bind (ld=0x7f931f43cd60, dn=0x7f931f827718 "cn=Manager,dc=test,dc=com", mechanism=<optimized out>, cred=0x7f931f4296b0, sctrls=0x0, cctrls=<optimized out>, msgidp=0x7f92243ed8dc) at sasl.c:148 #4 0x00007f9340f571da in ldap::Connection::Bind (this=this@entry=0x7f931c043478, msg_id=@0x7f92243ed98c: -1) at support/ldapconnection.cpp I have installed corresponding debug rpm for Openldap library. Following the source info. core dump is coming at line 481 of request.c file.A%A for ( srvp = srvlist; *srvp != NULL; srvp = &(*srvp)->lud_next ) // Here srvp is NULL so *srvp dumping core. if ( connect ) { LDAPURLDesc **srvp, *srv = NULL; async = LDAP_BOOL_GET( &ld->ld_options, LDAP_BOOL_CONNECT_ASYNC ); for ( srvp = srvlist; *srvp != NULL; srvp = &(*srvp)->lud_next ) { int rc; rc = ldap_int_open_connection( ld, lc, *srvp, async ); if ( rc != -1 ) { srv = *srvp; if ( ld->ld_urllist_proc && ( !async || rc != -2 ) ) { ld->ld_urllist_proc( ld, srvlist, srvp, ld->ld_urllist_params ); } break; } } Any idea how to fix it ?

1 0

(ITS#8491) slapdelete test needed
by quanah＠openldap.org 01 Sep '16

01 Sep '16

Full_Name: Quanah Gibson-Mount Version: RE25 OS: Linux 2.6 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (75.111.52.177) We need to add a test script for slapdelete for 2.5 alpha

1 0

Re: (ITS#8490) changes not written to accesslog, causing replicas to loop syncing
by quanah＠zimbra.com 01 Sep '16

01 Sep '16

--On Thursday, September 01, 2016 8:58 AM +0000 quanah(a)zimbra.com wrote: > In looking at the LDAP accesslog, what I see is that what should have > been a modRDN op was stored in the accesslog as a MOD op (the one I > noted before). This seems particularly bizarre, because ldap01 should > have rejected this change as well. It appears we may have a problem > where the accesslog DB is updated, but then the change got rejected by > the unique overlay. Ok, actually the server (ldap01) accepted the change (renamed the entry). Looking at the database, it was correctly done as a MODRDN to ldap01, but /written/ as a MOD to accesslog?! --Quanah -- Quanah Gibson-Mount

1 0

Re: (ITS#8490) changes not written to accesslog, causing replicas to loop syncing
by quanah＠zimbra.com 01 Sep '16

01 Sep '16

--On Thursday, September 01, 2016 8:05 AM +0000 quanah(a)zimbra.com wrote: > --On Thursday, September 01, 2016 7:52 AM +0000 quanah(a)openldap.org wrote: > >> Full_Name: Quanah Gibson-Mount >> Version: OpenLDAP 2.4.44 >> OS: Linux 2.6 >> URL: ftp://ftp.openldap.org/incoming/ >> Submission from: (NULL) (75.111.52.177) >> >> >> In a 2-node MMR setup. Node 1 is getting a lot of write traffic. Both >> node 1 and node 2 have 3 replicas each. At some point, a change is >> received by node 1, which writes the change to its accesslog DB and its >> primary DB. It's 3 replicas are all correctly updated. MMR node 2 >> receives the change, updates its primary DB, but *fails* to write the >> change to the accesslog DB. However, it *does* write the CSN update to >> the accesslog DB successfully. This causes all of its replicas to also >> update their CSN. Then a change comes in triggering a constraint >> violation on the replicas, but fully accepted by their master. > > So the above summary is incorrect. While 3 replicas did go out of > sync... 2 belonged to the primary master (node1), and 1 belonged to the > secondary master (node 2). So really, 4 systems didn't log the change > (MMR node 2, ldap05, ldap07, ldap09). Ok, so that's not correct either. I now have the correct topography: ldap01 has the following replicas: ldap02, ldap05, ldap07, ldap09 ldap02 has the following replicas: ldap01, ldap06, ldap08, ldap10 So the replicas of ldap01 received the change and rejected it. ldap02 just skipped writing the entry to the accesslog, and as a result, none of its replicas ever got the change, and thus they never hit the failure issue of err 19, but they all are now lacking this modification entirely. I would note that every server was loaded today from the same ldap backup, so they were all perfectly in sync. In looking at the LDAP accesslog, what I see is that what should have been a modRDN op was stored in the accesslog as a MOD op (the one I noted before). This seems particularly bizarre, because ldap01 should have rejected this change as well. It appears we may have a problem where the accesslog DB is updated, but then the change got rejected by the unique overlay. --Quanah -- Quanah Gibson-Mount

1 0

Re: (ITS#8490) changes not written to accesslog, causing replicas to loop syncing
by quanah＠zimbra.com 01 Sep '16

01 Sep '16

--On Thursday, September 01, 2016 7:52 AM +0000 quanah(a)openldap.org wrote: > Full_Name: Quanah Gibson-Mount > Version: OpenLDAP 2.4.44 > OS: Linux 2.6 > URL: ftp://ftp.openldap.org/incoming/ > Submission from: (NULL) (75.111.52.177) > > > In a 2-node MMR setup. Node 1 is getting a lot of write traffic. Both > node 1 and node 2 have 3 replicas each. At some point, a change is > received by node 1, which writes the change to its accesslog DB and its > primary DB. It's 3 replicas are all correctly updated. MMR node 2 > receives the change, updates its primary DB, but *fails* to write the > change to the accesslog DB. However, it *does* write the CSN update to > the accesslog DB successfully. This causes all of its replicas to also > update their CSN. Then a change comes in triggering a constraint > violation on the replicas, but fully accepted by their master. So the above summary is incorrect. While 3 replicas did go out of sync... 2 belonged to the primary master (node1), and 1 belonged to the secondary master (node 2). So really, 4 systems didn't log the change (MMR node 2, ldap05, ldap07, ldap09). So I have no explanation at the moment why ldap06, ldap08, and ldap10 were not affected. --Quanah -- Quanah Gibson-Mount

1 0

(ITS#8490) changes not written to accesslog, causing replicas to loop syncing
by quanah＠openldap.org 31 Aug '16

31 Aug '16

Full_Name: Quanah Gibson-Mount Version: OpenLDAP 2.4.44 OS: Linux 2.6 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (75.111.52.177) In a 2-node MMR setup. Node 1 is getting a lot of write traffic. Both node 1 and node 2 have 3 replicas each. At some point, a change is received by node 1, which writes the change to its accesslog DB and its primary DB. It's 3 replicas are all correctly updated. MMR node 2 receives the change, updates its primary DB, but *fails* to write the change to the accesslog DB. However, it *does* write the CSN update to the accesslog DB successfully. This causes all of its replicas to also update their CSN. Then a change comes in triggering a constraint violation on the replicas, but fully accepted by their master. Node 1 records the following change in its accesslog DB: dn: reqStart=20160901025119.904382Z,cn=accesslog objectClass: auditModify structuralObctctClass: auditModify reqStart: 20160901025119.904382Z reqEnd: 20160901025119.906625Z reqType: modify reqSession: 7780 reqAuthzID: uid=zimbra,cn=admins,cn=zimbra reqDN: uid=renamed-trackp,ou=people,dc=xxxxxxxx,dc=co,dc=za reqResult: 0 reqM%3: zimbraMailWhitelistMaxNumEntries:= 0 reqMod: zimbraPrefMailLocalDeliveryDisabled:= FALSE reqMod: zimbraHideInGal:= TRUE reqMod: userPassword:= {SSHA} reqMod: mail:= renamed-trackp(a)xxxxxxx.co.za reqMod: displayName:= trackp(a)xxxxxxxx.co.za reodod: objectClass:= inetOrgPerson reqMod: objectClass:= zimbraAccount reqMod: objectClass:= amavisAccount reqMod: zimbraMailHost:= store328-dc01.cm.xxxxxxxxxxx reqMod: zimbraPasswordMustChange:= FALSE reqMod: cn:= trackp(a)xxxxxxxxxxxxx.co.za reqMod: zimbraPasswordModifiedTime:= 20160121142933Z reqMod: zimbraMailStatus:= enabled reqMod: zimbraIsDelegatedAdminAccount:= FALSE reqMod: zimbraPrefOutOfOfficeReplyEnabled:= FALSE reqMod: zimbraMailBlacklistMaxNumEntries:= 0 reqMod: zimbraAccountStatus:= locked reqMod: zimbraPrefTimeZoneId:= Africa/Harare reqMod: zimbraId:= 403c7b67-eea0-4315-9b10-a2853c6cf70b reqMod: zimbraMailTransport:= lmtp:store328-dc01.cm.xxxxxxxxxx:7025 reqMod: sn:= trackp(a)xxxxxxxxxxx.co.za reqMod: zimbraCOSId:= 1daeb979-365f-46b7-bb3f-2ad46f4576c1 reqMod: zimbraCreateTimestamp:= 20160121142933Z reqMod: zimbraMailDeliveryAddress:= renamed-trackp(a)xxxxxxx.co.za reqMod: entryCSN:= 20160901025119.904589Z#000000#002#000000 reqMod: modifiersName:= uid=zimbra,cn=admins,cn=zimbra reqMod: modifyTimestamp:= 20160901025119Z reqEntryUUID: 2293af38-5497-1035-8bd6-1d34a55b9433 entryUUID: b46d8728-043a-1036-8ab7-9bf2e127fe3b creatorsName: cn=config createTimestamp: 20160901025119Z entryCSN: 20160901025119.904589Z#000000#002#000000 modifiersName: cn=config modifyTimestamp: 20160901025119Z On the other master, we find in the accesslog two sequential CSN updates: dn: reqStart=20160901025119.904218Z,cn=accesslog objectClass: auditModify structuralObjectClass: auditModify reqStart: 20160901025119.904218Z reqEnd: 20160901025119.905035Z reqType: modify reqSession: 201 reqAuthzID: cn=config reqDN: reqResult: 0 reqMod: contextCSN:= 20160901025119.902062Z#000000#002#000000 reqMod: contextCSN:= 20160901015920.689225Z#000000#003#000000 reqEntryUUID: 78143fa2-7af5-48cc-8d5e-6e15f68dd3b2 entryUUID: b46d48a8-043a-1036-81a5-b3dead5e3175 creatorsName: cn=config createTimestamp: 20160901025119Z entryCSN: 20160901025119.902062Z#000000#002#000000 modifiersName: cn=config modifyTimestamp: 20160901025119Z dn: reqStart=20160901025119.908257Z,cn=accesslog objectClass: auditModify structuralObjectClass: auditModify reqStart: 20160901025119.908257Z reqEnd: 20160901025119.908439Z reqType: modify reqSession: 201 reqAuthzID: cn=config reqDN: reqResult: 0 reqMod: contextCSN:= 20160901025119.904589Z#000000#002#000000 reqMod: contextCSN:= 20160901015920.689225Z#000000#003#000000 reqEntryUUID: 78143fa2-7af5-48cc-8d5e-6e15f68dd3b2 entryUUID: b46dcdaa-043a-1036-81a6-b3dead5e3175 creatorsName: cn=config createTimestamp: 20160901025119Z entryCSN: 20160901025119.904589Z#000000#002#000000 modifiersName: cn=config modifyTimestamp: 20160901025119Z Note the CSN update in the second case is for 20160901025119.904589Z. Note that this was the CSN of the change that in node 1 was a real update. The atribute with a constraint in this case is ZimbraID, which is colliding with the zimbraID of the original entry it was renamed from.

1 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

openldap-bugs September 2016