Full_Name: Quanah Gibson-Mount
Version: 2.4.44
OS: N/A
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (47.208.148.26)
Unfortunately, it is frequently the case that a consumer CSN ends up ahead of
the provider CSN. This can occur for a variety of reasons, but for databases
holding multiple CSNs from the same serverID (delta-syncrepl and subordinate
databases are two examples where this can happen), the diagnostic emitted is
somewhat lacking:

Oct 28 15:49:55 XXXXXXX slapd[6463]: conn=1819162 op=2 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!

It would be exceedingly useful to have better detail than this, so we can see
the consumer CSN and the provider CSN that are being compared.
Something like:
diff --git a/servers/slapd/overlays/syncprov.c b/servers/slapd/overlays/syncprov.c
index f7e422d..c6ef4df 100644
--- a/servers/slapd/overlays/syncprov.c
+++ b/servers/slapd/overlays/syncprov.c
@@ -2654,6 +2654,11 @@ syncprov_op_search( Operation *op, SlapReply *rs )
 			/* our state is older, complain to consumer */
 			rs->sr_err = LDAP_UNWILLING_TO_PERFORM;
 			rs->sr_text = "consumer state is newer than provider!";
+
+			Log4( LDAP_DEBUG_SYNC, LDAP_LEVEL_INFO,
+				"consumer %d state %s is newer than provider %d state %s\n",
+				sids[i], srs->sr_state.ctxcsn[i].bv_val, sids[j], /* == slap_serverID */
+				ctxcsn[j].bv_val);
 bailout:
 	if ( sop ) {
 		syncops **sp = &si->si_ops;
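
With a patch along these lines, the refusal would be accompanied by a log
entry showing both CSNs being compared, something like the following
(illustrative serverIDs and CSN values only, not taken from a real log; CSNs
have the form YYYYmmddHHMMSS.ffffffZ#counter#sid#mod):

Oct 28 15:49:55 XXXXXXX slapd[6463]: consumer 3 state 20151028154955.123456Z#000000#003#000000 is newer than provider 3 state 20151028154900.654321Z#000000#003#000000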
kwetlesen(a)mac.com wrote:
> Full_Name: Kai M Wetlesen
> Version: 2.4.40
> OS: RHEL 7.2 Maipo
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (198.4.83.52)
>
> For some reason slapd crashes with a segmentation fault when subjected to a
> TLS-heavy connection load. It looks like the segmentation fault originates in
> libnss3.so,
It seems you're using the binary RPMs provided by RHEL. Those are linked
against libnss and are furthermore heavily patched by Red Hat.
Assuming you have a RHEL support contract, you had better file a RHEL bug
report first.
Ciao, Michael.
--On Friday, May 15, 2015 7:53 PM +0000 quanah(a)zimbra.com wrote:
Capturing what was discussed about this issue in IRC at the time:
<hyc> too bad. yes, there is a case where a refresh can delete entries that
shouldn't have been deleted
<hyc> we discussed it here a month or so ago
<hyc> but if all servers were in persist phase with delta, that shouldn't
happen
<JoBbZ> hm.. should have been out of refresh
<JoBbZ> it did a refresh @ zimbra.log.30:May 1 14:46:02 zm-mmr01
slapd[43243]: do_syncrep2: rid=101 LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> but that was hours before
<hyc> and it was also behind by 1.5 hours
<hyc> ...
<JoBbZ> is there a message logged when refresh mode ends?
<hyc> that's the message.
<hyc> at least, that is the message from the provider that it's done
sending refresh messages
<hyc> then the consumer runs syncrepl_del_nonpresent based on the provider
messages
<JoBbZ> zimbra.log.30:May 1 18:37:33 zm-mmr01 slapd[43243]:
syncrepl_del_nonpresent: rid=102 be_delete
uid=skoelzer,ou=people,dc=unitedfiber,dc=email (0)
<JoBbZ> to
<JoBbZ> zimbra.log.30:May 1 18:37:35 zm-mmr01 slapd[43243]:
syncrepl_del_nonpresent: rid=101 be_delete
uid=fsodell2,ou=people,dc=ntin,dc=net (32)
<JoBbZ> why would refresh mode end @ 14:46, and then 4 hours later, it
does del_nonpresent?
<hyc> it might have had a lot of refresh messages to parse
<hyc> but 4 hours seems excessive, yeah
<JoBbZ> but it was replicating during that time too
<JoBbZ> May 1 15:00:00 zm-mmr01 slapd[43243]: syncrepl_message_to_op:
rid=102 be_modify uid=gibbonsj,ou=people,dc=bevcomm,dc=net (0)
<JoBbZ> for example
<JoBbZ> but thousands and thousands
<hyc> that's rid=102
<hyc> you were looking at rid=101
<JoBbZ> May 1 18:42:18 zm-mmr01 slapd[43243]: do_syncrep2: rid=102
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> May 1 18:42:20 zm-mmr01 slapd[43243]: do_syncrep2: rid=101
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> they both finished around the same time
<hyc> this is 4way so where's the 3rd rid
<JoBbZ> May 1 14:46:02 zm-mmr01 slapd[43243]: do_syncrep2: rid=100
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> hm
<JoBbZ> this is interesting..
<JoBbZ> May 1 14:18:45 zm-mmr01 slapd[36133]: do_syncrep2: rid=100
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> May 1 14:19:59 zm-mmr01 slapd[36133]: do_syncrep2: rid=101
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> May 1 14:19:59 zm-mmr01 slapd[36133]: do_syncrep2: rid=102
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> so it did one sync then
<JoBbZ> May 1 14:46:02 zm-mmr01 slapd[43243]: do_syncrep2: rid=100
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> May 1 14:46:02 zm-mmr01 slapd[43243]: do_syncrep2: rid=102
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> May 1 14:46:02 zm-mmr01 slapd[43243]: do_syncrep2: rid=101
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> another sync then
<JoBbZ> a sync here
<JoBbZ> May 1 18:37:34 zm-mmr01 slapd[43243]: do_syncrep2: rid=102
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> May 1 18:37:35 zm-mmr01 slapd[43243]: do_syncrep2: rid=101
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> and one at
<JoBbZ> May 1 18:42:18 zm-mmr01 slapd[43243]: do_syncrep2: rid=102
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> May 1 18:42:20 zm-mmr01 slapd[43243]: do_syncrep2: rid=101
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
<JoBbZ> ugh
<hyc> no rid=100 from the last 2?
<JoBbZ> nope
<JoBbZ> let me go look at the rid assignments rq
<hyc> that's suspicious
<JoBbZ> hm.. i bet it is losing connections
<JoBbZ> 101/102 are in nashville
<JoBbZ> 100 is in greenville, along with this server
<JoBbZ> yeah, full fallback triggered at 18:37:04
<JoBbZ> sigh
<JoBbZ> but this was a modrdn and an account add
<JoBbZ> no deletes
<hyc> was the Add the missing entry?
<JoBbZ> yeah
<hyc> yeah, that's the known issue then
<JoBbZ> ok, what about the modrdn?
<hyc> which isn't in an ITS so we should probably use the one you just filed
<hyc> haven't thought about the modrdn but it's probably the same thing.
<JoBbZ> ok
<hyc> in refresh without syncprov sessionlog, the provider has to send the
consumer a list of entryUUIDs of every entry that exists on the provider
<hyc> the consumer then searches every entry in the consumer, compares to
the list of entryUUIDs from the provider, and deletes any entry that's not
in the provider's list
<hyc> the problem is that an entry can be missing from this particular
provider's list because its update was coming from a different provider
<hyc> so it gets deleted even though it was properly received from the
other provider
<JoBbZ> ok, makes sense
<hyc> if you have a syncprov sessionlog that's large enough to cover the
span of disconnected time
<hyc> then the provider sends a list of deleted entryUUIDs instead of
present UUIDs
<hyc> and this bug can't happen then
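
To make the failure mode concrete, here is a toy sketch in C of the
present-phase cleanup hyc describes. All names here are illustrative; the
real code is syncrepl_del_nonpresent() in servers/slapd/syncrepl.c, which
operates on entryUUID sets and calls be_delete(), not a toy linked list:

/* Toy model of the refresh "present" phase described above.
 * Illustrative only; not the actual slapd implementation. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UUID_LEN 37	/* 36 chars + NUL */

struct entry {
	char uuid[UUID_LEN];
	struct entry *next;
};

/* true if uuid appears in the provider's "present" list */
static bool
present(const char *uuid, const char list[][UUID_LEN], size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(uuid, list[i]) == 0)
			return true;
	return false;
}

/* Delete every local entry the provider did not report as present.
 * The bug: an entry legitimately received from a *different* provider
 * may be absent from this provider's list, yet it is deleted anyway. */
static void
del_nonpresent(struct entry **head, const char list[][UUID_LEN], size_t n)
{
	struct entry **ep = head;
	while (*ep) {
		if (!present((*ep)->uuid, list, n)) {
			printf("be_delete %s\n", (*ep)->uuid);
			*ep = (*ep)->next;	/* unlink; real code frees it */
		} else {
			ep = &(*ep)->next;
		}
	}
}

int
main(void)
{
	/* consumer holds two entries; only one is in provider A's list */
	struct entry e2 = { "22222222-2222-2222-2222-222222222222", NULL };
	struct entry e1 = { "11111111-1111-1111-1111-111111111111", &e2 };
	struct entry *db = &e1;

	/* provider A's present list omits e2 (its update came from provider B) */
	const char list[][UUID_LEN] = {
		"11111111-1111-1111-1111-111111111111",
	};

	del_nonpresent(&db, list, 1);	/* wrongly deletes e2 */
	return 0;
}

With a sufficiently large sessionlog configured on the provider (the
syncprov-sessionlog directive of slapo-syncprov), the provider sends the list
of deleted entryUUIDs instead of present ones, so the consumer never has to
infer deletions from absence and this mis-delete cannot occur.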
--Quanah
--
Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>
kwetlesen(a)mac.com wrote:
> Full_Name: Kai M Wetlesen
> Version: 2.4.40
> OS: RHEL 7.2 Maipo
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (198.4.83.52)
>
>
> Hi All,
>
> For some reason slapd crashes with a segmentation fault when subjected to a
> TLS-heavy connection load. It looks like the segmentation fault originates in
> libnss3.so, but I cannot tell anything more than that.
Issues with OpenLDAP builds shipped in RHEL must be directed to Red Hat
support. The OpenLDAP Project does not endorse the use of libnss.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Full_Name: Kai M Wetlesen
Version: 2.4.40
OS: RHEL 7.2 Maipo
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (198.4.83.52)
Hi All,
For some reason slapd crashes with a segmentation fault when subjected to a
TLS-heavy connection load. It looks like the segmentation fault originates in
libnss3.so, but I cannot tell anything more than that.
Oct 28 11:48:57 ldap-primary.domain slapd[31842]: conn=2259 fd=26 closed (TLS negotiation failure)
Oct 28 11:48:57 ldap-primary.domain slapd[31842]: conn=2260 fd=26 ACCEPT from IP=XXX.XXX.205.98:49696 (IP=0.0.0.0:636)

(about 800 more lines like this)

Oct 28 11:53:51 ldap-primary.domain slapd[31842]: conn=2671 fd=33 ACCEPT from IP=XXX.XXX.205.98:60921 (IP=0.0.0.0:636)
Oct 28 11:53:51 ldap-primary.domain slapd[31842]: conn=2670 fd=34 closed (TLS negotiation failure)
Oct 28 11:53:51 ldap-primary.domain slapd[31842]: conn=2672 fd=34 ACCEPT from IP=XXX.XXX.205.98:60926 (IP=0.0.0.0:636)
Oct 28 11:53:51 ldap-primary.domain slapd[31842]: conn=2672 fd=34 closed (TLS negotiation failure)
Oct 28 11:53:51 ldap-primary.domain kernel: slapd[32180]: segfault at 10 ip 00007f83554fcc65 sp 00007f83367fc550 error 4 in libnss3.so[7f83554b6000+11e000]
Oct 28 11:53:51 ldap-primary.domain systemd[1]: slapd.service: main process exited, code=killed, status=11/SEGV
The traffic originates from a penetration-test machine running Nessus, which
is used as part of a security sweep at the site where this server resides.
Unfortunately I don't have visibility into exactly which tests the Nessus
server performs, but I do know that it probes the server a couple hundred
times to try to discover what service is running. slapd is running on a
lightly configured but dedicated VM, as this server was never expected to
serve more than 400 clients.
Is this expected behavior?
Steps to reproduce:
- Install OpenLDAP
- Configure any DIT
- Configure OpenLDAP only to service ldaps:// using TLS
- Start the server
- Configure the Nessus scanner
- Run a Nessus vulnerability scan against the server (a rough stand-in for this load is sketched below)
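
For anyone without a Nessus license, a rough stand-in for this connection
pattern is the small C program below. It simply opens many TCP connections to
the ldaps:// listener and abandons them mid-negotiation; the host, port, and
iteration count are placeholders, and this is not a reproduction of Nessus's
actual probes:

/* Rough stand-in for the scanner load described above: open many TCP
 * connections to the ldaps:// listener and abandon them mid-handshake.
 * Host, port, and count are placeholders; NOT what Nessus actually sends. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int
main(void)
{
	struct sockaddr_in sa;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(636);			/* ldaps:// */
	inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);	/* placeholder host */

	for (int i = 0; i < 800; i++) {		/* ~800 probes, as in the log */
		int fd = socket(AF_INET, SOCK_STREAM, 0);
		if (fd < 0) {
			perror("socket");
			return 1;
		}
		if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0)
			(void)write(fd, "\x16\x03\x01", 3);	/* truncated TLS record header */
		close(fd);				/* abort the negotiation */
	}
	return 0;
}

Each aborted connection should produce a "closed (TLS negotiation failure)"
entry similar to the log excerpt above.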
Thanks,
Kai Wetlesen