openldap-bugs June 2017

openldap-bugs@openldap.org

22 participants
46 discussions

(ITS#8673) Need duration logging for sync ops
by quanah＠openldap.org 08 Jun '17

08 Jun '17

Full_Name: Quanah Gibson-Mount Version: 2.4.45 OS: Linux URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (47.208.148.239) We log the duration of an operation at stats level which allows being able to see how long a write OP took on a master. However we do not have any duration logging at the sync log level and therefore cannot determine how long a write op takes on a replica for the same modification. This is problematic when attempting to debug issues related to replication.

1 0

Re: (ITS#8444) Out-of-sync issue with memberOf overlay, Delta-syncrepl MMR and >2 nodes
by okuznik＠symas.com 08 Jun '17

08 Jun '17

A more self-contained log of the same issue, available at ftp://ftp.openldap.org/incoming/its8444.log (line numbers below are against current master, commit 91f4d3a6b75e73bf4ea498e83e2e4cb4e7a320e0) There are some things that occur in all the failures I have seen so far: - the server that received the operation (#1) sends the accesslog entry with no CSN in the cookie, then another provider (#2) picks up this message and relays it to its consumers, this one with a CSN in the cookie - a consumer picks up these two in short succession, in the log above, processing of the one from #2 is finished first (they are being processed concurrently) Usually, once one of them gets processed, the new CSN should be noted and the other threads should just skip it (syncrepl.c:943 and onwards). In this one, having no CSN in the cookie seems to allow both to process so far as to run syncrepl_message_to_op(), and one of them will then inevitably fail to add the entry. I don't understand yet why server #1 sends the operations without a CSN in the cookie and (especially if I reorder the overlays to set up memberof last), the race goes the other way around and the operation to fail is the one from server #2. My take on it was that in a delta-sync environment all entries would be passed with a new CSN and that should end up in the cookie, allowing syncrepl.c:986 to do its job. --=20 Ond=C5=99ej Kuzn=C3=ADk Senior Software Engineer Symas Corporation http://www.symas.com Packaged, certified, and supported LDAP solutions powered by OpenLDAP

1 0

Re: (ITS#8672) syncrepl with openldap 2.4.{40,42} and mdb backend
by quanah＠symas.com 08 Jun '17

08 Jun '17

--On Thursday, June 08, 2017 2:06 PM +0000 remy.dernat(a)umontpellier.fr wrote: > Best regards, > R?my Hi, Thanks for the report. Please confirm you can reproduce the issue with OpenLDAP 2.4.45. If you can, then please provide your /initial/ cn=config database prior to doing the modifications as well. Thanks! --Quanah -- Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: <http://www.symas.com>

1 0

(ITS#8672) syncrepl with openldap 2.4.{40,42} and mdb backend
by remy.dernat＠umontpellier.fr 08 Jun '17

08 Jun '17

Full_Name: Dernat R.my Version: 2.4.40+dfsg-1+deb8u3 and 2.4.42+dfsg-2ubuntu3.2 OS: Debian and Ubuntu URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (162.38.181.76) Hi, Since I moved my OpenLDAP to another server my replication between 2 ldap servers through syncrepl does not work anymore. I tested many many things. Finally, I decided to backup the database and restore it on another server (so, I have 3 ldap servers) and (...) it worked. After many other tests, I was able to determine the source of this issue. With a HDB backend on my provider my replication works, while it did not work with a MDB backend on the provider. I had this kind of logs on the provider (with MDB): ============================================================================= Jun 8 12:22:03 ldap2 slapd[15083]: send_search_entry: conn 20855 ber write failed. Jun 8 12:24:03 ldap2 slapd[15083]: send_search_entry: conn 20888 ber write failed. ... ============================================================================= While, on the slave, I get: ============================================================================= Jun 8 09:33:32 ldap3-bis slapd[88560]: do_syncrepl: rid=010 rc -1 retrying Jun 8 09:38:32 ldap3-bis slapd[88560]: do_syncrep2: rid=010 got search entry without Sync State control (dc=my,dc=domain,dc=com) Jun 8 09:38:32 ldap3-bis slapd[88560]: do_syncrepl: rid=010 rc -1 retrying Jun 8 09:43:32 ldap3-bis slapd[88560]: do_syncrep2: rid=010 got search entry without Sync State control (dc=my,dc=domain,dc=com) ... ============================================================================= I am able to reproduce the bug quite easily. I added only two schemas : autofs and quota. With ============================================================================= ldapadd -Q -Y EXTERNAL -H ldapi:/// -f autofs.ldif ldapadd -Q -Y EXTERNAL -H ldapi:/// -f quota.ldif ============================================================================= I also loaded accesslog module (I am creating a specific directory for accesslog(*)) on the provider and the syncprov module on both sides. ============================================================================= (*) mdkir /var/lib/ldap/accesslog chown openldap:openldap /var/lib/ldap/accesslog ============================================================================= Here is what I am doing to setup the syncrepl : ldapadd -Q -Y EXTERNAL -H ldapi:/// -f file.ldif With file.ldif, on the provider site (replication.ldif ; replacing HDB with MDB to test with a MDB backend): ============================================================================= #Load the syncprov and accesslog modules. dn: cn=module{0},cn=config changetype: modify add: olcModuleLoad olcModuleLoad: syncprov - add: olcModuleLoad olcModuleLoad: accesslog # Accesslog database definitions dn: olcDatabase={2}hdb,cn=config objectClass: olcDatabaseConfig objectClass: olcHdbConfig olcDatabase: {2}hdb olcDbDirectory: /var/lib/ldap/accesslog olcSuffix: cn=accesslog olcRootDN: cn=XXXX,dc=YYYY,dc=ZZ olcDbIndex: default eq olcDbIndex: entryCSN,objectClass,reqEnd,reqResult,reqStart # Accesslog db syncprov. dn: olcOverlay=syncprov,olcDatabase={2}hdb,cn=config changetype: add objectClass: olcOverlayConfig objectClass: olcSyncProvConfig olcOverlay: syncprov olcSpNoPresent: TRUE olcSpReloadHint: TRUE # syncrepl Provider for primary db dn: olcOverlay=syncprov,olcDatabase={1}hdb,cn=config changetype: add objectClass: olcOverlayConfig objectClass: olcSyncProvConfig olcOverlay: syncprov olcSpNoPresent: TRUE # accesslog overlay definitions for primary db dn: olcOverlay=accesslog,olcDatabase={1}hdb,cn=config objectClass: olcOverlayConfig objectClass: olcAccessLogConfig olcOverlay: accesslog olcAccessLogDB: cn=accesslog olcAccessLogOps: writes olcAccessLogSuccess: TRUE # scan the accesslog DB every day, and purge entries older than 7 days olcAccessLogPurge: 07+00:00 01+00:00 ============================================================================= On the consumer (with a unique rid, and by replacing HDB with MDB to test with a MDB backend), the file.ldif looks like: ============================================================================= #Load the syncprov module. dn: cn=module{0},cn=config changetype: modify add: olcModuleLoad olcModuleLoad: syncprov # syncrepl specific indices dn: olcDatabase={1}hdb,cn=config changetype: modify add: olcDbIndex olcDbIndex: entryUUID eq - add: olcSyncRepl olcSyncRepl: rid=1 provider=ldaps://consumer.mydomain.fr bindmethod=simple binddn="cn=XXXXX,dc=mydomain,dc=fr" credentials=XXXX searchbase="dc=mydomain,dc=fr" logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0)) schemachecking=off type=refreshAndPersist retry="60 +" syncdata=accesslog - add: olcUpdateRef olcUpdateRef: ldaps://consumer.mydomain.fr ============================================================================= On the provider I am using: ============================================================================= lsb_release -a No LSB modules are available. Distributor ID: Debian Description: Debian GNU/Linux 8.8 (jessie) Release: 8.8 Codename: jessie dpkg -l slapd ii slapd 2.4.40+dfsg-1+de amd64 OpenLDAP server (slapd) ============================================================================= Same configuration on one slave, and on the other slave, I am using: ============================================================================= Distributor ID: Ubuntu Description: Ubuntu 16.04.2 LTS Release: 16.04 Codename: xenial dpkg -l slapd ii slapd 2.4.42+dfsg-2ubu amd64 OpenLDAP server (slapd) ============================================================================= Best regards, Rémy

1 0

Re: ITS#8504 fix breaks Solaris builds
by quanah＠symas.com 07 Jun '17

07 Jun '17

--On Wednesday, June 07, 2017 4:11 PM +0000 quanah(a)symas.com wrote: > --On Wednesday, June 07, 2017 3:35 PM +0200 Hallvard Breien Furuseth > <h.b.furuseth(a)usit.uio.no> wrote: > >> On 06. juni 2017 21:59, quanah(a)openldap.org wrote: >>> /usr/include/signal.h:233:12: note: declared here >>> extern int sigwait(sigset_t *); >>> ^ >> >> POSIX says it has two arguments: >> http://pubs.opengroup.org/onlinepubs/9699919799/functions/sigwait.html >> Is there a two-argument version of sigwait in the Solaris headers >> somewhere, >> maybe enabled by #define POSIX_C_SOURCE 1 or something like that? > > It's insanely long.... This patch fixed it on my box: quanah@sol11-3:~/git/sold-2445/openldap$ git diff diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c index 8a62eff..1976340 100644 --- a/libraries/liblmdb/mdb.c +++ b/libraries/liblmdb/mdb.c @@ -113,6 +113,9 @@ typedef SSIZE_T ssize_t; /* Most platforms have posix_memalign, older may only have memalign */ #define HAVE_MEMALIGN 1 #include <malloc.h> +#if defined (__sun) +#define _POSIX_PTHREAD_SEMANTICS 1 +#endif #endif #if !(defined(BYTE_ORDER) || defined(__BYTE_ORDER)) --Quanah -- Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: <http://www.symas.com>

1 0

Re: (ITS#8669) Slapd service becomes unresponsive intermittently
by quanah＠symas.com 07 Jun '17

07 Jun '17

--On Wednesday, June 07, 2017 8:44 PM +0200 Michael Str=C3=B6der=20 <michael(a)stroeder.com> wrote: > quanah(a)symas.com wrote: >>> Is it possible we have the idletimeout set too high and it should be >>> lowered? I=3DC2=3DB9m wondering if there is some sweet-spot value for = this >>> particular setting. >> >> I generally leave it unset unless one is encountering an issue of = running >> out of connections. Generally, it would be fairly strange for >> idletimeout to affect things this way at all. > > I generally recommend to set idletimeout even somewhat tight in case you > don't have a strictly defined set of clients. Because a client > application which does not use its LDAP connection for ~5 min. is most > times simply not closing connections. And running out of file handles can > affect all file creation on your system (e.g. creating BDB's transaction > log files). Yep, there can be poorly written clients out there. I'd expect idletimeout = to be completely unrelated, given it's long standing existence and use. ;) --Quanah -- Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: <http://www.symas.com>

1 0

Re: (ITS#8669) Slapd service becomes unresponsive intermittently
by michael＠stroeder.com 07 Jun '17

07 Jun '17

quanah(a)symas.com wrote: >> Is it possible we have the idletimeout set too high and it should be >> lowered? I=C2=B9m wondering if there is some sweet-spot value for this >> particular setting. > > I generally leave it unset unless one is encountering an issue of running > out of connections. Generally, it would be fairly strange for idletimeout > to affect things this way at all. I generally recommend to set idletimeout even somewhat tight in case you don't have a strictly defined set of clients. Because a client application which does not use its LDAP connection for ~5 min. is most times simply not closing connections. And running out of file handles can affect all file creation on your system (e.g. creating BDB's transaction log files). Only the original poster can find out with monitoring. One can find out stale connections via back-monitor in sub-tree cn=Connections,cn=Monitor. IITC attribute 'monitorConnectionActivityTime' contains last client access time on this connection. (Ummh, I have to add this to my own monitoring script...) And of course normal system monitoring of file handles would be also helpful. Ciao, Michael. (Keep repeating this mantra: monitoring, monitoring, monitoring, monitoring…)

1 0

Re: (ITS#8669) Slapd service becomes unresponsive intermittently
by quanah＠symas.com 07 Jun '17

07 Jun '17

--On Wednesday, June 07, 2017 7:07 AM -0600 Joaquin Estrada=20 <jmestrada69(a)gmail.com> wrote: > Quanah, > > We are running the Berkley DB back-end, =C2=B3back-bdb=C2=B2 in the = slapd.conf > file. > > Our server vendor did the upgrade to version 2.4.39 last year in April. = In > asking them about upgrading to a newer version, as a potential fix, I was > told the last version in the RHEL repository that they can upgrade to is > 2.4.40. I=C2=B9m not certain that our vendor will support our choice to > upgrade to a newer version than what RHEL provides them in the > repository, but if it will fix our problem, I=C2=B9ll have to push the > envelope on that matter. > > Were there any known fragmentation issues with back-bdb in the 2.4.39 > version that could also be causing these pauses? No, back-bdb is not remotely the same as back-mdb. However, I've no idea=20 what options RedHat compiles their BDB library with and there were specific = options that had an effect on OpenLDAP. Generally, I would note that the=20 back-bdb backend and back-hdb backends are deprecated at this point. > Is it possible we have the idletimeout set too high and it should be > lowered? I=C2=B9m wondering if there is some sweet-spot value for this > particular setting. I generally leave it unset unless one is encountering an issue of running=20 out of connections. Generally, it would be fairly strange for idletimeout=20 to affect things this way at all. It simply drops idle connections based=20 off of the timer. Disabling rate throttling in rsyslogd is a good idea,=20 but may be unrelated as well. We've also seen cases with RHEL7 where Redhat = has set things up so that journald also gets all the syslog messages, which = causes severe performance degredation. You could spend some time seeing if you can isolate an exact cause. For=20 example, set loglevel to 0 and see if you still encounter the issue. If=20 you do, it is unrelated to syslog activity. Another test would be to set idletimeout to 0. If you still encounter the=20 issue, it is unrelated to idle connections being dropped. etc. As Michael noted, Redhat builds are somewhat questionable as they make=20 various changes to the code base that the OpenLDAP project have not been=20 reviewed. Your issues may or may not be related to such a change, it's=20 generally impossible to know. Hope that helps. --Quanah -- Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: <http://www.symas.com>

1 0

Re: ITS#8504 fix breaks Solaris builds
by quanah＠symas.com 07 Jun '17

07 Jun '17

--On Wednesday, June 07, 2017 3:35 PM +0200 Hallvard Breien Furuseth <h.b.furuseth(a)usit.uio.no> wrote: > On 06. juni 2017 21:59, quanah(a)openldap.org wrote: >> /usr/include/signal.h:233:12: note: declared here >> extern int sigwait(sigset_t *); >> ^ > > POSIX says it has two arguments: > http://pubs.opengroup.org/onlinepubs/9699919799/functions/sigwait.html > Is there a two-argument version of sigwait in the Solaris headers > somewhere, > maybe enabled by #define POSIX_C_SOURCE 1 or something like that? It's insanely long.... /* * sigwait() prototype is defined here. */ #if defined(__EXTENSIONS__) || (!defined(_STRICT_STDC) && \ !defined(__XOPEN_OR_POSIX)) || (_POSIX_C_SOURCE - 0 >= 199506L) || \ defined(_POSIX_PTHREAD_SEMANTICS) #if defined(__STDC__) #if (_POSIX_C_SOURCE - 0 >= 199506L) || defined(_POSIX_PTHREAD_SEMANTICS) #ifdef __PRAGMA_REDEFINE_EXTNAME #pragma redefine_extname sigwait __posix_sigwait extern int sigwait(const sigset_t *_RESTRICT_KYWD, int *_RESTRICT_KYWD); #else /* __PRAGMA_REDEFINE_EXTNAME */ extern int __posix_sigwait(const sigset_t *_RESTRICT_KYWD, int *_RESTRICT_KYWD); #ifdef __lint #define sigwait __posix_sigwait #else /* !__lint */ static int sigwait(const sigset_t *_RESTRICT_KYWD __setp, int *_RESTRICT_KYWD __signo) { return (__posix_sigwait(__setp, __signo)); } #endif /* !__lint */ #endif /* __PRAGMA_REDEFINE_EXTNAME */ #else /* (_POSIX_C_SOURCE - 0 >= 199506L) || ... */ extern int sigwait(sigset_t *); #endif /* (_POSIX_C_SOURCE - 0 >= 199506L) || ... */ #else /* __STDC__ */ #if (_POSIX_C_SOURCE - 0 >= 199506L) || defined(_POSIX_PTHREAD_SEMANTICS) #ifdef __PRAGMA_REDEFINE_EXTNAME #pragma redefine_extname sigwait __posix_sigwait extern int sigwait(); #else /* __PRAGMA_REDEFINE_EXTNAME */ extern int __posix_sigwait(); #ifdef __lint #define sigwait __posix_sigwait #else /* !__lint */ static int sigwait(__setp, __signo) sigset_t *__setp; int *__signo; { return (__posix_sigwait(__setp, __signo)); } #endif /* !__lint */ #endif /* __PRAGMA_REDEFINE_EXTNAME */ #else /* (_POSIX_C_SOURCE - 0 >= 199506L) || ... */ extern int sigwait(); #endif /* (_POSIX_C_SOURCE - 0 >= 199506L) || ... */ #endif /* __STDC__ */ #endif /* defined(__EXTENSIONS__) || (!defined(_STRICT_STDC) ... */ -- Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: <http://www.symas.com>

1 0

Re: (ITS#8669) Slapd service becomes unresponsive intermittently
by jmestrada69＠gmail.com 07 Jun '17

07 Jun '17

Yes, I've reached out to our vendor about this. I am hoping we can sidestep t= he RHEL releases. Thanks for the info on this.=20 Sent from my iPhone > On Jun 7, 2017, at 6:16 AM, Michael Str=C3=B6der <michael(a)stroeder.com> wr= ote: >=20 > jmestrada69(a)gmail.com wrote: >> Our server vendor did the upgrade to version 2.4.39 last year in April. I= n >> asking them about upgrading to a newer version, as a potential fix, I was= >> told the last version in the RHEL repository that they can upgrade to is >> 2.4.40. >=20 > They seem to just recommend what seems to be the easiest choice for them a= nd not what > would be the recommended choice for *you*. RHEL packages are heavily patch= ed by Red Hat > and generally not recommended. The upstream developers cannot oversee what= 's the current > patch state of RHEL packages. >=20 > =3D> You should kick out your server vendor from doing the OpenLDAP suppor= t. >=20 > Ciao, Michael.

1 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

openldap-bugs June 2017