Hello list,
I hope you can help me with my problem.
My setup:
All servers are OpenLDAP 2.4.42.
I have a master LDAP server, which replicates via standard syncrepl to a consumer LDAP server (the slave). On this consumer I have configured a standalone slapd proxy using slapd-ldap, which pushes changes to more than 6000 consumer LDAP servers.
There are several such LDAP proxies running, each with 500 consumers, to reduce startup time.
The master and the slave are connected via TCP; the LDAP proxies run on the slave and connect to it via socket.
Everything works fine and changes are replicated in real time to the consumers behind the proxy, but after some time (about 20 to 30 minutes) the slave LDAP server just hangs and stops responding. Shortly before it hangs completely, changes are pushed with a long delay. With debugging on (-d256) everything looks fine and no error is displayed, but it still hangs.
I have tested standard syncrepl and delta-syncrepl with the same result. When I strace the process, I see only many futex_wait() calls. While I am writing this mail the error is not occurring, so I cannot paste an strace.
So: does anyone have an idea about this problem, or a better solution for my setup? Any hints on how to debug this, or other tips and tricks, would be much appreciated.
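Because the hang is intermittent, one option is to keep a small watchdog ready that captures thread stacks the moment the server stops answering. Many threads in futex_wait() only says that they are blocked on a lock or condition variable (idle pool threads look the same), not which one. A minimal sketch, assuming gdb, ldapsearch and the coreutils timeout command are installed; the function names and the 10-second timeout are illustrative and not part of the original setup:

```shell
#!/bin/sh
# Hypothetical watchdog helpers; names and timeouts are illustrative.

# probe_slapd URI: succeed if a base search of the root DSE answers within 10 s.
probe_slapd() {
    timeout 10 ldapsearch -x -H "$1" -s base -b "" -LLL 1.1 >/dev/null 2>&1
}

# dump_threads PID OUTFILE: write a backtrace of every thread to OUTFILE.
dump_threads() {
    gdb -p "$1" -batch -ex "thread apply all bt" >"$2" 2>&1
}

# watch_once URI PID OUTFILE: probe once; dump stacks if the probe fails.
watch_once() {
    if probe_slapd "$1"; then
        echo "ok"
    else
        dump_threads "$2" "$3"
        echo "hung"
    fi
}
```

It could be run from cron or a loop, for example `watch_once ldapi:/// "$(pidof slapd)" /tmp/slapd-threads.txt`. The backtraces are more useful when slapd was built with debug symbols.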
Here are the relevant configuration settings of all servers:
## all ldap servers are started with extended limits in systemd
LimitCORE=0
LimitNPROC=5000000
LimitNOFILE=65535
LimitSTACK=81920
LimitDATA=infinity
LimitMEMLOCK=infinity
LimitRSS=infinity
LimitAS=infinity
and: echo 5000000 > /proc/sys/kernel/threads-max
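systemd Limit* settings only take effect if the unit file that carries them is the one actually used to start the process, so it can be worth reading the effective values back from the running slapd. A minimal sketch; soft_limit is a hypothetical helper that parses the table format of /proc/PID/limits from standard input:

```shell
#!/bin/sh
# soft_limit NAME: read a /proc/PID/limits table on stdin and print the
# soft value of the row whose label starts with NAME (e.g. "Max open files").
soft_limit() {
    awk -v name="$1" 'index($0, name) == 1 {
        # Everything after the label is: soft, hard, units.
        rest = substr($0, length(name) + 1)
        split(rest, f, " ")
        print f[1]
    }'
}
```

For a live check one might run `soft_limit "Max open files" < /proc/"$(pidof slapd)"/limits` and compare the result with the LimitNOFILE value above.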
Because of limits in OpenLDAP itself, I have patched it too:
diff -rNu openldap-2.4.42.orig/libraries/libldap_r/tpool.c openldap-2.4.42/libraries/libldap_r/tpool.c
--- openldap-2.4.42.orig/libraries/libldap_r/tpool.c 2015-08-31 08:26:55.000000000 +0200
+++ openldap-2.4.42/libraries/libldap_r/tpool.c 2015-08-31 07:39:25.000000000 +0200
@@ -42,10 +42,10 @@
 /* Max number of thread-specific keys we store per thread.
  * We don't expect to use many...
  */
-#define MAXKEYS 32
+#define MAXKEYS 65535
 
 /* Max number of threads */
-#define LDAP_MAXTHR 1024 /* must be a power of 2 */
+#define LDAP_MAXTHR 65535 /* must be a power of 2 */
 
 /* (Theoretical) max number of pending requests */
 #define MAX_PENDING (INT_MAX/2) /* INT_MAX - (room to avoid overflow) */
diff -rNu openldap-2.4.42.orig/servers/slapd/daemon.c openldap-2.4.42/servers/slapd/daemon.c
--- openldap-2.4.42.orig/servers/slapd/daemon.c 2015-08-31 08:25:42.000000000 +0200
+++ openldap-2.4.42/servers/slapd/daemon.c 2015-08-31 07:42:02.000000000 +0200
@@ -1635,6 +1635,7 @@
 #else /* ! HAVE_SYSCONF && ! HAVE_GETDTABLESIZE */
 	dtblsize = FD_SETSIZE;
 #endif /* ! HAVE_SYSCONF && ! HAVE_GETDTABLESIZE */
+	dtblsize=8192;
 
 /* open a pipe (or something equivalent connected to itself).
  * we write a byte on this fd whenever we catch a signal. The main
I also raised the maximum integer value allowed for syncrepl's rid= parameter.
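One detail in the tpool.c patch above: the retained comment says LDAP_MAXTHR must be a power of 2, but 65535 is 2^16 - 1, which is not one. Whether that matters depends on how tpool.c uses the constant, which this thread does not show, so treat it as something to check rather than a diagnosis. A minimal sketch of the check; is_power_of_two is a hypothetical helper:

```shell
#!/bin/sh
# is_power_of_two N: succeed if N is a positive power of two.
# A power of two has exactly one bit set, so N & (N - 1) is zero.
is_power_of_two() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}
```

With this helper, `is_power_of_two 65535` fails while `is_power_of_two 65536` succeeds, so 65536 would be the nearest value that satisfies the comment.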
### Master ###########
loglevel 0
sizelimit unlimited

database mdb
suffix "o=company, c=de"
rootdn "cn=Manager,o=company,c=de"
rootpw "xxxxxxxxxxxxxxxxxxxxxxxx"

overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 1000

index DFan,DFname,uid,uidNumber,gidNumber,DFCronjobID eq
index entryUUID,entryCSN eq
index objectClass eq
directory /var/lib/ldap/openldap-mdb
maxsize 8500000000
#### Slave ##################
loglevel 0
threads 2048

database mdb
suffix "o=company, c=de"
rootdn "cn=Manager,o=company,c=de"
rootpw "xxxxxxxxxxxxxxxxxxxx"

# here are all consumer ldap servers one by one
access to dn.subtree="sid=240,sec=webhosting,o=company,c=de"
  by dn.exact="cn=replicator,sid=240,sec=webhosting,o=company,c=de" write
  by * auth
access to dn.subtree="sid=241,sec=webhosting,o=company,c=de"
  by dn.exact="cn=replicator,sid=241,sec=webhosting,o=company,c=de" write
  by * auth
access to dn.subtree="sid=242,sec=webhosting,o=company,c=de"
  by dn.exact="cn=replicator,sid=242,sec=webhosting,o=company,c=de" write
  by * auth
...
...

index DFan,DFname,uid,uidNumber,gidNumber,DFCronjobID eq
index entryUUID,entryCSN eq
index objectClass eq
directory /var/lib/ldap/openldap-mdb

syncrepl rid=001
  provider=ldaps://ldapmaster:636/
  binddn="cn=Manager,o=company,c=de"
  bindmethod=simple
  credentials=xxxxxxxxxxxxxxxxxxxxxx
  searchbase="o=company,c=de"
  type=refreshAndPersist
  retry="5 5 300 5"

overlay syncprov
syncprov-checkpoint 1000 60

maxsize 8500000000
maxreaders 12000
##### SLAPD Proxy #####################
database ldap
hidden on
suffix "sid=240,sec=webhosting,o=company,c=de"
rootdn "cn=replicator,sid=240,sec=webhosting,o=company,c=de"
uri ldaps://sid240.int.webslave.company.de:636
lastmod on
restrict all

acl-bind bindmethod=simple
  binddn="cn=replicator,sid=240,sec=webhosting,o=company,c=de"
  credentials="xxxxxxxxxxxxxxxxxxxxx"

syncrepl rid=001
  provider=ldapi://
  binddn="cn=Manager,o=company,c=de"
  bindmethod=simple
  credentials=xxxxxxxxxxxxxxxxxxxxxxxxx
  searchbase="sid=240,sec=webhosting,o=company,c=de"
  type=refreshAndPersist
  retry="5 5 300 5"

overlay syncprov

# next one
database ldap
hidden on
suffix "sid=241,sec=webhosting,o=company,c=de"
rootdn "cn=replicator,sid=241,sec=webhosting,o=company,c=de"
uri ldaps://sid241.int.webslave.company.de:636
lastmod on
restrict all

acl-bind bindmethod=simple
  binddn="cn=replicator,sid=241,sec=webhosting,o=company,c=de"
  credentials="xxxxxxxxxxxxxxxxxxxxx"

syncrepl rid=001
  provider=ldapi://
  binddn="cn=Manager,o=company,c=de"
  bindmethod=simple
  credentials=xxxxxxxxxxxxxxxxxxxxxxxxx
  searchbase="sid=241,sec=webhosting,o=company,c=de"
  type=refreshAndPersist
  retry="5 5 300 5"

overlay syncprov
...
#### and the 6300 consumers on the end ###############
database mdb
suffix "sid=240,sec=webhosting,o=company,c=de"
rootdn "cn=replicator,sid=240,sec=webhosting,o=company,c=de"
rootpw {SSHA}xxxxxxxxxxxxxxxx
index DFan,DFname,DFdnumber,sid,uid,uidNumber,gidNumber,DFCronjobID eq
index objectClass eq
index entryUUID,entryCSN eq
directory /var/lib/ldap/openldap-mdb/sid240
updatedn "cn=replicator,sid=240,sec=webhosting,o=company,c=de"
maxsize 1073741824
subordinate

updateref ldaps://ldapmaster:636

database mdb
suffix "o=company,c=de"
rootdn "cn=Manager,o=company,c=de"
rootpw {SSHA}xxxxxxxxxxxxxxxx
index objectClass eq
directory /var/lib/ldap/openldap-mdb/rest
Regards,
Daniel Betz
System Design Engineer / Senior System Administration
___________________________________
domainfactory GmbH
Oskar-Messter-Str. 33
85737 Ismaning
Germany
Phone: +49 (0)89 / 55266-364
Fax: +49 (0)89 / 55266-222
E-Mail: dbetz@df.eu
Internet: www.df.eu
Commercial register: Amtsgericht München, HRB number 150294
Managing directors: Tobias Mohr, Stephan Wolfram
On Thu, Aug 25, 2016 at 12:23:53PM +0000, Daniel Betz wrote:
> Hello list,
> i hope you can help me with my problem.
> To my setup:
> All servers are OpenLDAP 2.4.42
> I have an master LDAP server, which replicates with standard syncrepl to an consumer ldap. On this consumer ldap server i have configured an standalone slapd proxy ldap with slapd-ldap which pushes changes to more than 6000 consumer ldaps.
> There are more ldap proxys running, with each 500 consumers to reduce startup time.
> The master and slave are connected via TCP, and the ldap proxys are on the slave via socket.
> Everything works fine and changes are replicated in realtime to the consumers behind the proxy, but after some time ( about 20 to 30 minutes ) the slave ldap just hangs and isnt responding anymore.
I'm not going to claim I had the same problem as you, but we had Java code that messed up a connection pool. From the view of our OpenLDAP server, via strace we saw the process spinning on a wait on a file handle, and that file handle turned out to be one controlled by that Java code.
Until we cleaned up the Java code, our workaround was to introduce settings like this in our slapd.conf file:
idletimeout 30
writetimeout 60
Hi all,
Damn, I found the problem: an earlier yum update had overwritten my self-compiled slapd :(
Now everything is working as expected.
Daniel
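A package update silently replacing a locally built binary can be caught by recording a checksum at install time and comparing it before each start. A minimal sketch, assuming sha256sum from coreutils; the helper names and any paths passed to them are hypothetical:

```shell
#!/bin/sh
# record_checksum BINARY STAMP: store the SHA-256 of BINARY in file STAMP.
record_checksum() {
    sha256sum "$1" | cut -d' ' -f1 >"$2"
}

# binary_unchanged BINARY STAMP: succeed if BINARY still matches STAMP.
binary_unchanged() {
    [ "$(sha256sum "$1" | cut -d' ' -f1)" = "$(cat "$2")" ]
}
```

After installing the patched build one would call record_checksum once, then run binary_unchanged from a start script or a periodic check. Listing the affected packages under `exclude=` in /etc/yum.conf is another way to keep yum from replacing them; the exact package names depend on the distribution.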
-----Original Message-----
From: Brian Reichert [mailto:reichert@numachi.com]
Sent: Friday, August 26, 2016 18:18
To: Daniel Betz dbetz@df.eu
Cc: openldap-technical@openldap.org
Subject: Re: openldap stops responding after some time
I'm not going to claim I had the same problem as you, but we had Java code that messed up a connection pool. From the view of our OpenLDAP server, via strace we saw the process spinning on a wait on a file handle, and that file handle turned out to be one controlled by that Java code.
Until we cleaned up the Java code, our workaround was to introduce settings like this in our slapd.conf file:
idletimeout 30
writetimeout 60
--
Brian Reichert reichert@numachi.com
BSD admin/developer at large