Hi!
I have a 4-CPU machine with SLES 9 and OpenLDAP 2.3.35. This is the master to about 65 slaves via sync replication.
These slaves were set up one after another without any problem. Lately, every now and then the master cannot answer searches in time. Clients time out, e.g. Postfix doing address verification. This may have to do with writes to the master and the replication. The worst case was when I had to reinitialize about 15 to 20 slaves and these slaves checked the master to get in sync.* Other clients got no answer for a long, long time. :(
The machine is idle. Load average is 0.00. There were limiting factors such as the number of open files, which limited the number of processes, but this has been solved in the meantime.
How many connections to the master does a replication slave open?
How can I find out what is limiting slapd and keeping it from responding in time? The log files do not give me a hint, but I don't really know what to search for.
* I use a "top level" LDIF file to start the server up (ou=bla,o=foo, ou=log,ou=bla,o=foo and ou=humans,ou=bla,o=foo) and the latest hourly branch dump from the master to fill the server. These slaves replicate only one branch from the master server, ou=humans,ou=bla,o=foo. In this case the slave contains all the data, but checks ALL entries to see whether they are up to date. Is there a way to prevent this? The slave only has to check for changes that happened in the last hour (since the dump).
Hans
Hans Moser wrote:
How many connections to the master does a replication slave open?
There is no limit on the number of connections (except for OS limits on the number of file descriptors and so on); but 65 replicas re-syncing simultaneously, with operations that may take hours, will eat up all threads if slapd runs with the default thread count. If you need to have so many replicas, you might consider unloading the master from bulk search load, dedicating it to centralizing writes, and configuring it with lots of threads, so that it can simultaneously deal with syncs and writes (e.g. 8 threads plus the number of consumers, to be conservative).
p.
Ing. Pierangelo Masarati OpenLDAP Core Team
SysNet s.r.l. via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it --------------------------------------- Office: +39 02 23998309 Mobile: +39 333 4963172 Email: pierangelo.masarati@sys-net.it ---------------------------------------
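As a rough illustration of the sizing rule in the reply above (8 threads plus one per consumer), the master's slapd.conf could carry a global `threads` setting like the one below. This is only a sketch: the figure 73 simply assumes the 65 consumers mentioned in the thread, and the right value for a given machine has to be found by testing.

```
# slapd.conf on the master, global section (before any "database" line)
# 8 threads for ordinary operations + 65 syncrepl consumers = 73
# (illustrative value derived from the rule of thumb above)
threads 73
```

slapd has to be restarted for a changed `threads` value in slapd.conf to take effect.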
Hi,
Hans Moser hans.moser@ofd-sth.niedersachsen.de writes:
How many connections to the master does a replication slave open?
Connections or threads? It should be only one connection.
How can I find out what is limiting slapd and keeping it from responding in time? The log files do not give me a hint, but I don't really know what to search for.
strace? valgrind?
* I use a "top level" LDIF file to start the server up (ou=bla,o=foo, ou=log,ou=bla,o=foo and ou=humans,ou=bla,o=foo) and the latest hourly branch dump from the master to fill the server. These slaves replicate only one branch from the master server, ou=humans,ou=bla,o=foo. In this case the slave contains all the data, but checks ALL entries to see whether they are up to date. Is there a way to prevent this? The slave only has to check for changes that happened in the last hour (since the dump).
Did you set up delta replication?
-Dieter
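For readers who have not seen delta replication (delta-syncrepl) before, the following slapd.conf sketch is modeled on the example in the OpenLDAP 2.4 Administrator's Guide; details may differ on 2.3. The host name, directory path, bind DN and password are placeholders, and the search base is shown as the whole o=foo tree. Whether delta-syncrepl fits a setup that replicates only a single branch, as described in the original post, is not settled in this thread and should be checked before relying on it.

```
# --- Provider (master) ---

# Separate database that records successful write operations
database        hdb
suffix          cn=accesslog
directory       /var/lib/ldap/accesslog
rootdn          cn=accesslog
index           default eq
index           entryCSN,objectClass,reqEnd,reqResult,reqStart

overlay         syncprov
syncprov-nopresent      TRUE
syncprov-reloadhint     TRUE

# Main database; only the replication-related lines are shown
database        hdb
suffix          "o=foo"

overlay         syncprov

overlay         accesslog
logdb           cn=accesslog
logops          writes
logsuccess      TRUE
# purge log entries older than 32 days, checking once a day
logpurge        32+00:00 01+00:00

# --- Consumer (slave), inside its database section ---

syncrepl rid=001
        provider=ldap://master.example.com
        type=refreshAndPersist
        searchbase="o=foo"
        bindmethod=simple
        binddn="cn=replicator,o=foo"
        credentials=secret
        logbase="cn=accesslog"
        logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
        syncdata=accesslog
        retry="60 +"
```

With this kind of setup the consumer replays individual changes from the log database instead of re-fetching whole entries; it falls back to a normal syncrepl refresh if it gets too far out of sync.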
Hans Moser wrote:
I have a 4 cpu machine with SLES 9 and OpenLDAP 2.3.35.
If you can, you want to upgrade your OpenLDAP version to at least 2.3.39, otherwise if an object class is mis-spelled your LDAP server can be crashed by a double-free bug. This means anyone anywhere in the world who can get a query executed on your LDAP server can cause it to crash. You're wide open.
You also don't tell us what hardware configuration you're running, how much RAM and what type of CPUs you have (is this a quad-core single CPU machine, or a machine with four real single-core processors, etc...), how your OpenLDAP program is compiled and how many threads you're trying to run, and you don't tell us anything about any performance tuning you may have already done.
Did you follow the instructions in section 19.4 at http://www.openldap.org/doc/admin24/tuning.html#Caching?
I'm going through a somewhat similar process as I try to ramp up here on our OpenLDAP systems, and these are all the sorts of things that I've been asked for when I was asking for information on how to debug the problems we've seen. Our issues have been primarily one of stability first, and performance second. But we do care about performance.
--On Thursday, February 21, 2008 12:27 PM -0600 Brad Knowles b.knowles@its.utexas.edu wrote:
If you can, you want to upgrade your OpenLDAP version to at least 2.3.39, otherwise if an object class is mis-spelled your LDAP server can be crashed by a double-free bug. This means anyone anywhere in the world who can get a query executed on your LDAP server can cause it to crash. You're wide open.
You may as well go to 2.3.41 if you are concerned about security issues, since anyone issuing a modrdn can crash your 2.3 server if it is less than 2.3.41. ;)
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
--On Thursday, February 21, 2008 3:19 PM +0100 Hans Moser hans.moser@ofd-sth.niedersachsen.de wrote:
Hi!
I have a 4 cpu machine with SLES 9 and OpenLDAP 2.3.35. This is master to about 65 slaves via syncreplication.
Why do you have 65 slaves? I've yet to really see a need for more than 3-4 slaves unless one has world-wide distributed offices or the like. I would note the following:
(a) there have been a number of significant fixes to sync replication since 2.3.35
(b) syncrepl is highly intensive. delta-syncrepl is not so intensive for ongoing writes
(c) Using a full refresh instead of doing a slapcat of the master/slapadd of the slave (or slapcat of an existing slave to add to a slave) is highly intensive
(d) You make no mention of any tuning you may (or may not) have done, via the DB_CONFIG file for BDB, or the cachesize, idlcachesize, threads, or indexing for slapd, all of which could directly impact the performance of the master.
(e) I generally advise a setup where the master is isolated to only taking writes(*), while letting replicas handle the reads.
(*) - Things that need a guaranteed response should use the master.
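To make point (d) concrete, the tuning knobs named there live in two places: the DB_CONFIG file in the BDB database directory, and the database section of slapd.conf. The fragments below are a sketch only. Every number is a placeholder that has to be sized against the actual entry count and available RAM, and the indexed attributes are just the ones commonly recommended for syncrepl.

```
# DB_CONFIG (in the database directory)
# BDB cache: 0 GB + 256 MB, in a single segment
set_cachesize 0 268435456 1
```

```
# slapd.conf, inside the bdb/hdb database section
# number of entries kept in the in-memory entry cache
cachesize       100000
# number of slots in the index (IDL) cache
idlcachesize    300000
# equality indexes on attributes that clients and syncrepl search on
index           objectClass eq
index           entryCSN,entryUUID eq
```

The `threads` setting mentioned in the same point is a global directive and belongs in the global section of slapd.conf rather than in the database section.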
--Quanah
openldap-software@openldap.org