Hello list,
openldap-2.3.41 db-4.2.52.NC-PLUS_5_PATCHES SunOS ldapmaster01.unix 5.10 Generic_127128-11 i86pc i386 i86pc
We currently have 1 master, and about 25 clients hanging off it, using syncrepl. Today we restarted the master for the first time in quite some time. This was to add an index we had forgotten. It was only added to the master.
Initially, the master replies very fast to test-ldapsearch.
But it appears that all 25 clients connect within the first 30 seconds or so and start the syncing process. This appears to take about 30 minutes of communicating back and forth (as observed with snoop/tcpdump).
A simple command-line ldapsearch connects, but never gets a reply. I haven't even started the software that talks to ldapmaster, so it is essentially doing nothing. (It is just checking that everything is in sync; there should be no changes.)
This seems rather aggressive. I assume my syncrepl is set far too eagerly. Normally, syncrepl works beautifully, and updates are very fast across the board. But having the master unresponsive for an hour after a restart is undesirable.
Can someone suggest better values for our syncrepl?
Master has:
lastmod on
checkpoint 128 15
cachesize 10000
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
Slaves have: (RID is based on the IP's last octet + 256)
lastmod on
checkpoint 128 15
cachesize 10000
syncrepl rid=279
        provider=ldap://172.20.12.113
        type=refreshAndPersist
        interval=00:00:00:30
        searchbase="dc=company,dc=com"
        filter="(objectClass=*)"
        attrs="*"
        scope=sub
        schemachecking=off
        updatedn="cn=admin,dc=company,dc=com"
        bindmethod=simple
        binddn="cn=admin,dc=company,dc=com"
        credentials="OurSecret"
        retry="60 10 300 +"
# wait 60s then retry connect 10 times, then wait 300s forever
updateref ldap://172.20.12.113
25 consumers doing a full refresh probably ate up all threads available on the producer. You should either cascade your consumers (build a replication chain where a layer of consumers acts as producers for the remaining), or increase the number of threads on the producer.
p.
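For the second option above (more threads on the producer), a minimal sketch of the change in the master's slapd.conf. The `threads` directive is a global setting with a default of 16; the value 32 is purely illustrative and not a tested recommendation for this hardware.

```
# slapd.conf on the producer, global section (before any "database" line)
# Default is 16. 32 is an illustrative value only, not a tuned figure.
threads 32
```

slapd has to be restarted for this to take effect, and a larger pool does not help if the refresh is bound by disk I/O rather than by available threads.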
--On Tuesday, March 02, 2010 7:58 PM +0100 masarati@aero.polimi.it wrote:
25 consumers doing a full refresh probably ate up all threads available on the producer. You should either cascade your consumers (build a replication chain where a layer of consumers acts as producers for the remaining), or increase the number of threads on the producer.
Using delta-syncrepl can also help reduce such a load.
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
Quanah Gibson-Mount wrote:
Using delta-syncrepl can also help reduce such a load.
So if I understand it completely:
* I could increase Threads on ldapmaster, from the default (16?) to say 32.
It is a 4-core server, dedicated to slapd. Most documentation does seem to discourage this, though. However, something like rsync generally only yields 500 KB/s to 1 MB/s, most likely due to disk I/O. It appears that about 25 servers were all trying to do a complete sync/consistency check, so perhaps there just isn't more to get out of it while the setup is this way.
* Make master sync only to slave01 and slave02, and they in turn, sync to everyone else. Is this the recommended setup?
Change the other 23 servers to sync with slave01 and slave02 instead. This way the master only has a few servers to sync with, and each slave has "its share of the workload". But can I specify more than one "provider" in the syncrepl directive as a fail-over?
provider=ldap://ldapslave01,ldap://ldapslave02 ?
But I would guess you cannot. (It only has one master right now anyway; I am just curious.)
* Investigate delta-sync
Unknown to me, will need to research if our current ldap software version can support it. Perhaps try it on the test-servers.
* Logical tree split
I guess potentially I could run multiple masters, and have separate trees (one for mail, one for www, one for dns etc) but it would be nice not to have to do that.
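The cascaded layout from the second bullet might look roughly like the sketch below. It assumes that an intermediate consumer (ldapslave01 here) can carry the syncprov overlay on the same database it replicates into. The syncrepl parameters are copied from the consumer config posted earlier in the thread, and rid=257 is a hypothetical value. This is untested against 2.3.41, so try it on the test servers first.

```
# --- slapd.conf on ldapslave01 (intermediate tier), inside the existing
# --- database section for dc=company,dc=com

# Consumer side: unchanged, still pulls from the master.
# rid=257 is hypothetical; keep whatever rid this host already uses.
syncrepl rid=257
        provider=ldap://172.20.12.113
        type=refreshAndPersist
        interval=00:00:00:30
        searchbase="dc=company,dc=com"
        filter="(objectClass=*)"
        attrs="*"
        scope=sub
        schemachecking=off
        updatedn="cn=admin,dc=company,dc=com"
        bindmethod=simple
        binddn="cn=admin,dc=company,dc=com"
        credentials="OurSecret"
        retry="60 10 300 +"
updateref ldap://172.20.12.113

# Provider side: new, lets downstream consumers sync from this host.
# Values mirror the master's syncprov settings.
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100

# --- slapd.conf on a downstream consumer: only the provider changes.
# --- Writes are still referred to the master via updateref.
syncrepl rid=279
        provider=ldap://ldapslave01
        type=refreshAndPersist
        interval=00:00:00:30
        searchbase="dc=company,dc=com"
        filter="(objectClass=*)"
        attrs="*"
        scope=sub
        schemachecking=off
        updatedn="cn=admin,dc=company,dc=com"
        bindmethod=simple
        binddn="cn=admin,dc=company,dc=com"
        credentials="OurSecret"
        retry="60 10 300 +"
updateref ldap://172.20.12.113
```

With two intermediate hosts, the master only ever serves two refreshes after a restart, and the other 23 consumers are split between ldapslave01 and ldapslave02.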
For the time being I stopped syncrepl on all but 10 servers, so that we could have a read/write ldapmaster while it sorted itself out. It appears those 10 servers needed about 14 hours to sync. I have re-added the remaining servers to sync now and we should be back to stable in 14 hours or so.
--On Wednesday, March 03, 2010 9:05 AM +0900 Jorgen Lundman lundman@lundman.net wrote:
- Investigate delta-sync
Unknown to me, will need to research if our current ldap software version can support it. Perhaps try it on the test-servers.
Quite frankly, in the OpenLDAP 2.3 branch, this is the only viable replication option IMHO.
Syncrepl itself was reworked for 2.4 to be much more lightweight. But delta-sync will alleviate your 14+ hour issues on stopping/restarting the master.
--Quanah
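For reference, a delta-syncrepl setup along the lines suggested above might be sketched as follows. It is modelled on the general shape of the example in the OpenLDAP Admin Guide, not on anything posted in this thread. The database type, the directory path, and the logpurge values are assumptions, and the accesslog overlay has to be available in the installed build. Check every directive against slapd.conf(5) and slapo-accesslog(5) for 2.3.41 before using it.

```
# --- Producer (master) slapd.conf ---

# 1. Separate database that stores the change log.
#    "bdb" and the directory path are assumptions for illustration.
database bdb
suffix cn=accesslog
rootdn cn=accesslog
directory /var/openldap-accesslog
index default eq
index entryCSN,objectClass,reqEnd,reqResult,reqStart

overlay syncprov
syncprov-nopresent TRUE
syncprov-reloadhint TRUE

# 2. In the existing dc=company,dc=com database section, after the
#    existing syncprov lines: log successful write operations.
overlay accesslog
logdb cn=accesslog
logops writes
logsuccess TRUE
# Purge log entries older than 7 days, checking once a day
# (illustrative values).
logpurge 07+00:00 01+00:00

# --- Consumer slapd.conf ---
# Same syncrepl directive as before, plus the last three parameters.
# The bind DN must also be granted read access to cn=accesslog.
syncrepl rid=279
        provider=ldap://172.20.12.113
        type=refreshAndPersist
        searchbase="dc=company,dc=com"
        schemachecking=off
        bindmethod=simple
        binddn="cn=admin,dc=company,dc=com"
        credentials="OurSecret"
        retry="60 10 300 +"
        logbase="cn=accesslog"
        logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
        syncdata=accesslog
```

The idea is that a consumer reconnecting after a master restart replays only the logged changes it missed rather than comparing the whole tree.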
openldap-software@openldap.org