We are currently using the following:
OpenLDAP 2.3.43 with the standard syncprov overlay configuration (below): 1 master and 3 replicas, each on 15k RPM RAID 10.
The problem is that batch modifications (Perl ldapmod) touching a large number of entries (> 50k) cause major replication delays after a few hours. The individual updates are small, e.g. adding a new non-indexed attribute, and we throttle the job down to only 3 entries per second.
What is the recommended way to batch modify large numbers of entries without taking the directory (master or replica) offline? Furthermore, what replication throughput (mods/sec) can we expect to sustain without replication delay?
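One common approach (a sketch, not an official recommendation) is to split the change LDIF into small batches and pause between them so the consumers can keep up. The script below only does the chunking; the batch size, filenames, and the ldapmodify invocation in the comment are illustrative assumptions:

```python
import time


def chunk_ldif(ldif_text, entries_per_batch=100):
    """Split LDIF text into batches of at most entries_per_batch
    change records (LDIF records are separated by blank lines)."""
    records = [r for r in ldif_text.split("\n\n") if r.strip()]
    return ["\n\n".join(records[i:i + entries_per_batch])
            for i in range(0, len(records), entries_per_batch)]


# Usage sketch: write each batch to a file, feed it to ldapmodify,
# and sleep between batches so replication can catch up (the sleep
# interval is a tunable guess, not a measured value):
#
#   for i, batch in enumerate(chunk_ldif(open("changes.ldif").read())):
#       with open("batch_%d.ldif" % i, "w") as f:
#           f.write(batch + "\n")
#       # subprocess.run(["ldapmodify", "-x", "-D", BINDDN, "-w", PW,
#       #                 "-f", f.name])
#       time.sleep(30)
```

This keeps the directory online throughout; the trade-off is a longer total run time in exchange for bounded replication lag.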
Any and all advice greatly appreciated.
Thank you,
Mark
MASTER replication config ==
overlay syncprov
syncprov-checkpoint 50 10
syncprov-sessionlog 100
REPLICA replication config ==
syncrepl rid=RID
        provider=MASTER_URI
        type=refreshAndPersist
        searchbase=SEARCH_BASE
        binddn=BINDDN
        credentials=BINDPASSWORD
        retry="30 +"
> Problem is batch modifications (perl ldapmod) consisting of modifying a large number of entries ( > 50k ) will cause major replication delays after a few hours. The entry updates tend to be small eg adding a new non-indexed attribute and we throttle down the update to only 3 entries per second.
> What is the recommended way to batch modify large amounts of entries without taking the directory (master or replica) offline. Furthermore what is the expected replication throughput eg mods/sec that will sustain no replication delay.
This issue appears to be covered in the LDAP Sync replication section of the OpenLDAP Admin Guide. Here is an extract that sounds similar to your situation:
"LDAP Sync replication is an object-based replication mechanism. When any attribute value in a replicated object is changed on the provider, each consumer fetches and processes the complete changed object, including both the changed and unchanged attribute values during replication. One advantage of this approach is that when multiple changes occur to a single object, the precise sequence of those changes need not be preserved; only the final state of the entry is significant. But this approach may have drawbacks when the usage pattern involves single changes to multiple objects.
For example, suppose you have a database consisting of 100,000 objects of 1 KB each. Further, suppose you routinely run a batch job to change the value of a single two-byte attribute value that appears in each of the 100,000 objects on the master. Not counting LDAP and TCP/IP protocol overhead, each time you run this job each consumer will transfer and process 1 GB of data to process 200KB of changes!"
The use of delta-syncrepl may alleviate the problem you are seeing. Please see this page: http://www.openldap.org/doc/admin24/replication.html
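To make that concrete, here is a hedged sketch of what delta-syncrepl configuration looks like in slapd.conf. With delta-syncrepl the provider keeps a change log (via the accesslog overlay) and consumers replay only the changes rather than fetching whole entries. Database paths, the accesslog suffix, and the purge interval below are illustrative, so check them against the Admin Guide and the slapo-accesslog(5) man page:

```
# Provider: access log database used as the change log
database        bdb
suffix          "cn=accesslog"
directory       /var/lib/ldap/accesslog
rootdn          "cn=accesslog"
index           entryCSN,objectClass,reqEnd,reqResult,reqStart eq
overlay         syncprov
syncprov-nopresent  TRUE
syncprov-reloadhint TRUE

# Provider: main database, logging successful writes to the accesslog
# (your existing database and syncprov stanza, plus the accesslog overlay)
overlay         accesslog
logdb           "cn=accesslog"
logops          writes
logsuccess      TRUE
logpurge        07+00:00 01+00:00

# Consumer: same syncrepl stanza as before, plus the delta-sync keywords
syncrepl rid=RID
        provider=MASTER_URI
        type=refreshAndPersist
        searchbase=SEARCH_BASE
        binddn=BINDDN
        credentials=BINDPASSWORD
        logbase="cn=accesslog"
        logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
        syncdata=accesslog
        retry="30 +"
```

For your workload (tiny changes to many entries) this should cut the per-entry replication traffic from the full entry size down to roughly the size of the change itself.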
I believe OpenLDAP also supports per-identity limits on access to the tree (size and time limits), so you may want to exempt your sync user from them. I don't know the exact setting off-hand, but some research should point you in the right direction (or one of the other members of this mailing list will). Network speed may also be a potential bottleneck here, but that is much less likely than anything else.
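For reference, the setting I had in mind is the slapd.conf limits directive; a sketch, where BINDDN stands for the consumer's syncrepl bind DN from your config (verify the exact keywords against slapd.conf(5)):

```
# Provider slapd.conf: exempt the replication identity from
# size/time limits so large syncs are not truncated
limits dn.exact="BINDDN"
        time.soft=unlimited time.hard=unlimited
        size.soft=unlimited size.hard=unlimited
```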
William
openldap-technical@openldap.org