I'm having trouble getting the consumer synced in reasonable time. My tests were with fewer than 20 entries in the datastore and I saw no problems.
But we have 260,000 inetOrgPersons (with only a few attributes for each user: uid cn sn givenName mail userPassword).
I've set up syncrepl:
PROVIDER
# Indices to maintain for this database
index objectclass,entryCSN,entryUUID eq
index ou,cn,mail,surname,givenname eq,sub
index uidNumber,gidNumber,loginShell eq
index uid,memberUid eq,sub
index nisMapName,nisMapEntry eq,sub
overlay syncprov
syncprov-checkpoint 100 1
syncprov-sessionlog 100
limits dn.children="ou=replicators,dc=service,dc=utoronto,dc=ca" size=unlimited time=unlimited
(I index attributes I'm not currently using. I presume that's not the problem.)
CONSUMER
syncrepl rid=123
    provider=ldap://PROVIDER:389
    type=refreshAndPersist
    interval=00:00:10:00
    retry="60 10 300 +"
    searchbase="dc=service,dc=utoronto,dc=ca"
    filter="(objectClass=*)"
    scope=sub
    schemachecking=off
    starttls=critical
    bindmethod=simple
    binddn="uid=replicator,ou=replicators,dc=service,dc=utoronto,dc=ca"
I've tried initializing the consumer both with and without slapcat/slapadd. On our slower system, slapadd took 98 minutes to rebuild the database; the faster system took 35 minutes (and I have only one consumer right now).
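For reference, the slapcat/slapadd initialization is roughly the following sketch (the filename is illustrative, and slapd on the consumer is stopped while slapadd runs):

    # on the provider: dump the database, operational attributes included
    slapcat -l provider.ldif

    # copy provider.ldif to the consumer, then rebuild its database there
    slapadd -l provider.ldif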
A full transfer via syncrepl is slow: 10 entries per second:
# < /var/log/daemon egrep 'bdb_add: added id=' | cut -b1-50 | uniq -c | tail -10
     10 Apr  1 10:57:31 ldap2 slapd[3126]: bdb_add: added
     11 Apr  1 10:57:32 ldap2 slapd[3126]: bdb_add: added
      9 Apr  1 10:57:33 ldap2 slapd[3126]: bdb_add: added
      9 Apr  1 10:57:34 ldap2 slapd[3126]: bdb_add: added
     10 Apr  1 10:57:35 ldap2 slapd[3126]: bdb_add: added
      9 Apr  1 10:57:36 ldap2 slapd[3126]: bdb_add: added
     10 Apr  1 10:57:37 ldap2 slapd[3126]: bdb_add: added
     10 Apr  1 10:57:38 ldap2 slapd[3126]: bdb_add: added
     10 Apr  1 10:57:39 ldap2 slapd[3126]: bdb_add: added
      5 Apr  1 10:57:40 ldap2 slapd[3126]: bdb_add: added
With the consumer starting from the slapadd-built initial database, syncrepl seems to review every entry, at 15 entries per second:
# tail -10000 /var/log/daemon | egrep 'entry unchanged' | cut -b1-83 | uniq -c
      8 Apr  1 11:30:57 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     15 Apr  1 11:30:58 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     14 Apr  1 11:30:59 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     15 Apr  1 11:31:00 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     15 Apr  1 11:31:01 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     15 Apr  1 11:31:02 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     14 Apr  1 11:31:03 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     14 Apr  1 11:31:04 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     15 Apr  1 11:31:05 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
      2 Apr  1 11:31:06 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
I can only hope it'll be done in 5 hours. The datastore isn't active, so the consumer is effectively up to date for now, but this needless work is time-consuming.
Is this normal? What happens when I restart the consumer? Why should I expect it to be any faster after a restart?
I changed one entry (adding displayName to my own entry) after the slapcat, so the consumer did not have that change when it started 90 minutes ago. That update still has not propagated.
Can I prime the consumer's syncrepl cookie (if that's an appropriate term)? Is that a solution? And how would I do that?
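For what it's worth, the sync state can be inspected on either server by reading the operational contextCSN attribute of the suffix entry, along these lines (hostname illustrative; operational attributes must be requested by name):

    ldapsearch -x -H ldap://CONSUMER -s base \
        -b "dc=service,dc=utoronto,dc=ca" contextCSN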
Thanks for your time,
Paul
--On Monday, April 05, 2010 2:28 PM -0230 Paul Fardy paul.fardy@utoronto.ca wrote:
I'm having trouble getting the consumer synced in reasonable time. My tests were with fewer than 20 entries in the datastore and I saw no problems.
Have you configured a DB_CONFIG file? Did you use the -q flag with slapadd? Your load times seem abnormally long. On a correctly tuned system, I can load a very large 3-million-entry LDIF file, with quite a number of indices, in about 2 hours.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
My DB_CONFIG:
set_cachesize 0 268435456 1
set_lg_regionmax 262144
set_lg_bsize 2097152
set_lg_dir logs
The filesystem is ext3 on RHEL5.
-q     enable quick (fewer integrity checks) mode. Does fewer
       consistency checks on the input data, and no consistency
       checks when writing the database. Improves the load time
       but if any errors or interruptions occur the resulting
       database will be unusable.
That last bit was enough to keep me from using -q, but it did reduce the load time to 17 minutes.
The performance of slapadd matters, but what about syncrepl? Why is the consumer reviewing every object? While reviewing -q in the man page, I discovered:
-w     write syncrepl context information. After all entries are
       added, the contextCSN will be updated with the greatest
       CSN in the database.
And that looks like an option that would prime my syncrepl info. So
slapadd -q -w -l SLAPCAT.LDIF
took 14 minutes to build the database and then 3 minutes to close it. This consumer has the same hardware as the provider, which took 35 minutes to rebuild its database.
That "slapadd -w" looks like the fix. Would someone confirm or reject that?
The provider's log file still shows it reviewing many records; I guess it's just not returning them. Will the log file show the DNs of entries actually returned (as opposed to entries merely visited)?
I restarted the provider with less logging; the logs of full syncrepl scans were eating disk space, when only 5 or 6 records would have changed.
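(For anyone following along: a slapd.conf setting along these lines keeps only the sync-related messages. This is a sketch, not my exact configuration; see the loglevel keywords in slapd.conf(5).)

    # log only syncrepl-related messages instead of full stats
    loglevel sync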
Is it normal for the provider to visit many (all?) objects even when the consumer would have a very current CSN?
Thanks for your help,
Paul
[...]
If you slapadd to the consumer the output of slapcat from the producer, the CSNs will be consistent, and no refresh will occur. Did you by chance slapadd to the consumer a fresh LDIF, with no UUID/CSN information? What -w does is simply to set the contextCSN to the latest entryCSN found in the database. If you slapcat from the producer, the suffix entry will have a valid contextCSN and -w is not needed.
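For instance, you can check whether a producer dump carries the replication state before loading it (filename illustrative):

    # the suffix entry of a producer slapcat dump should carry contextCSN
    slapcat -l provider.ldif
    grep -m 1 contextCSN provider.ldif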
p.
--On Monday, April 05, 2010 9:46 PM -0230 Paul Fardy paul.fardy@utoronto.ca wrote:
My DB_CONFIG:
set_cachesize 0 268435456 1
set_lg_regionmax 262144
set_lg_bsize 2097152
set_lg_dir logs
Are you sure this is sufficient?
That's only 256MB of cache. What is the size of du -c -h *.bdb in the database directory?
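If the *.bdb files total much more than the cache, BDB will thrash. As an illustration only, something like this sizes the cache closer to the data (set_cachesize takes gbytes, bytes, and a segment count):

    # in the consumer's database directory
    du -c -h *.bdb

    # DB_CONFIG: e.g. a 2 GB cache in one segment
    set_cachesize 2 0 1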
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
On Mon, 5 Apr 2010 14:28:49 -0230, Paul Fardy paul.fardy@utoronto.ca wrote:
[...]
What filesystem are you running?
-Dieter