I'm having trouble getting the consumer synced in reasonable time. My tests were with fewer than 20 entries in the datastore and I saw no problems.
But we have 260,000 inetOrgPersons (with only a few attributes for each user: uid cn sn givenName mail userPassword).
I've set up syncrepl:
PROVIDER
# Indices to maintain for this database
index objectclass,entryCSN,entryUUID eq
index ou,cn,mail,surname,givenname eq,sub
index uidNumber,gidNumber,loginShell eq
index uid,memberUid eq,sub
index nisMapName,nisMapEntry eq,sub
overlay syncprov
syncprov-checkpoint 100 1
syncprov-sessionlog 100
limits dn.children="ou=replicators,dc=service,dc=utoronto,dc=ca" size=unlimited time=unlimited
(I index attributes I'm not currently using. I presume that's not the problem.)
CONSUMER
syncrepl rid=123
    provider=ldap://PROVIDER:389
    type=refreshAndPersist
    interval=00:00:10:00
    retry="60 10 300 +"
    searchbase="dc=service,dc=utoronto,dc=ca"
    filter="(objectClass=*)"
    scope=sub
    schemachecking=off
    starttls=critical
    bindmethod=simple
    binddn="uid=replicator,ou=replicators,dc=service,dc=utoronto,dc=ca"
I've tried initializing the consumer both with and without slapcat/slapadd. On our slower system, slapadd took 98 minutes to rebuild the database; the faster system took 35 minutes (and I have only one consumer right now).
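For reference, the slapcat/slapadd initialization is roughly the following sketch (the filename is illustrative, and slapd on the consumer is stopped while slapadd runs):

    # on the provider: dump the database, operational attributes included
    slapcat -l provider.ldif

    # copy provider.ldif to the consumer, then rebuild its database there
    slapadd -l provider.ldif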
A full transfer via syncrepl is slow: 10 entries per second:
# < /var/log/daemon egrep 'bdb_add: added id=' | cut -b1-50 | uniq -c | tail -10
     10 Apr  1 10:57:31 ldap2 slapd[3126]: bdb_add: added
     11 Apr  1 10:57:32 ldap2 slapd[3126]: bdb_add: added
      9 Apr  1 10:57:33 ldap2 slapd[3126]: bdb_add: added
      9 Apr  1 10:57:34 ldap2 slapd[3126]: bdb_add: added
     10 Apr  1 10:57:35 ldap2 slapd[3126]: bdb_add: added
      9 Apr  1 10:57:36 ldap2 slapd[3126]: bdb_add: added
     10 Apr  1 10:57:37 ldap2 slapd[3126]: bdb_add: added
     10 Apr  1 10:57:38 ldap2 slapd[3126]: bdb_add: added
     10 Apr  1 10:57:39 ldap2 slapd[3126]: bdb_add: added
      5 Apr  1 10:57:40 ldap2 slapd[3126]: bdb_add: added
With the consumer starting from the slapadd-built initial database, syncrepl seems to review every entry, at 15 entries per second:
# tail -10000 /var/log/daemon | egrep 'entry unchanged' | cut -b1-83 | uniq -c
      8 Apr  1 11:30:57 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     15 Apr  1 11:30:58 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     14 Apr  1 11:30:59 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     15 Apr  1 11:31:00 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     15 Apr  1 11:31:01 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     15 Apr  1 11:31:02 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     14 Apr  1 11:31:03 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     14 Apr  1 11:31:04 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
     15 Apr  1 11:31:05 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
      2 Apr  1 11:31:06 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
I can only hope it'll be done in 5 hours. The datastore isn't active, so the consumer is effectively up to date for now, but this needless work is time-consuming.
Is this normal? What happens when I restart the consumer? Why should I expect it to be any faster after a restart?
I changed one entry (adding displayName to my own entry) after the slapcat, so the consumer did not have that change when it started 90 minutes ago. That update still has not propagated.
Can I prime the consumer's syncrepl cookie (if that's an appropriate term)? Is that a solution? And how would I do that?
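For what it's worth, the sync state can be inspected on either server by reading the operational contextCSN attribute of the suffix entry, along these lines (hostname illustrative; operational attributes must be requested by name):

    ldapsearch -x -H ldap://CONSUMER -s base \
        -b "dc=service,dc=utoronto,dc=ca" contextCSN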
Thanks for your time,
Paul
--On Monday, April 05, 2010 2:28 PM -0230 Paul Fardy paul.fardy@utoronto.ca wrote:
I'm having trouble getting the consumer synced in reasonable time. My tests were with fewer than 20 entries in the datastore and I saw no problems.
Have you configured a DB_CONFIG file? Did you use the -q flag with slapadd? Your load times seem abnormally long. On a correctly tuned system, I can load a very large 3-million-entry LDIF file, with quite a number of indices, in about 2 hours.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
My DB_CONFIG:
set_cachesize 0 268435456 1
set_lg_regionmax 262144
set_lg_bsize 2097152
set_lg_dir logs
The filesystem is ext3 on RHEL5.
-q     enable quick (fewer integrity checks) mode. Does fewer
       consistency checks on the input data, and no consistency
       checks when writing the database. Improves the load time
       but if any errors or interruptions occur the resulting
       database will be unusable.
That last bit was enough to keep me from using -q, but it did reduce the load time to 17 minutes.
The performance of slapadd matters, but what about syncrepl? Why is the consumer reviewing every object? While reviewing -q in the man page, I discovered:
-w     write syncrepl context information. After all entries are
       added, the contextCSN will be updated with the greatest
       CSN in the database.
And that looks like an option that would prime my syncrepl info. So
slapadd -q -w -l SLAPCAT.LDIF
took 14 minutes to build the database and then 3 minutes to close it. This consumer has the same hardware as the provider, which took 35 minutes to rebuild its database.
That "slapadd -w" looks like the fix. Would someone confirm or reject that?
The provider's log file still shows it reviewing many records; I guess it's just not returning them. Will the log file show the DNs of entries actually returned (as opposed to entries merely visited)?
I restarted the provider with less logging; the logs of full syncrepl scans were eating disk space, when only 5 or 6 records would have changed.
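(For anyone following along: a slapd.conf setting along these lines keeps only the sync-related messages. This is a sketch, not my exact configuration; see the loglevel keywords in slapd.conf(5).)

    # log only syncrepl-related messages instead of full stats
    loglevel sync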
Is it normal for the provider to visit many (all?) objects even when the consumer would have a very current CSN?
Thanks for your help,
Paul
[...]
If you slapadd to the consumer the output of slapcat from the producer, the CSNs will be consistent, and no refresh will occur. Did you by chance slapadd to the consumer a fresh LDIF, with no UUID/CSN information? What -w does is simply to set the contextCSN to the latest entryCSN found in the database. If you slapcat from the producer, the suffix entry will have a valid contextCSN and -w is not needed.
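For instance, you can check whether a producer dump carries the replication state before loading it (filename illustrative):

    # the suffix entry of a producer slapcat dump should carry contextCSN
    slapcat -l provider.ldif
    grep -m 1 contextCSN provider.ldif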
p.
--On Monday, April 05, 2010 9:46 PM -0230 Paul Fardy paul.fardy@utoronto.ca wrote:
My DB_CONFIG:
set_cachesize 0 268435456 1
set_lg_regionmax 262144
set_lg_bsize 2097152
set_lg_dir logs
Are you sure this is sufficient?
That's only 256MB of cache. What is the size of du -c -h *.bdb in the database directory?
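If the *.bdb files total much more than the cache, BDB will thrash. As an illustration only, something like this sizes the cache closer to the data (set_cachesize takes gbytes, bytes, and a segment count):

    # in the consumer's database directory
    du -c -h *.bdb

    # DB_CONFIG: e.g. a 2 GB cache in one segment
    set_cachesize 2 0 1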
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
On Mon, 5 Apr 2010 14:28:49 -0230, Paul Fardy paul.fardy@utoronto.ca wrote:
[...]
What filesystem are you running?
-Dieter