OpenLDAP version(s): 2.4.32 and recent git master (54a111907b48900e1ce5de12a9d87818a334423d)
OS: CentOS 5, 32-bit
In a master/slave delta-syncrepl configuration, syncrepl connections from a fully synchronized and then restarted slave to a freshly loaded master fail because the master uses an earlier contextCSN. This state persists until the first modification on the master; the contextCSN then becomes newer than the consumer's, and the consumer's next syncrepl retry succeeds.
Details (master and slave configurations are attached to this message):
The master is freshly loaded with the following LDIF:
dn: dc=example,dc=com
objectClass: locality
objectClass: dcObject
dc: example

dn: ou=People,dc=example,dc=com
objectClass: organizationalUnit
ou: People
When the master is started, there is only one entry in cn=accesslog:
dn: cn=accesslog
objectClass: auditContainer
cn: accesslog
structuralObjectClass: auditContainer
entryDN: cn=accesslog
subschemaSubentry: cn=Subschema
contextCSN: 20120910091205.088888Z#000000#000#000000
hasSubordinates: FALSE
The root entry of the main database looks like this:
dn: dc=example,dc=com
objectClass: locality
objectClass: dcObject
dc: example
structuralObjectClass: locality
entryUUID: 921f3734-24fc-4efa-ac52-10aa4d293332
creatorsName: cn=Manager,dc=example,dc=com
createTimestamp: 20120910090757Z
entryCSN: 20120910090757.483217Z#000000#000#000000
modifiersName: cn=Manager,dc=example,dc=com
modifyTimestamp: 20120910090757Z
entryDN: dc=example,dc=com
subschemaSubentry: cn=Subschema
auditContext: cn=accesslog
contextCSN: 20120910091205.089003Z#000000#000#000000
hasSubordinates: TRUE
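For reference, the two contextCSN values above differ only in the fractional-second part of their timestamps. A CSN packs four `#`-separated fields (timestamp, change count, server ID, modification number); a minimal parsing sketch in plain Python (the field names are my labels, not OpenLDAP identifiers):

```python
def parse_csn(csn: str) -> dict:
    # Split an OpenLDAP-style CSN, e.g.
    #   20120910091205.089003Z#000000#000#000000
    # into its four fields.
    timestamp, count, sid, mod = csn.split("#")
    return {"timestamp": timestamp, "count": count, "sid": sid, "mod": mod}

main_csn = parse_csn("20120910091205.089003Z#000000#000#000000")  # main database
log_csn = parse_csn("20120910091205.088888Z#000000#000#000000")   # cn=accesslog

# Only the microsecond part of the timestamp differs.
print(main_csn["timestamp"])  # 20120910091205.089003Z
print(log_csn["timestamp"])   # 20120910091205.088888Z
```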
When the slave server (with an empty database) is started, it connects to the master and synchronizes. Their synchronization dialog, captured with -d sync, looks like this:
(master)
504db377 syncprov_search_response: cookie=rid=000,csn=20120910091205.089003Z#000000#000#000000
504db377 syncprov_search_response: cookie=rid=000,csn=20120910091205.088888Z#000000#000#000000
(slave)
504db370 syncrepl_message_to_entry: rid=000 DN: dc=example,dc=com, UUID: 921f3734-24fc-4efa-ac52-10aa4d293332
504db370 syncrepl_entry: rid=000 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
504db370 syncrepl_entry: rid=000 inserted UUID 921f3734-24fc-4efa-ac52-10aa4d293332
504db370 syncrepl_entry: rid=000 be_search (32)
504db370 syncrepl_entry: rid=000 dc=example,dc=com
504db370 syncrepl_entry: rid=000 be_add dc=example,dc=com (0)
504db370 syncrepl_message_to_entry: rid=000 DN: ou=People,dc=example,dc=com, UUID: 08fe53fb-724a-4eca-a57e-3c271d99dd7e
504db370 syncrepl_entry: rid=000 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
504db370 syncrepl_entry: rid=000 inserted UUID 08fe53fb-724a-4eca-a57e-3c271d99dd7e
504db370 syncrepl_entry: rid=000 be_search (0)
504db370 syncrepl_entry: rid=000 ou=People,dc=example,dc=com
504db370 syncrepl_entry: rid=000 be_add ou=People,dc=example,dc=com (0)
504db370 do_syncrep2: rid=000 LDAP_RES_SEARCH_RESULT
504db370 do_syncrep2: rid=000 cookie=rid=000,csn=20120910091205.089003Z#000000#000#000000
504db370 slap_queue_csn: queing 0x836dcf8 20120910091205.089003Z#000000#000#000000
504db370 slap_graduate_commit_csn: removing 0x8371328 20120910091205.089003Z#000000#000#000000
504db370 do_syncrep2: rid=000 LDAP_RES_INTERMEDIATE - REFRESH_DELETE
504db370 do_syncrep2: rid=000 cookie=rid=000,csn=20120910091205.088888Z#000000#000#000000
The master sends the cookie with the contextCSN belonging to cn=accesslog. If the slave is now restarted, the following happens:
504db38d do_syncrep2: rid=000 LDAP_RES_SEARCH_RESULT
504db38d do_syncrep2: rid=000 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
504db38d do_syncrep2: rid=000 (53) Server is unwilling to perform
504db38d do_syncrepl: rid=000 rc -2 retrying
The slave sends csn=20120910091205.089003Z#000000#000#000000, but the master uses csn=20120910091205.088888Z#000000#000#000000, which is older, so the request is refused. When the master is subsequently modified, its contextCSN becomes newer than the consumer's, and the slave's next retry succeeds.
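The refusal is consistent with a plain ordering of the two CSN strings: CSNs are formatted so that byte-wise string comparison matches temporal order, and the consumer's cookie sorts after the CSN the master is using. A minimal sketch:

```python
# The cookie CSN the restarted slave presents (taken from the master's
# main database during the initial refresh) vs. the CSN the master is
# using (the cn=accesslog contextCSN). CSN strings are constructed so
# that ordinary string comparison reflects temporal order.
consumer_cookie = "20120910091205.089003Z#000000#000#000000"
provider_csn    = "20120910091205.088888Z#000000#000#000000"

# The consumer claims a state newer than anything the provider has
# logged, which leads to the "unwilling to perform" response.
print(consumer_cookie > provider_csn)  # True
```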
Is this worthy of an ITS? The situation is unlikely to occur in practice and corrects itself, leaving the slave out of sync only for the duration of the retry interval.
i.
include ../servers/slapd/schema/core.schema

argsfile master/run/slapd.args
pidfile master/run/slapd.pid

database mdb
suffix cn=accesslog
directory master/accesslog
index entryCSN,objectClass,reqEnd,reqResult,reqStart eq

overlay syncprov
syncprov-nopresent TRUE
syncprov-reloadhint TRUE

database mdb
suffix "dc=example,dc=com"
rootdn "cn=Manager,dc=example,dc=com"
rootpw secret
maxsize 256000000
directory master/main
index objectClass eq
index entryCSN,entryUUID eq

overlay syncprov
syncprov-checkpoint 1000 60

overlay accesslog
logdb cn=accesslog
logops writes
logsuccess TRUE
logpurge 07+00:00 01+00:00
include ../servers/slapd/schema/core.schema

argsfile replica/run/slapd.args
pidfile replica/run/slapd.pid

database mdb
suffix "dc=example,dc=com"
rootdn "cn=Manager,dc=example,dc=com"
rootpw secret
maxsize 256000000
directory replica/main
index objectClass eq
index entryUUID eq
syncrepl rid=0
	provider=ldap://localhost:6389/
	bindmethod=simple
	binddn="cn=Manager,dc=example,dc=com"
	credentials=secret
	searchbase="dc=example,dc=com"
	logbase="cn=accesslog"
	logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
	schemachecking=on
	type=refreshAndPersist
	retry="60 +"
	syncdata=accesslog
updateref "ldap://localhost:6389"