I'm upgrading a site from OpenLDAP 2.3.42 to 2.4.12 in an attempt to alleviate http://www.openldap.org/its/index.cgi/Incoming?id=5631 (slapd crashes due to assertion failure).
For us, doing so requires dumping and recreating two back-bdb databases since the OpenLDAP 2.4.x Debian packaging is linked against a newer version of BDB. The larger database contains about a million entries.
Instead of slapcat(8)/slapadd(8)ding the old databases, we're removing the existing databases and allowing slapd(8) to delta-syncrepl a copy from scratch. Ironing out this use case is especially important for us since we expect to be adding a number of consumers in the coming months and would obviously prefer to bring them online without having to shut down any other slapd instances for slapcat(8)ting. The Administrator's Guide seems to indicate this is an accepted use case, since its guide to bringing up a new consumer involves simply configuring the consumer and starting slapd.
When the consumer slapd comes up, it enters the refresh(?) phase and begins adding entries to the fresh, empty bdb database. When it finishes, contextCSN on the suffix entry is set to 20081111135024Z#000000#00#000000 (roughly when slapd was started) and this change is visible with ldapsearch(1).
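For anyone comparing contextCSN values between the provider and the consumers, here is a minimal Python sketch that splits a CSN like the one above into its parts. It assumes the usual four '#'-separated fields (timestamp, change count, server ID, modification number), with the last three in hexadecimal, and it also accepts the timestamp form with fractional seconds. The names `CSN` and `parse_csn` are made up for illustration and are not part of OpenLDAP.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Hypothetical container for the four fields of a contextCSN value."""
    timestamp: datetime
    count: int
    sid: int
    mod: int


def parse_csn(value: str) -> CSN:
    """Split a CSN string into timestamp, count, server ID and mod number."""
    parts = value.strip().split("#")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '#'-separated fields, got {len(parts)}")
    stamp, count, sid, mod = parts
    # Assumption: the timestamp is UTC, with or without fractional seconds.
    fmt = "%Y%m%d%H%M%S.%fZ" if "." in stamp else "%Y%m%d%H%M%SZ"
    ts = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
    # Assumption: the remaining fields are hexadecimal counters.
    return CSN(ts, int(count, 16), int(sid, 16), int(mod, 16))


if __name__ == "__main__":
    csn = parse_csn("20081111135024Z#000000#00#000000")
    print(csn.timestamp.isoformat(), csn.count, csn.sid, csn.mod)
```

Because the result is a tuple ordered timestamp first, two parsed values from the same server can be compared with `<` to see which side is behind.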
At this point, the consumer slapd seems to start processing the accesslog. The first entry references a nonexistent DN (uid=nava209,...) and the backend operation returns LDAP_NO_SUCH_OBJECT. This is interesting, since this entry was created months ago and should have been found and created during the refresh phase. ldapsearch(1)ing against the provider with the same filter used by the consumer syncrepl ('(objectclass=*)') yields this entry, so it doesn't appear to be index corruption on the provider.
At this point, several hundred subsequent search entries are discarded, possibly because the be_modify operation failed?
After some time, slapd continues processing entries and does so successfully until it encounters another error (a modrdn that returns LDAP_ALREADY_EXISTS since the accesslog entry that modrdn'd the existing object out of the way was ignored by the consumer). After a while, slapd starts processing the same batch of modifications again, and repeats until the retry counter is exhausted. contextCSN on the suffix entry is never updated during this process, based on debugging output and ldapsearch(1).
It's interesting that two consumers have successfully delta-syncrepl'd complete databases from scratch without experiencing this problem. At least four other consumer machines fail in this manner. There seems to be no rhyme or reason as to which machines succeed or fail: they're all running the same binaries and the same OS release and patches, and some are even on the same Ethernet segment as the provider. The provider slapd has been up continuously (without a crash or restart) during at least two attempts.
Syncrepl (level 16384) debug output, sans ~400Mbytes of entry processing during the refresh phase, is at:
http://horde.net/~jwm/slapd-syncrepl-debug
john