Hello,
I have a pair of OpenLDAP servers that had been replicating flawlessly with delta syncRepl for about 10 months. Just the other day, I saw that modifications were no longer being replicated and these messages were appearing in the syslog on the master server immediately after the MOD line:
[ID 651871 local0.debug] => bdb_idl_insert_key: c_get next_dup failed: DB_NOTFOUND: No matching key/data pair found (-30990)
[ID 809268 local0.debug] => bdb_dn2id_add: parent (cn=log) insert failed: -30990
I assume that something has become corrupted in the BDB database for cn=log on the master. Does that seem correct? I'm definitely not seeing any new entries in the cn=log database since those messages began appearing.
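To find when these failures started, the syslog can be scanned for the two messages quoted above. The following is only a minimal sketch: it assumes the debug lines keep the "=> bdb_function: ... failed ... -NNNNN" shape seen in this report, and the function name find_bdb_errors is made up for illustration.

```python
import re

# Assumption: back-bdb debug lines look like "=> bdb_xxx: ... failed ..." and
# end in a negative Berkeley DB error code, optionally wrapped in parentheses.
FUNC_RE = re.compile(r"=> (bdb_\w+):")
CODE_RE = re.compile(r"(-\d+)\)?\s*$")

def find_bdb_errors(lines):
    """Return (function, error_code) pairs for lines that look like back-bdb failures."""
    hits = []
    for line in lines:
        func = FUNC_RE.search(line)
        code = CODE_RE.search(line)
        if func and code and "failed" in line:
            hits.append((func.group(1), int(code.group(1))))
    return hits

# The two messages from the report above.
sample = [
    "[ID 651871 local0.debug] => bdb_idl_insert_key: c_get next_dup failed: "
    "DB_NOTFOUND: No matching key/data pair found (-30990)",
    "[ID 809268 local0.debug] => bdb_dn2id_add: parent (cn=log) insert failed: -30990",
]
errors = find_bdb_errors(sample)
print(errors)
```

Feeding it the lines of the real syslog file instead of sample would show the first occurrence, which can then be compared with the last entry written to cn=log.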
If it is a corrupted index, I think that running "slapindex -b cn=log -f .... " after stopping the slapd process will fix that. After that completes, I should be able to restart the slapd and test that writes to entries under the baseDN do cause new entries to appear in the cn=log database.
If it's not an index, I have no idea how to repair this. I found the error message in the sources (servers/slapd/back-bdb/idl.c:789 in version 2.3.30) but honestly, I have no idea what that code is doing.
Once (if) I can repair things, I can begin worrying about getting changes to the replica again. Since there are changes missing from the cn=log database on the master, I assume that I'll need to cause a complete re-sync. Is there a better way to accomplish that than removing the entire database on the replica, using slapadd to import a recent backup of the master, and restarting the replica?
Some specifics in case they matter:
Master: Solaris10 amd64, BDB 4.2.52 + 5 patches, OpenLDAP 2.3.30
Replica: Solaris10 amd64, BDB 4.2.52 + 5 patches, OpenLDAP 2.3.38 (upgraded from 2.3.33 the day before the problem began on the Master)
(What I believe to be the) Relevant portions of slapd.conf file from the Master (slightly obfuscated) are included at the end of this message.
Thank you for any help,
-Ben
# access log database (used by syncprov-delta replication)
database        bdb
suffix          "cn=log"
directory       /var/openldap/data/prod/logdb
rootdn          "cn=Manager,dc=our,dc=domain"
mode            0660
shm_key         142
index           default eq
index           objectClass,entryUUID,entryCSN eq
index           reqStart,reqEnd,reqResult,reqType eq
access to dn.subtree="cn=log"
        by group.exact="cn=DirectoryAdmins,cn=administrators,dc=our,dc=domain" write
        by dn.onelevel="cn=SyncUsers,cn=administrators,dc=our,dc=domain" read
        by * none

overlay syncprov
syncprov-nopresent      TRUE
syncprov-reloadhint     TRUE

# This is all one line
limits dn.onelevel="cn=SyncUsers,cn=administrators,dc=our,dc=domain" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited

database        hdb
suffix          "dc=our,dc=domain"
rootdn          "cn=manager,dc=our,dc=domain"
rootpw          {SHA}[XXX REMOVED XXX]
directory       /var/openldap/data/prod/db
checkpoint      100000 30
mode            0660
shm_key         42
cachesize       500000
idlcacheSize    1500000
index           default pres,eq
index           givenName,description,uid,cn,sn pres,eq,sub
index           objectClass,uniqueMember,member eq
index           employeeNumber eq,sub
index           entryCSN,entryUUID eq

overlay ppolicy
ppolicy_default cn=standard,cn=policies,dc=our,dc=domain

overlay dynlist
dynlist-attrset groupOfURLs memberURL member

overlay syncprov
syncprov-checkpoint     100000 30
syncprov-sessionlog     300000

overlay accesslog
logdb           cn=log
logops          writes
logsuccess      TRUE
logold          (objectClass=inetOrgPerson)
logpurge        28+00:00 01+00:00

# This is all one line
limits dn.onelevel="cn=SyncUsers,cn=administrators,dc=our,dc=domain" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited
--On August 31, 2007 1:41:46 PM -0400 Benjamin Lewis bhlewis@gmail.com wrote:
Some specifics in case they matter:
Master: Solaris10 amd64 BDB 4.2.52 + 5 patches OpenLDAP 2.3.30
Fixed in OpenLDAP 2.3.35:
Fixed slapd syncrepl delta-sync modlist free (ITS#4904)
I suggest you upgrade your master, given that 2.3.30 is already known to have an issue with delta-sync. Also, rather than just stopping slapd and running slapindex, you may want to stop slapd, go to the cn=log db, and remove the index bdb files (keeping the database bdb files, like id2entry.bdb and dn2id.bdb), and then run slapindex, so that the old index files are completely purged. I'll also note that 2.3.38 is the current stable release, so there are many other good reasons to upgrade. ;)
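As a rough illustration of the "remove the index bdb files, keep the database files" step, the sketch below only lists which files in a database directory would count as index files; it deletes nothing. It assumes that every *.bdb file other than id2entry.bdb and dn2id.bdb is an attribute index, and the helper name index_files and the demo file names are made up for the example.

```python
import tempfile
from pathlib import Path

# Assumption: these are the core database files named in the advice above;
# any other *.bdb file in the directory is treated as an attribute index.
CORE_FILES = {"id2entry.bdb", "dn2id.bdb"}

def index_files(db_dir):
    """Return the sorted *.bdb files in db_dir that are not core database files."""
    return sorted(
        p for p in Path(db_dir).iterdir()
        if p.is_file() and p.suffix == ".bdb" and p.name not in CORE_FILES
    )

# Demo against a scratch directory with made-up file names; nothing is deleted.
with tempfile.TemporaryDirectory() as scratch:
    for name in ("id2entry.bdb", "dn2id.bdb", "objectClass.bdb",
                 "reqStart.bdb", "DB_CONFIG", "__db.001", "log.0000000001"):
        (Path(scratch) / name).touch()
    found_names = [p.name for p in index_files(scratch)]

print(found_names)
```

Pointing index_files at the real cn=log directory (with slapd stopped) gives a list to review before removing anything and running slapindex.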
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Quanah,
Thank you for the advice. An upgrade to 2.3.38 was already planned for other reasons but now I have good reason to push up the timeline.
Thanks again,
-Ben
Benjamin Lewis wrote, on 31-08-2007 20:27:
[..]
Thank you for the advice. An upgrade to 2.3.38 was already planned for other reasons but now I have good reason to push up the timeline.
This with Red Hat RHL5 and Fedora FC6:
Even with OL 2.3.38 at two different sites I experienced the same problem that caused much head-banging until I finally found out what was happening.
The delta-syncrepl directory was becoming corrupted and all MOD/DEL/MODRDN (all updates) were hanging slapd. When I (on my test machine) deleted the whole directory content - apart from DB_CONFIG - and restarted slapd, the content was recreated and things went back to normal. Doing a db_recover -c hadn't had any effect ...
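A cautious variant of the step described above is to move the files aside rather than delete them, so they can still be inspected afterwards. This is only a sketch under assumptions: slapd is stopped first, DB_CONFIG is the only file to keep, and the helper name set_aside_db_files, the backup location and the demo file names are all made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path

def set_aside_db_files(db_dir, backup_dir, keep=("DB_CONFIG",)):
    """Move every file in db_dir except those named in keep into backup_dir.

    Returns the names of the files that were moved.
    """
    db_dir, backup_dir = Path(db_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(db_dir.iterdir()):
        if entry.is_file() and entry.name not in keep:
            shutil.move(str(entry), str(backup_dir / entry.name))
            moved.append(entry.name)
    return moved

# Demo on a scratch directory with made-up file names.
with tempfile.TemporaryDirectory() as root:
    db = Path(root) / "logdb"
    db.mkdir()
    for name in ("DB_CONFIG", "id2entry.bdb", "dn2id.bdb", "__db.001", "log.0000000001"):
        (db / name).touch()
    moved = set_aside_db_files(db, Path(root) / "logdb.old")
    remaining = sorted(p.name for p in db.iterdir())

print(moved)
print(remaining)
```

After this, restarting slapd should recreate the content as described above; the copies in the backup directory can be removed once replication is confirmed to work again.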
--Tonni
openldap-software@openldap.org