Jeffrey Crawford wrote:
I'm trying to stabilize our OpenLDAP server farm before going live and am finding that, despite the contextCSN matching between providers and replicas, the actual content of the servers is getting out of sync. This is most prominent when we are testing our population routine and need to remove all accounts before starting. Right now it's only about 22000 entries (it will get much larger).

During the mass delete we got the following sprinkled throughout the logs on all machines:
====
Nov 15 15:47:16 idm-prod-ldap-2 slapd[33070]: bdb(dc=domain,dc=name): previous transaction deadlock return not resolved

On Wed, Nov 16, 2011 at 12:09 AM, Howard Chu <hyc@symas.com> wrote:
Wow. I've never seen this error message before. What version of OpenLDAP and BerkeleyDB are you using?

On Wed, Nov 16, 2011 at 7:40 AM, Jeffrey Crawford <jeffreyc@ucsc.edu> wrote:
FreeBSD 8.2 with OpenLDAP 2.4.26. However, as I mentioned before, I think we are squeezing RAM right now. Part of this deployment was to discover how much RAM we needed on the virtual machine, and it was started pretty low.

Jeffrey Crawford wrote:
Oh, and we are using BDB 4.6 right now (forgot to answer that).

On Wed, Nov 16, 2011 at 1:27 PM, Howard Chu <hyc@symas.com> wrote:
Running out of memory would cause an obvious error message ("no memory"), so that's not likely to be the problem here. Might be worth upgrading to at least BDB 4.8, but again, never having seen BDB spit out that error before, that's just a guess.

Jeffrey Crawford wrote:
Not sure if this is significant, but I've been noticing that this error only shows up on deletes. However, it also shows up on deletes on the machine I'm running the ldapdelete against, so perhaps this is more of a software issue. I'll go ahead and run this with more RAM, and I'll check with the sysadmin whether they can compile it against BDB 4.8 to see if that changes anything. But I don't think ITS#7052 applies here, because the machine I'm doing this against does not use syncrepl; it's the provider to others.
This is a machine on a VM. Are there any known issues with that?
Way back in the dawn of time, there were some VMware implementations that didn't support mutexes correctly. I don't think that's been an issue for many years. There ought to be other error messages in your log, immediately preceding the one you quoted. Post those too.
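
For collecting the lines that immediately precede each occurrence of the error, something like the following sketch may help. It is not part of slapd or BDB tooling; the function name `lines_before_matches` and the default of five context lines are assumptions made for illustration, and the only text taken from this thread is the error message itself.

```python
import re
from collections import deque

# Message text copied from the log line quoted earlier in this thread.
PATTERN = re.compile(r"previous transaction deadlock return not resolved")


def lines_before_matches(lines, pattern=PATTERN, context=5):
    """Yield (preceding_lines, matching_line) for every line that matches.

    `lines` can be any iterable of strings, such as an open log file.
    `context` is how many earlier lines to keep (an arbitrary default).
    """
    recent = deque(maxlen=context)  # rolling window of the last few lines
    for raw in lines:
        line = raw.rstrip("\n")
        if pattern.search(line):
            yield list(recent), line
        recent.append(line)
```

To use it, open whichever file syslog writes slapd messages to on your system and pass the file object in, for example `for before, hit in lines_before_matches(open(path)): ...`, then post the `before` lines along with the error.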