Hi all,
For largely historical reasons we run slapd servers on most clients (this will probably change in the future; I mention it only as background). We're seeing problems when some of these machines are busy, particularly, it seems, during memory-intensive activity, although that's hard to substantiate as I generally only see the machines after they've broken. Frustratingly, I can't reproduce the problems on demand.
We see quite a few cases of slapd getting into a state where it's deferring operations. I think I understand these: slapd is basically saying "sorry, I'm too busy doing X, so I'll defer Y until I have time". Is this accurate?
The second problem I'm seeing is bdb complaining that locks are no longer valid, e.g.
slapd[3780]: bdb(dc=inf,dc=ed,dc=ac,dc=uk): DB_LOCK->lock_put: Lock is no longer valid
slapd seems to keep going for a while, until it gets into a state where it defers all binding operations and goes into some kind of spin, sitting at 99% CPU, at which point it has to be killed with kill -9.
I suppose I have a couple of questions about the "Lock is no longer valid" error:
- What causes it?
- Is it something I can prevent by configuration changes (for instance, would increasing the number of locks, lockers and objects help?)
We're running OpenLDAP 2.3.35 with the ITS#4924 and ITS#4925 patches, using a bdb backend on Berkeley DB 4.2.52 with all 6 recommended patches.
The only DBCONFIG settings we currently have are:
dbconfig set_cachesize 0 67108864 1
dbconfig set_lg_regionmax 262144
dbconfig set_lg_bsize 2097152
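To make the lock question concrete: as I understand it, raising the lock limits would mean adding something along the lines of the following alongside the settings above. The values here are purely illustrative placeholders that I have not tested or tuned (I believe the Berkeley DB default for each is 1000):

dbconfig set_lk_max_locks 3000
dbconfig set_lk_max_lockers 3000
dbconfig set_lk_max_objects 3000

I don't know whether we are actually hitting any of these limits; that is part of what I'm asking.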
Thanks in advance,
Toby Blake
School of Informatics
University of Edinburgh