<quote who="Toby Blake">
Hi there,
Firstly, many thanks for the replies...
np.
Hi Toby.
For largely historical reasons we run slapd servers on most clients (this will probably change in the future - I'm just giving this information as background).
Why?
Why will this change or why did we do it in the first place? I wasn't party to these decisions at the time, so I can't really comment on the reasons for them. I could speculate wildly, but I'd prefer not to.
Understood.
We're seeing problems when some of these machines are busy, particularly, it seems, with memory-intensive activity, although it's hard to substantiate as I generally only see the machines after they've broken. It's annoying as I can't reproduce these problems.
It's going to be hard to pinpoint then ;-) How much memory/CPU etc. do these clients have, and what other services do they provide?
They're typically desktop or lab machines for academics, students, etc. Hardware-wise they're Dell desktop boxes a few years old - a 2.4GHz processor with 512MB of memory is typical. Something I should have mentioned is that they're running Fedora Core 5, with a few running FC6.
OK.
As for what services they provide, general desktop services, but they could also be running long-running or intensive jobs. Many of the machines are also in a Condor pool, and this does seem to cause more problems.
Do you know if slapd gets unhappy if other processes use up lots of memory? This is my current line of investigation - I'll try to make it unhappy by using increasing amounts of memory.
Yes.
I suppose what I'm trying to determine is - is it the client activity that's causing problems (i.e. a misbehaving client or similar) or is it slapd itself getting unhappy for other reasons (possibly due to resources being used by other programs)? Or a combination of both?
Probably both. If a client keeps sending lots of bind/search requests at once, slapd will queue/defer them.
We see quite a few problems with slapd getting into a state where it's deferring operations, for whatever reason. I think I understand these - they're when slapd basically says "sorry, I'm too busy doing X, so I'll defer Y until I have time". Is this accurate?
Yes. What kind of clients are searching/binding to them? Local?
All local. As for what kind of clients - typical Linux desktop activity, I suppose. Hard to be specific about this really, as it will change from host to host.
OK.
Is this happening on all desktops then?
The second case I'm seeing is bdb complaining about locks no longer being valid, e.g.
slapd[3780]: bdb(dc=inf,dc=ed,dc=ac,dc=uk): DB_LOCK->lock_put: Lock is no longer valid
slapd seems to keep going for the time being, until it gets into a state where it defers all binding operations and goes into some kind of spin where it sits at 99% CPU and has to be killed with a -9.
Is everything local? Nothing network-mounted, like NFS, for the directory data?
Machines will have both NFS and AFS for home directory data.
Not the data directory then, ok.
I suppose I have a couple of questions about the "Lock is no longer valid" error....
- What causes it?
- Is it something I can prevent by configuration changes (for instance, would increasing the numbers of locks, lockers and objects help?)
One for the dev team. I do know (from grepping the source) that this is an error message from Berkeley DB.
Yes, I saw it in the source, but don't know it well enough to be sure of what's causing it.
Likewise.
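(As an aside - if we do end up experimenting with the lock limits mentioned above, my understanding is that they'd be raised with further dbconfig directives in slapd.conf, along these lines; the numbers are illustrative guesses, not tested recommendations:

dbconfig set_lk_max_locks 3000
dbconfig set_lk_max_lockers 1500
dbconfig set_lk_max_objects 1500

I believe the defaults are 1000 each in BDB 4.2, but any new values would want checking against db_stat -c output rather than being taken at face value.)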
We're running OpenLDAP 2.3.35 with the ITS#4924 and ITS#4925 patches, using a bdb backend on Berkeley DB 4.2.52 with all 6 recommended patches.
I hope you mean 5, as there are only 5 listed on the Oracle site.
As Quanah said, there are 6.
The only DBCONFIG settings we currently have are:
dbconfig set_cachesize 0 67108864 1
dbconfig set_lg_regionmax 262144
dbconfig set_lg_bsize 2097152
I take it dbconfig is a keyword you've added for this example, as it's not valid.
Sorry, I should have been more specific - this is in slapd.conf - look in the man page for slapd-bdb - this is just a way of getting directives into DB_CONFIG.
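(For anyone following along: slapd takes each dbconfig line from slapd.conf and writes the rest of the line into a DB_CONFIG file in the database directory, so the settings above should end up as a DB_CONFIG containing just:

set_cachesize 0 67108864 1
set_lg_regionmax 262144
set_lg_bsize 2097152

i.e. a single 64MB BDB cache plus the log region and log buffer sizes. As far as I remember, if a DB_CONFIG file already exists at startup the dbconfig directives are ignored, so it's worth checking which settings are actually in effect.)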
Yeah, my mistake. I forgot about that way.
Cheers
Toby