On 17/11/2015 11:26, Andrew Findlay wrote:
On Tue, Nov 17, 2015 at 11:11:04AM +0000, Mark Cairney wrote:
Just as an update: we've managed to restore service. It turns out that we had gone over 65,535 aliases (we had 66,291), which we think was the root cause of this behaviour suddenly starting.
It's a significant number certainly...
We're now down to "only" 41,000 :-)
Although it relates to MDB, this ITS sounded very similar: http://www.openldap.org/its/index.cgi/Software%20Bugs?id=8146;page=10
We started deleting as many aliases as we could, but performance only improved slightly. What appears to have fixed it was doing a slapcat of the "pruned" data and re-loading it into the database via slapadd. Having done this, searches with deref set to always are now performing as they were before.
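For anyone wanting to keep an eye on the alias count, here is a minimal sketch that counts alias entries in LDIF text such as slapcat output. The function names and the sample entries are made up for illustration, and the 65,535 figure is only the threshold suspected in this thread, not a documented limit.

```python
# Threshold suspected in this thread; an assumption, not a documented limit.
SUSPECTED_ALIAS_THRESHOLD = 65535

# Hypothetical sample data: one alias entry and one ordinary entry.
SAMPLE_LDIF = (
    "dn: cn=a,ou=app,dc=example,dc=com\n"
    "objectClass: alias\n"
    "objectClass: extensibleObject\n"
    "aliasedObjectName: uid=a,ou=people,dc=example,dc=com\n"
    "\n"
    "dn: uid=a,ou=people,dc=example,dc=com\n"
    "objectClass: inetOrgPerson\n"
)


def unfold(ldif_text):
    """Join LDIF continuation lines (lines that start with one space)."""
    lines = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def count_aliases(ldif_text):
    """Count entries that carry 'objectClass: alias'.

    Entries are separated by blank lines. Base64-encoded values
    ('objectClass:: ...') are not decoded in this sketch.
    """
    count = 0
    is_alias = False
    for line in unfold(ldif_text):
        if not line.strip():
            # A blank line closes the current entry.
            if is_alias:
                count += 1
            is_alias = False
            continue
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "objectclass" \
                and value.strip().lower() == "alias":
            is_alias = True
    # Account for a final entry with no trailing blank line.
    if is_alias:
        count += 1
    return count


def over_suspected_threshold(ldif_text):
    """True if the alias count exceeds the figure suspected above."""
    return count_aliases(ldif_text) > SUSPECTED_ALIAS_THRESHOLD
```

To use it on a real export, the contents of a file written by slapcat could be read with open(path).read() and passed to count_aliases().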
If this happens again, you could try stopping the server and running slapindex rather than reloading everything.
We did try slapindex but it had little effect. This may have been before we'd pruned the number of aliases, however. It's been a fraught couple of days...
Ultimately we've been wanting to move away from both a) hdb and b) aliases for a while, but one of our user bases runs a web application that requires them as it doesn't support either groups or modifying its search filter. Given this incident there might be a push for them to re-evaluate this approach.
That does sound like a problematic app. There may be other ways of solving the problem if you have to keep it though. I would tend to look at having a separate instance of slapd to service it, and it might then be possible to use mapping overlays to build a view of your data that it can cope with. Does the app need to modify LDAP data or is it read-only?
We had suggested that the department run their own OpenLDAP server as a replica of our "main" central one and do some cleverness with overlays/rewrites/proxies to see a subset of the objects on our server. We do have a number of departments who have done this, either by taking a feed using a script or using syncrepl + stitching together their DIT using overlays/subordinate databases etc.
As far as I'm aware the application itself doesn't need to write back to LDAP, but the administrators need write access to create their object structure, add new users etc.
I think the first thing I'll do is enjoy the rest of my week off then look at setting up a sufficiently beefy testing VM to try and reproduce this behaviour with a view to submitting a proper bug report.
Thanks for your help with this.
Kind regards, Mark
Andrew