Hi,
We're having severe performance issues for any query with alias dereferencing set to "always".
Any query with this causes the CPU to spin up to 100% and if we have a number of these concurrently the machine will become unresponsive.
We're using OpenLDAP 2.4.42 with the old hdb backend.
We do have a large number of aliases (~63,000). Could this be the cause?
Our olcMaxDerefDepth is currently set to "1"
On Mon, Nov 16, 2015 at 03:13:11PM +0000, Mark Cairney wrote:
We're having severe performance issues for any query with alias dereferencing set to "always".
Any query with this causes the CPU to spin up to 100% and if we have a number of these concurrently the machine will become unresponsive.
I hit something similar a while ago using mdb:
http://www.openldap.org/its/index.cgi/Software%20Bugs?id=8146
We're using OpenLDAP 2.4.42 with the old hdb backend.
We do have a large number of aliases (~63,000). Could this be the cause?
It would be worth checking that you have indexed the objectclass attribute.
I prefer to avoid aliases...
Andrew
Hi Andrew,
Thanks for getting back. I saw your report for mdb actually. I can confirm that I've got "olcDBIndex objectClass eq" set on my servers.
Everyone keeps telling me that about aliases, but unfortunately we've got a group of users who require them to act in lieu of groups to support their application, i.e. they have OUs filled with aliases back to user accounts in the main user OU.
We've started deleting old/hanging OUs and it's made a small improvement but it's still taking 20-30s per query rather than returning almost instantly like it was before.
Hi,
Just as an update: we've managed to restore service. It turns out that we had gone over 65,535 aliases (we had 66,291), which we think was the root cause of this behaviour suddenly starting.
Although it relates to MDB this ITS sounded very similar: http://www.openldap.org/its/index.cgi/Software%20Bugs?id=8146;page=10
We started deleting as many aliases as we could but performance only improved slightly. What appears to have fixed it was doing a slapcat of the "pruned" data and re-loading it into the database via slapadd. Having done this, searches with deref set to always are now performing as they were before.
Ultimately we've been wanting to move away from both a) hdb and b) aliases for a while, but one of our user bases runs a web application that requires them as it doesn't support either groups or modifying its search filter. Given this incident there might be a push for them to re-evaluate this approach.
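For anyone wanting to check how close a directory is to the 65,535 figure mentioned above, here is a minimal sketch in Python. It assumes an LDIF export of the kind slapcat produces, with objectClass values written in plain text rather than base64. The function name, the sample entries and the dc=example,dc=org suffix are illustrative and do not come from this thread. Whether 65,535 is a real limit is only the hypothesis stated above, not something this code confirms.

```python
import io

# Figure reported in this thread (2**16 - 1); treated here as a warning level only.
ALIAS_THRESHOLD = 65_535


def count_aliases(ldif_lines):
    """Count LDIF entries that have an 'objectClass: alias' value."""
    count = 0
    entry_is_alias = False
    for raw in ldif_lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line terminates the current entry.
            if entry_is_alias:
                count += 1
            entry_is_alias = False
            continue
        name, sep, value = line.partition(":")
        if sep and name.lower() == "objectclass" and value.strip().lower() == "alias":
            entry_is_alias = True
    # Handle a final entry that is not followed by a blank line.
    if entry_is_alias:
        count += 1
    return count


# Hypothetical sample data: one ordinary entry and one alias pointing at it.
SAMPLE = """dn: uid=alice,ou=people,dc=example,dc=org
objectClass: inetOrgPerson
uid: alice

dn: uid=alice,ou=app,dc=example,dc=org
objectClass: alias
objectClass: extensibleObject
aliasedObjectName: uid=alice,ou=people,dc=example,dc=org
"""

total = count_aliases(io.StringIO(SAMPLE))
print(f"{total} alias entries; above threshold: {total > ALIAS_THRESHOLD}")
```

For a real export, an open file object can be passed in place of the StringIO wrapper, since the function only iterates over lines.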
On Tue, Nov 17, 2015 at 11:11:04AM +0000, Mark Cairney wrote:
Just as an update: we've managed to restore service. It turns out that we had gone over 65,535 aliases (we had 66,291), which we think was the root cause of this behaviour suddenly starting.
It's a significant number certainly...
Although it relates to MDB this ITS sounded very similar: http://www.openldap.org/its/index.cgi/Software%20Bugs?id=8146;page=10
We started deleting as many aliases as we could but performance only improved slightly. What appears to have fixed it was doing a slapcat of the "pruned" data and re-loading it into the database via slapadd. Having done this, searches with deref set to always are now performing as they were before.
If this happens again, you could try stopping the server and running slapindex rather than reloading everything.
Ultimately we've been wanting to move away from both a) hdb and b) aliases for a while, but one of our user bases runs a web application that requires them as it doesn't support either groups or modifying its search filter. Given this incident there might be a push for them to re-evaluate this approach.
That does sound like a problematic app. There may be other ways of solving the problem if you have to keep it though. I would tend to look at having a separate instance of slapd to service it, and it might then be possible to use mapping overlays to build a view of your data that it can cope with. Does the app need to modify LDAP data or is it read-only?
Andrew
On 17/11/2015 11:26, Andrew Findlay wrote:
On Tue, Nov 17, 2015 at 11:11:04AM +0000, Mark Cairney wrote:
Just as an update: we've managed to restore service. It turns out that we had gone over 65,535 aliases (we had 66,291), which we think was the root cause of this behaviour suddenly starting.
It's a significant number certainly...
We're now down to "only" 41,000 :-)
Although it relates to MDB this ITS sounded very similar: http://www.openldap.org/its/index.cgi/Software%20Bugs?id=8146;page=10
We started deleting as many aliases as we could but performance only improved slightly. What appears to have fixed it was doing a slapcat of the "pruned" data and re-loading it into the database via slapadd. Having done this, searches with deref set to always are now performing as they were before.
If this happens again, you could try stopping the server and running slapindex rather than reloading everything.
We did try slapindex but it had little effect. This may have been before we'd pruned the number of aliases, however. It's been a fraught couple of days...
Ultimately we've been wanting to move away from both a) hdb and b) aliases for a while, but one of our user bases runs a web application that requires them as it doesn't support either groups or modifying its search filter. Given this incident there might be a push for them to re-evaluate this approach.
That does sound like a problematic app. There may be other ways of solving the problem if you have to keep it though. I would tend to look at having a separate instance of slapd to service it, and it might then be possible to use mapping overlays to build a view of your data that it can cope with. Does the app need to modify LDAP data or is it read-only?
We had suggested that the department run their own OpenLDAP server as a replica of our "main" central one and do some cleverness with overlays/rewrites/proxies to see a subset of the objects on our server. We do have a number of departments who have done this, either by taking a feed using a script or using syncrepl + stitching together their DIT using overlays/subordinate databases etc.
As far as I'm aware the application itself doesn't need to write back to LDAP but the Administrators need write access to create their object structure, add new users etc.
I think the first thing I'll do is enjoy the rest of my week off then look at setting up a sufficiently beefy testing VM to try and reproduce this behaviour with a view to submitting a proper bug report.
Thanks for your help with this.
Kind regards, Mark
Andrew Findlay wrote:
If this happens again, you could try stopping the server and running slapindex rather than reloading everything.
IIRC, depending on the data, a complete reload with slapadd can be faster than slapindex. I vaguely remember Quanah reporting test results with back-hdb a couple of years ago. Not sure about back-mdb nowadays though.
Ciao, Michael.
Michael Ströder wrote:
Andrew Findlay wrote:
If this happens again, you could try stopping the server and running slapindex rather than reloading everything.
IIRC, depending on the data, a complete reload with slapadd can be faster than slapindex. I vaguely remember Quanah reporting test results with back-hdb a couple of years ago. Not sure about back-mdb nowadays though.
slapindex on back-mdb is faster than slapadd. But, for the problem being discussed here, slapindex is inadequate; you need a full reload with slapadd.
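The export-and-reload described earlier in the thread can be outlined as a dry run in Python. This is only a sketch under assumptions: the database number (1) and the LDIF path are hypothetical, and the script prints the slapcat and slapadd commands rather than running them. Stopping slapd and moving the old database files aside before the slapadd step are not shown and would need to be done by hand.

```python
import shlex


def reload_plan(dbnum, ldif_path):
    """Return the argv lists for an export followed by a full reload."""
    # Step 1: dump the database to LDIF.
    export = ["slapcat", "-n", str(dbnum), "-l", ldif_path]
    # Between the steps: stop slapd and move the old database files aside,
    # so that slapadd loads into an empty database.
    # Step 2: load the LDIF back into the (now empty) database.
    load = ["slapadd", "-n", str(dbnum), "-l", ldif_path]
    return [export, load]


# Hypothetical values; adjust to the local configuration.
PLAN = reload_plan(1, "/tmp/pruned.ldif")
for argv in PLAN:
    print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before anything touches the live database.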
On Tue, Nov 17, 2015 at 06:02:36PM +0000, Howard Chu wrote:
slapindex on back-mdb is faster than slapadd. But, for the problem being discussed here, slapindex is inadequate; you need a full reload with slapadd.
Useful to know, thanks.
What about the root cause - is it likely that 64k aliases would trigger a problem, or is something else going on here?
Andrew
Andrew Findlay wrote:
On Tue, Nov 17, 2015 at 06:02:36PM +0000, Howard Chu wrote:
slapindex on back-mdb is faster than slapadd. But, for the problem being discussed here, slapindex is inadequate; you need a full reload with slapadd.
Useful to know, thanks.
What about the root cause - is it likely that 64k aliases would trigger a problem, or is something else going on here?
http://www.openldap.org/lists/openldap-bugs/200410/msg00001.html
openldap-technical@openldap.org