We're running MIT Kerberos with the LDAP backend, specifically 3 OpenLDAP servers doing delta syncrepl. We started having a problem a while back where once a day the KDC would time out authentication requests, and finally tracked it down to OpenLDAP purging the accesslog. We currently have the accesslog overlay configured to delete entries over 7 days old once a day, and it seems that while OpenLDAP is processing the purge the KDC is starved out and unable to process authentications in a timely fashion. We do (thanks to our ISO) have account lockout enabled, so every authentication involves not only a read but a write.
Is it expected for the accesslog purge to be so disruptive? Is there any way to tune it so it doesn't overwhelm the system to the point of being unresponsive?
Would it be better to purge the accesslog more frequently, so as to amortize the work across multiple intervals rather than concentrating it once a day?
Thanks for any suggestions...
Paul B. Henson wrote:
<SNIP>
Do you have an eq-index on the reqStart attribute as recommended in slapo-accesslog(5)?
Note that adding the index later requires re-indexing the DB.
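For reference, the relevant fragment would look something like this in slapd.conf (a sketch only; the attribute comes from the accesslog schema described in slapo-accesslog(5)):

```
# slapd.conf fragment for the accesslog database:
# equality index on reqStart, which the purge operation
# searches on to find entries older than the cutoff
index reqStart eq
```

If the database already has entries, stop slapd and rebuild the index offline with slapindex against the accesslog suffix (e.g. slapindex -b cn=accesslog reqStart) before restarting.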
Ciao, Michael.
--On Wednesday, November 04, 2015 6:27 PM -0800 "Paul B. Henson" henson@acm.org wrote:
On Wed, Nov 04, 2015 at 07:46:47AM +0100, Michael Ströder wrote:
Do you have an eq-index on the reqStart attribute as recommended in slapo-accesslog(5)?
Yes:
index default eq
index entryCSN,objectClass,reqEnd,reqResult,reqStart
I set up my accesslog to do the purges every 4 hours by default, rather than once a day, to get around this. You may want to do it more frequently than that. I would say once a day clearly isn't often enough for the amount of write traffic you have.
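As a concrete sketch (syntax per slapo-accesslog(5), where logpurge takes an age and an interval, each in DD+HH:MM format), keeping 7 days of entries but purging every 4 hours would be:

```
# keep 7 days of accesslog entries, purging every 4 hours
# so each purge run deletes a smaller batch
logpurge 07+00:00 00+04:00
```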
You also don't note what slapd backend you're using (bdb, hdb, mdb). bdb & hdb in particular are much slower, write-wise, than mdb. And you don't note your OpenLDAP version, either...
--Quanah
--
Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
From: Quanah Gibson-Mount Sent: Wednesday, November 04, 2015 6:34 PM
I set up my accesslog to do the purges every 4 hours by default, rather than once a day, to get around this. You may want to do it more frequently than that. I would say once a day clearly isn't often enough for the amount of write traffic you have.
I wasn't sure if doing it more frequently would amortize the load to the point where it did not impact production, or simply break production more often :). I guess I'll have to give it a try and see.
You also don't note what slapd backend you're using (bdb, hdb, mdb). bdb & hdb in particular are much slower, write-wise, than mdb. And you don't note your OpenLDAP version, either...
Sorry, we are running the latest and greatest 2.4.41 with the latest and greatest mdb backend :).
Thanks…
"Paul B. Henson" henson@acm.org schrieb am 05.11.2015 um 03:27 in
Nachricht 20151105022754.GI3408@bender.unx.cpp.edu:
On Wed, Nov 04, 2015 at 07:46:47AM +0100, Michael Ströder wrote:
Do you have an eq-index on the reqStart attribute as recommended in slapo-accesslog(5)?
Yes:
index default eq
index entryCSN,objectClass,reqEnd,reqResult,reqStart
I use the following additional indexes, but I run some specific queries on the database (my database is hdb):
olcDbIndex: entryUUID eq
olcDbIndex: reqResult eq
olcDbIndex: reqDN eq
olcDbIndex: reqMod sub
olcDbIndex: reqType eq
olcDbIndex: reqAuthzID eq
And I do not have an index on entryCSN.
Maybe you have an I/O bottleneck? Could you try (for a test) to put the accesslog into a RAM disk? What filesystem are you using? Special mount options?
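If you want to try the RAM disk test, a minimal sketch (assuming the accesslog DB lives in /var/lib/openldap-data/accesslog and you accept losing its contents on reboot; the 2g size is an assumption, match it to your configured maxsize) would be something like:

```
# stop slapd first, then mount a tmpfs over the accesslog directory
mount -t tmpfs -o size=2g tmpfs /var/lib/openldap-data/accesslog
```

You'd then need to recreate the accesslog database there before starting slapd again.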
Regards, Ulrich
From: Ulrich Windl Sent: Wednesday, November 04, 2015 11:26 PM
Maybe you have an I/O bottleneck? Could you try (for a test) to put the accesslog into a RAM disk? What filesystem are you using? Special mount options?
Yes, I'm pretty sure it is an I/O issue. The problem only occurs on the physical servers; the virtual machines (which are on a SAN with much better performance than local disks) don't exhibit this issue. While it is thrashing, iotop shows the write load at about 2MB/s. It's a Linux system using ext4, and the only special mount option is relatime.
What type of indexes do you have for your accesslog? Any warning about missing index in syslog?
"Paul B. Henson" henson@acm.org schrieb am 04.11.2015 um 04:14 in Nachricht
20151104031401.GH3408@bender.unx.cpp.edu:
<SNIP>
On Wed, Nov 04, 2015 at 09:08:47AM +0100, Ulrich Windl wrote:
What type of indexes do you have for your accesslog? Any warning about missing index in syslog?
The overall accesslog config is:
database mdb
directory /var/lib/openldap-data/accesslog
maxsize 2147483648
suffix cn=accesslog
rootdn cn=accesslog
index default eq
index entryCSN,objectClass,reqEnd,reqResult,reqStart
overlay accesslog
logdb cn=accesslog
logops writes
logsuccess TRUE
logpurge 07+00:00 01+00:00
I haven't seen any errors or warnings in the openldap logs; the only reason we noticed was the degraded kerberos performance.
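One way to gauge how much work each purge has queued up is to count the entries older than the cutoff (a sketch; it assumes you can bind with sufficient rights, e.g. as root over ldapi with SASL EXTERNAL, and the timestamp below is just an example cutoff 7 days back):

```
# count accesslog entries with reqStart older than the example cutoff;
# "1.1" requests no attributes, just the DNs
ldapsearch -LLL -Y EXTERNAL -H ldapi:/// -b cn=accesslog \
    '(reqStart<=20151029000000.000000Z)' 1.1 | grep -c '^dn:'
```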
Thanks...
On Wed, Nov 04 2015 at 18:32:50 -0800, Paul B. Henson scribbled in "Re: Antw: accesslog purge starves kerberos kdc authentications":
On Wed, Nov 04, 2015 at 09:08:47AM +0100, Ulrich Windl wrote:
What type of indexes do you have for your accesslog? Any warning about missing index in syslog?
The overall accesslog config is:
database mdb
directory /var/lib/openldap-data/accesslog
maxsize 2147483648
suffix cn=accesslog
rootdn cn=accesslog
<SNIP>
I haven't seen any errors or warnings in the openldap logs; the only reason we noticed was the degraded kerberos performance.
Thanks...
Just a simple question: is /var/lib/openldap-data/accesslog on the same physical disk as the rest of your directory storage? I note from your initial thread on the kerberos list that there's a small I/O spike at the same time, so it may be beneficial to have the accesslog on different spindles if possible.
Cheers.
Dameon.
From: Dameon Wagner Sent: Thursday, November 05, 2015 3:01 AM
Just a simple question, is /var/lib/openldap-data/accesslog on the same physical disk as the rest of your directory storage? I note from your initial thread on the kerberos list that there's small io spike at the same time, so it may be beneficial to have the accesslog on different spindles if possible.
Yes, it is. The system is a basic 1U server with a hardware RAID card and mirrored disks. I don't have any other spindles 8-/. One of my colleagues is giving me the big "I told you so", as he advocated upgrading to the hardware RAID card with a battery-backed write cache, which might have prevented this issue. The rest of us didn't think the extra expenditure was worth it for a Kerberos server, which typically doesn't have very high performance requirements <sigh>. I guess I'm going to try purging the accesslog more frequently and see if that reduces the individual purge load to a low enough level that it doesn't impact service response.
Thanks.
openldap-technical@openldap.org