Is there any chance of getting your feedback on the information provided?
Best regards, Ioan
On Fri, May 18, 2012 at 11:25 AM, Ioan Indreias indreias@gmail.com wrote:
Hello all,
We are running OpenLDAP in a SLAVE-MASTER configuration (slave of another OpenLDAP server //L1// and master for 2 Axigen back-end nodes //B1 and B2//).
Currently we have ~31000 accounts and experience random freezes of slapd (nothing is written to the logs, CPU at 0%) when a series of add/delete operations is performed on top of the normal LDAP traffic. "Random" means an interval of between 15 and 80 minutes after we start adding/deleting accounts.
The add/delete bash script runs in a loop: it adds 60 accounts, sleeps 20 sec, deletes the accounts and sleeps another 40 sec. All operations are performed on the MASTER L1 server and are propagated correctly to our server (until it freezes, of course).
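The bash script itself was not posted. As a rough illustration only, the loop described above could look something like the following Python sketch. The server URI, bind DN, base DN, account naming and the use of inetOrgPerson are all hypothetical placeholders, not values from our setup; only the counts and sleep intervals (60 accounts, 20 sec, 40 sec) come from the description.

```python
import subprocess
import time

# Hypothetical placeholders: none of these values come from the thread.
MASTER_URI = "ldap://l1.example.com"
BIND_DN = "cn=admin,dc=example,dc=com"
BASE_DN = "ou=people,dc=example,dc=com"


def account_dn(i):
    """DN of the i-th throwaway test account (naming scheme is assumed)."""
    return f"uid=loadtest{i},{BASE_DN}"


def build_ldif(count):
    """Return LDIF text for `count` test accounts, separated by blank lines."""
    entries = []
    for i in range(count):
        entries.append(
            f"dn: {account_dn(i)}\n"
            "objectClass: inetOrgPerson\n"
            f"uid: loadtest{i}\n"
            f"cn: Load Test {i}\n"
            f"sn: Test{i}\n"
        )
    return "\n".join(entries)


def run_cycle(password, count=60, run=subprocess.run, sleep=time.sleep):
    """One add/sleep/delete/sleep cycle against the master server."""
    auth = ["-x", "-H", MASTER_URI, "-D", BIND_DN, "-w", password]
    # ldapadd reads LDIF from standard input when no -f file is given.
    run(["ldapadd", *auth], input=build_ldif(count), text=True, check=True)
    sleep(20)
    # ldapdelete accepts the DNs to remove as positional arguments.
    run(["ldapdelete", *auth, *[account_dn(i) for i in range(count)]], check=True)
    sleep(40)
```

Calling `run_cycle(password)` repeatedly in a `while True:` loop would approximate the test; `run` and `sleep` are parameters only so the cycle can be dry-run without a server.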
We started on 2.4.23 and tried upgrading to 2.4.31, but the same problem was observed. In both cases we used BDB 4.7.25 with all available patches.
The server runs RHEL 5.5, 64-bit, with plenty of memory and CPUs.
I am attaching 3 archives: the slapd configuration and the gdb + db_stat output for 2 freeze events.
Could somebody indicate whether this is a slapd or a BDB problem? Any hints are appreciated.
Best regards, Ioan Indreias www.modulo.ro
=== May 18 09:37:10 saevrvmp11 slapd[13834]: @(#) $OpenLDAP: slapd 2.4.31 (May 10 2012 16:11:06) $ root@host64.modulo.ro:/root/openldap-2.4.31/servers/slapd
[root@host tmp]# ldd /usr/sbin/slapd
    libuuid.so.1 => /lib64/libuuid.so.1 (0x0000003efe000000)
    libssl.so.6 => /lib64/libssl.so.6 (0x0000003f05400000)
    libcrypto.so.6 => /lib64/libcrypto.so.6 (0x0000003f00800000)
    libresolv.so.2 => /lib64/libresolv.so.2 (0x0000003f01800000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003efb800000)
    libc.so.6 => /lib64/libc.so.6 (0x0000003efac00000)
    /lib64/ld-linux-x86-64.so.2 (0x0000003efa800000)
    libgssapi_krb5.so.2 => /usr/lib64/libgssapi_krb5.so.2 (0x0000003f02800000)
    libkrb5.so.3 => /usr/lib64/libkrb5.so.3 (0x0000003f03000000)
    libcom_err.so.2 => /lib64/libcom_err.so.2 (0x0000003f01000000)
    libk5crypto.so.3 => /usr/lib64/libk5crypto.so.3 (0x0000003f02c00000)
    libdl.so.2 => /lib64/libdl.so.2 (0x0000003efb000000)
    libz.so.1 => /usr/lib64/libz.so.1 (0x0000003efbc00000)
    libkrb5support.so.0 => /usr/lib64/libkrb5support.so.0 (0x0000003f02400000)
    libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x0000003f02000000)
    libselinux.so.1 => /lib64/libselinux.so.1 (0x0000003efc400000)
    libsepol.so.1 => /lib64/libsepol.so.1 (0x0000003efc000000)
===
--On May 24, 2012 5:52:30 PM +0300 Ioan Indreias indreias@gmail.com wrote:
Is there any chance of getting your feedback on the information provided?
Best regards, Ioan
What are your checkpoint settings in slapd.conf/slapd-config?
Have you validated your lock/locker/lock objects aren't being exhausted in the BDB environment?
--Quanah
On Thu, May 24, 2012 at 7:24 PM, Quanah Gibson-Mount quanah@zimbra.com wrote:
Hello,
Thanks a lot for your answer. Please find below the requested info. Should I re-attach the archives with the config files, db_stat and gdb output?
Thanks, Ioan
What are your checkpoint settings in slapd.conf/slapd-config?
+++ from slapd.conf +++
database bdb
checkpoint 32 30
Have you validated your lock/locker/lock objects aren't being exhausted in the BDB environment?
+++ from db_stat of the main DB // set_1 +++
Default locking region information:
260 Last allocated locker ID
0x7fffffff Current maximum unused locker ID
9 Number of lock modes
1000 Maximum number of locks possible
1000 Maximum number of lockers possible
1000 Maximum number of lock objects possible
160 Number of lock object partitions
53 Number of current locks
928 Maximum number of locks at any one time
18 Maximum number of locks in any one bucket
349 Maximum number of locks stolen by for an empty partition
19 Maximum number of locks stolen for any one partition
119 Number of current lockers
159 Maximum number of lockers at any one time
20 Number of current lock objects
487 Maximum number of lock objects at any one time
6 Maximum number of lock objects in any one bucket
0 Maximum number of objects stolen by for an empty partition
0 Maximum number of objects stolen for any one partition
71M Total number of locks requested (71133852)
71M Total number of locks released (71133450)
0 Total number of locks upgraded
30 Total number of locks downgraded
86 Lock requests not available due to conflicts, for which we waited
0 Lock requests not available due to conflicts, for which we did not wait
0 Number of deadlocks
0 Lock timeout value
0 Number of locks that have timed out
0 Transaction timeout value
0 Number of transactions that have timed out
752KB The size of the lock region
2257 The number of partition locks that required waiting (0%)
725 The maximum number of times any partition lock was waited for (0%)
0 The number of object queue operations that required waiting (0%)
1145 The number of locker allocations that required waiting (0%)
2 The number of region locks that required waiting (0%)
6 Maximum hash bucket length

+++ from db_stat of the main DB // set_2 +++
Default locking region information:
297 Last allocated locker ID
0x7fffffff Current maximum unused locker ID
9 Number of lock modes
1000 Maximum number of locks possible
1000 Maximum number of lockers possible
1000 Maximum number of lock objects possible
160 Number of lock object partitions
53 Number of current locks
883 Maximum number of locks at any one time
18 Maximum number of locks in any one bucket
253 Maximum number of locks stolen by for an empty partition
20 Maximum number of locks stolen for any one partition
122 Number of current lockers
122 Maximum number of lockers at any one time
20 Number of current lock objects
469 Maximum number of lock objects at any one time
5 Maximum number of lock objects in any one bucket
3 Maximum number of objects stolen by for an empty partition
2 Maximum number of objects stolen for any one partition
50M Total number of locks requested (50508958)
50M Total number of locks released (50508652)
0 Total number of locks upgraded
46 Total number of locks downgraded
36 Lock requests not available due to conflicts, for which we waited
0 Lock requests not available due to conflicts, for which we did not wait
0 Number of deadlocks
0 Lock timeout value
0 Number of locks that have timed out
0 Transaction timeout value
0 Number of transactions that have timed out
752KB The size of the lock region
1610 The number of partition locks that required waiting (0%)
391 The maximum number of times any partition lock was waited for (0%)
0 The number of object queue operations that required waiting (0%)
854 The number of locker allocations that required waiting (0%)
3 The number of region locks that required waiting (0%)
9 Maximum hash bucket length
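To compare the peak figures with the configured maximums, a small script along these lines may help. It is a minimal sketch that assumes the statistics are available one per line in the "value label" form that db_stat prints for the lock region (for example from `db_stat -c`); the function names are made up for illustration and the labels are copied from the output above.

```python
import re

# (limit label, peak label) pairs, copied from the db_stat output above.
PAIRS = {
    "locks": ("Maximum number of locks possible",
              "Maximum number of locks at any one time"),
    "lockers": ("Maximum number of lockers possible",
                "Maximum number of lockers at any one time"),
    "objects": ("Maximum number of lock objects possible",
                "Maximum number of lock objects at any one time"),
}


def parse_db_stat(text):
    """Map 'NNN label' lines to {label: NNN}; non-integer values are skipped."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if m:
            stats[m.group(2)] = int(m.group(1))
    return stats


def lock_usage(text):
    """Return peak/limit ratios for locks, lockers and lock objects."""
    stats = parse_db_stat(text)
    return {name: stats[peak] / stats[limit]
            for name, (limit, peak) in PAIRS.items()}


def report(text):
    """Print one line per resource, e.g. for a quick check after a freeze."""
    for name, ratio in lock_usage(text).items():
        print(f"{name}: peak usage {ratio:.1%} of configured maximum")
```

For set_1 this works out to roughly 93% for locks (928 of 1000), 16% for lockers (159 of 1000) and 49% for lock objects (487 of 1000).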
--Quanah
-- Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc
Zimbra :: the leader in open source messaging and collaboration
--On May 24, 2012 9:15:11 PM +0300 Ioan Indreias indreias@gmail.com wrote:
+++ from db_stat of the main DB // set_1 +++
Default locking region information:
260 Last allocated locker ID
0x7fffffff Current maximum unused locker ID
9 Number of lock modes
1000 Maximum number of locks possible
928 Maximum number of locks at any one time
You are certainly running close to running out of locks. You should likely increase them for both databases to 3000 or so.
--Quanah
On Fri, May 25, 2012 at 6:42 PM, Quanah Gibson-Mount quanah@zimbra.com wrote:
You are certainly running close to running out of locks. You should likely increase them for both databases to 3000 or so.
I have raised the limits to 3000 (stop ldap, run db_recover, start ldap) and restarted our add/delete script.
+++
set_lk_max_objects 3000
set_lk_max_locks 3000
set_lk_max_lockers 3000
+++
Unfortunately, after only 3 add/delete commands ldap froze again. This was the shortest time ldap has run (after a fresh restart) before freezing.
We restarted ldap and started a new test. This time the period was longer (nearly 45 minutes).
Please find attached 2 tar archives with gdb and db_stat information (set_3 and set_4, respectively), both for slapd 2.4.23 and BDB 4.7.25.
Best regards, Ioan
--On May 28, 2012 11:09:53 AM +0300 Ioan Indreias indreias@gmail.com wrote:
I have raised the limits to 3000 (stop ldap, run db_recover, start ldap) and restarted our add/delete script.
+++
set_lk_max_objects 3000
set_lk_max_locks 3000
set_lk_max_lockers 3000
+++
Unfortunately, after only 3 add/delete commands ldap froze again. This was the shortest time ldap has run (after a fresh restart) before freezing.
We restarted ldap and started a new test. This time the period was longer (nearly 45 minutes).
Please find attached 2 tar archives with gdb and db_stat information (set_3 and set_4, respectively), both for slapd 2.4.23 and BDB 4.7.25.
Is BDB actually patched? There are 4 patches for BDB 4.7.25. Also, please use OpenLDAP 2.4.31 before reporting any more issues.
--Quanah
On Mon, May 28, 2012 at 11:23 PM, Quanah Gibson-Mount quanah@zimbra.com wrote:
Is BDB actually patched? There are 4 patches for BDB 4.7.25. Also, please use OpenLDAP 2.4.31 before reporting any more issues.
Yes, BDB has been patched. The problem was also observed on 2.4.31; please see the archive attached to my first message. If a new set is needed (for 2.4.31 with the new values for the lockers), I'll try to provide it ASAP.
Ioan
--On Tuesday, May 29, 2012 2:30 AM +0300 Ioan Indreias indreias@gmail.com wrote:
Is BDB actually patched? There are 4 patches for BDB 4.7.25. Also, please use OpenLDAP 2.4.31 before reporting any more issues.
Yes, BDB has been patched. The problem was also observed on 2.4.31; please see the archive attached to my first message. If a new set is needed (for 2.4.31 with the new values for the lockers), I'll try to provide it ASAP.
How do you know BDB has been patched with all *4* patches?
As for upping the lockers, you need to do that regardless of the OpenLDAP version. That behavior isn't going to change just because you upgraded the OpenLDAP version -- It is a BDB usage issue.
The gdb trace clearly shows your slapd is locked up at the BDB level. Given that I have used BDB 4.7.25+all 4 patches for years without issue, I know a properly patched BDB doesn't have this issue.
One other thing I would note is that you have failed to provide your OpenLDAP configuration (either slapd.conf or a slapcat -n 0 of your cn=config DB).
--Quanah
--
Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration
openldap-technical@openldap.org