Hi Dusan,
On Fri, 28 Jun 2013, dusan.fric@t-systems.sk wrote:
Hi Chu,
thanks for your quick response. We will try to contact Oracle to advice here.
However, what is your suggestion here ' try different settings for the deadlock detector' exactly?
BTW - certainly, we are patched DB 4.7.25 (all 4 patches)...
In case you can update to 2.4.35 you might want to test the mdb backend with your workload.
Especially if you can easily reproduce the problem.
Greetings Christian
Regards, Dusan
-----Original Message----- From: Howard Chu [mailto:hyc@symas.com] Sent: 26. j??na 2013 13:32 To: Fric, Dusan; openldap-its@openldap.org Subject: Re: (ITS#7633) Slapd hangs on hdb write lock
dusan.fric@t-systems.sk wrote:
Full_Name: Dusan Fric Version: 2.4.32 OS: RHEL 5.7 x64 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (88.212.40.139)
We are experiencing frequent hangs in slapd. Once hung we usualy cannot continue to connect until we kill -9 the slapd process and restart it. The directory is used for 2 applications as user eDir and we are using it in production over 1 month - we have noted the busier the directory becomes the more often it hangs (now twice per week).
We have installed identical configurations on 3 environments, each has only one single server (no replication). There are 30k entries in the directory (production).
We are running:
RHEL 5.7 x64 (VMWare with NFS mountpoints) OpenLDAP 2.4.32 Berkeley DB 4.8.30 (4.7.25)
We were starting with DB 4.8.30, after downgrade to version 4.7.25 (according to ITS#7378 - https://www.openldap.org/its/index.cgi/Incoming?id=7378;selectid=7378) we are facing the same issue.
The problem occurs when two requests simultaneously try to update attributes of a record and in most cases of the same DN value. We can easily reproduce it with a java test program running 2 threads each connecting to the ldap server and updating the record for a particular DN value. It need not be the same DN value but DN values which reside on the same BDB page.
Thanks for the traces, but the information you've provided shows that this is a BDB problem, not an OpenLDAP bug. You'll have to contact Oracle for help. The BDB deadlock detector is supposed to handle these situations.
For example in your 4.8.30 lock status there are only 2 conflicting transactions 80000028 and 80000029 and they are only contending on a single page. This is one of the most elementary deadlock situations.
You might try different settings for the deadlock detector, but ultimately this appears to be a BDB bug and the choice of detector shouldn't matter.
Configuration
olc cn=config (part): dn: cn=config objectClass: olcGlobal cn: config olcConcurrency: 0 olcConnMaxPending: 100 olcConnMaxPendingAuth: 1000 olcSockbufMaxIncoming: 262143 olcSockbufMaxIncomingAuth: 16777215 olcThreads: 32
olc DB hdb config (part): dn: olcDatabase={1}hdb,cn=config objectClass: olcHdbConfig objectClass: olcDatabaseConfig olcDatabase: {1}hdb olcDbCacheSize: 80000 olcDbCheckpoint: 128 5 olcDbConfig: {0}set_cachesize 0 268435456 1 olcDbConfig: {1}set_lg_max 10485760 olcDbConfig: {2}set_lg_bsize 2097152 olcDbConfig: {3}set_lg_dir /pkg/openldap/dblog olcDbConfig: {4}set_lg_regionmax 262144 olcDbConfig: {5}set_lk_detect DB_LOCK_EXPIRE olcDbConfig: {6}set_flags DB_TXN_NOSYNC olcDbDirtyRead: FALSE olcDbDNcacheSize: 0 olcDbIDLcacheSize: 240000 olcDbIndex: default eq,sub olcDbIndex: objectClass eq olcDbIndex: cn eq,sub olcDbIndex: uid eq,sub olcDbIndex: mail pres,eq olcDbIndex: sn eq,sub olcDbIndex: member eq olcDbLinearIndex: FALSE olcDbMode: 0600 olcDbNoSync: TRUE olcDbSearchStack: 20 olcDbShmKey: 0 olcLastMod: TRUE olcMaxDerefDepth: 15 olcMonitoring: TRUE olcReadOnly: FALSE
We have managed to collect db_stat lock information, which indicates the same issue with DB write locks on both DB versions.
db_stat -C ol (4.8.30)
Lock REGINFO information: Lock Region type 5 Region ID /apps/DECCLASA-1/data/openldap/__db.005 Region name 0x2ab41cdc3000 Region address 0x2ab41cdc3138 Region primary address 0 Region maximum allocation 0 Region allocated Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest REGION_JOIN_OK Region flags =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Locks grouped by lockers: Locker Mode Count Status ----------------- Object --------------- 1 dd= 0 locks held 1 write locks 0 pid/thread 30034/47555767609232 1 READ 1 HELD id2entry.bdb handle 0 2 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232 3 dd= 0 locks held 1 write locks 0 pid/thread 30034/47555767609232 3 READ 1 HELD dn2id.bdb handle 0 4 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 5 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232 6 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200 7 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 8 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496 9 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792 9 READ 1 HELD objectClass.bdb handle 0 a dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 b dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 c dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792 c READ 1 HELD uid.bdb handle 0 d dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 f dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 10 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904 10 READ 1 HELD member.bdb handle 0 11 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 12 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 13 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904 13 READ 1 HELD mail.bdb handle 0 14 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 15 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 16 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 17 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 18 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904 18 READ 1 HELD sn.bdb handle 0 19 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 1a dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 1b dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904 1b READ 1 HELD cn.bdb handle 0 1c dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 1d dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 1e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 1f dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496 80000003 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232 80000004 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200 80000005 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 80000006 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496 80000007 dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 80000026 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792 80000026 READ 1 HELD 0x5e948 len: 9 data: 0x560x0500000000000000 80000027 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496 80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000 80000028 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496 80000028 WRITE 1 WAIT mail.bdb page 3 80000028 READ 1 HELD mail.bdb page 3 80000029 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792 80000029 WRITE 1 WAIT mail.bdb page 3 80000029 READ 1 HELD mail.bdb page 3 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Locks grouped by object: Locker Mode Count Status ----------------- Object --------------- 80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000
1b READ 1 HELD cn.bdb handle 0 10 READ 1 HELD member.bdb handle 0 9 READ 1 HELD objectClass.bdb handle 0 c READ 1 HELD uid.bdb handle 0
80000026 READ 1 HELD 0x5e948 len: 9 data: 0x560x0500000000000000
1 READ 1 HELD id2entry.bdb handle 0 18 READ 1 HELD sn.bdb handle 0 3 READ 1 HELD dn2id.bdb handle 0
80000029 READ 1 HELD mail.bdb page 3 80000028 READ 1 HELD mail.bdb page 3 80000029 WRITE 1 WAIT mail.bdb page 3 80000028 WRITE 1 WAIT mail.bdb page 3
13 READ 1 HELD mail.bdb handle 0
db_stat -C ol (4.7.25)
Lock REGINFO information: Lock Region type 5 Region ID /apps/DECCLASA-1/data/openldap/__db.005 Region name 0x2b4a8a577000 Original region address 0x2b4a8a577000 Region address 0x2b4a8a577138 Region primary address 0 Region maximum allocation 0 Region allocated Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest REGION_JOIN_OK Region flags =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Locks grouped by lockers: Locker Mode Count Status ----------------- Object --------------- 1 dd= 0 locks held 1 write locks 0 pid/thread 32447/47463753631632 1 READ 1 HELD id2entry.bdb handle 0 2 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632 3 dd= 0 locks held 1 write locks 0 pid/thread 32447/47463753631632 3 READ 1 HELD dn2id.bdb handle 0 4 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656 5 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632 6 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 7 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 7 READ 1 HELD objectClass.bdb handle 0 8 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 9 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 a dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 b dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 c dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 c READ 1 HELD cn.bdb handle 0 d dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 e dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 f dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 f READ 1 HELD uid.bdb handle 0 10 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 11 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 12 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 12 READ 1 HELD member.bdb handle 0 13 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 14 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 15 dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952 15 READ 1 HELD mail.bdb handle 0 16 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 17 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 18 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 19 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 1a dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952 1a READ 1 HELD sn.bdb handle 0 1b dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 1c dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 1d dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 1e dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 1f dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 80000003 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632 80000004 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 80000008 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 80000038 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656 800000a9 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000 800000aa dd= 0 locks held 2 write locks 0 pid/thread 32447/1097341248 800000aa WRITE 1 WAIT mail.bdb page 3 800000aa READ 1 HELD mail.bdb page 1 800000aa READ 1 HELD mail.bdb page 3 800000ab dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952 800000ab READ 1 HELD 0x43d58 len: 9 data: 0x540x0500000000000000 800000ac dd= 0 locks held 2 write locks 0 pid/thread 32447/1105733952 800000ac WRITE 1 WAIT mail.bdb page 3 800000ac READ 1 HELD mail.bdb page 1 800000ac READ 1 HELD mail.bdb page 3 800000b1 dd= 0 locks held 0 write locks 0 pid/thread 32447/1122519360 800000b2 dd= 0 locks held 0 write locks 0 pid/thread 32447/1130912064 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Locks grouped by object: Locker Mode Count Status ----------------- Object --------------- 800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000
1 READ 1 HELD id2entry.bdb handle 0 3 READ 1 HELD dn2id.bdb handle 0 7 READ 1 HELD objectClass.bdb handle 0 f READ 1 HELD uid.bdb handle 0 1a READ 1 HELD sn.bdb handle 0
800000aa READ 1 HELD mail.bdb page 3 800000ac READ 1 HELD mail.bdb page 3 800000aa WRITE 1 WAIT mail.bdb page 3 800000ac WRITE 1 WAIT mail.bdb page 3
c READ 1 HELD cn.bdb handle 0 15 READ 1 HELD mail.bdb handle 0
800000aa READ 1 HELD mail.bdb page 1 800000ac READ 1 HELD mail.bdb page 1
12 READ 1 HELD member.bdb handle 0
800000ab READ 1 HELD 0x43d58 len: 9 data: 0x540x0500000000000000
We have also collected the backtrace of threads for both DB versions which I have uploaded to: https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.8.30_ 20130625.txt
https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.7.25_ 20130625.txt
-- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/