Hi Dusan,
On Fri, 28 Jun 2013, dusan.fric(a)t-systems.sk wrote:
Hi Chu,
thanks for your quick response. We will try to contact Oracle to advice here.
However, what is your suggestion here ' try different settings for the deadlock
detector' exactly?
BTW - certainly, we are patched DB 4.7.25 (all 4 patches)...
In case you can update to 2.4.35 you might want to test the mdb backend with your
workload.
Especially if you can easily reproduce the problem.
Greetings
Christian
Regards,
Dusan
-----Original Message-----
From: Howard Chu [mailto:hyc@symas.com]
Sent: 26. j??na 2013 13:32
To: Fric, Dusan; openldap-its(a)openldap.org
Subject: Re: (ITS#7633) Slapd hangs on hdb write lock
dusan.fric(a)t-systems.sk wrote:
> Full_Name: Dusan Fric
> Version: 2.4.32
> OS: RHEL 5.7 x64
> URL:
ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (88.212.40.139)
>
>
> We are experiencing frequent hangs in slapd. Once hung we usualy
> cannot continue to connect until we kill -9 the slapd process and
> restart it. The directory is used for 2 applications as user eDir and
> we are using it in production over 1 month - we have noted the busier
> the directory becomes the more often it hangs (now twice per week).
>
> We have installed identical configurations on 3 environments, each has
> only one single server (no replication). There are 30k entries in the
> directory (production).
>
> We are running:
>
> RHEL 5.7 x64 (VMWare with NFS mountpoints) OpenLDAP 2.4.32 Berkeley DB
> 4.8.30 (4.7.25)
>
> We were starting with DB 4.8.30, after downgrade to version 4.7.25
> (according to
> ITS#7378 -
>
https://www.openldap.org/its/index.cgi/Incoming?id=7378;selectid=7378)
> we are facing the same issue.
>
> The problem occurs when two requests simultaneously try to update
> attributes of a record and in most cases of the same DN value.
> We can easily reproduce it with a java test program running 2 threads
> each connecting to the ldap server and updating the record for a
> particular DN value.
> It need not be the same DN value but DN values which reside on the
> same BDB page.
Thanks for the traces, but the information you've provided shows that this is a BDB
problem, not an OpenLDAP bug. You'll have to contact Oracle for help.
The BDB deadlock detector is supposed to handle these situations.
For example in your 4.8.30 lock status there are only 2 conflicting transactions 80000028
and 80000029 and they are only contending on a single page. This is one of the most
elementary deadlock situations.
You might try different settings for the deadlock detector, but ultimately this appears
to be a BDB bug and the choice of detector shouldn't matter.
> Configuration
>
> olc cn=config (part):
> dn: cn=config
> objectClass: olcGlobal
> cn: config
> olcConcurrency: 0
> olcConnMaxPending: 100
> olcConnMaxPendingAuth: 1000
> olcSockbufMaxIncoming: 262143
> olcSockbufMaxIncomingAuth: 16777215
> olcThreads: 32
>
> olc DB hdb config (part):
> dn: olcDatabase={1}hdb,cn=config
> objectClass: olcHdbConfig
> objectClass: olcDatabaseConfig
> olcDatabase: {1}hdb
> olcDbCacheSize: 80000
> olcDbCheckpoint: 128 5
> olcDbConfig: {0}set_cachesize 0 268435456 1
> olcDbConfig: {1}set_lg_max 10485760
> olcDbConfig: {2}set_lg_bsize 2097152
> olcDbConfig: {3}set_lg_dir /pkg/openldap/dblog
> olcDbConfig: {4}set_lg_regionmax 262144
> olcDbConfig: {5}set_lk_detect DB_LOCK_EXPIRE
> olcDbConfig: {6}set_flags DB_TXN_NOSYNC
> olcDbDirtyRead: FALSE
> olcDbDNcacheSize: 0
> olcDbIDLcacheSize: 240000
> olcDbIndex: default eq,sub
> olcDbIndex: objectClass eq
> olcDbIndex: cn eq,sub
> olcDbIndex: uid eq,sub
> olcDbIndex: mail pres,eq
> olcDbIndex: sn eq,sub
> olcDbIndex: member eq
> olcDbLinearIndex: FALSE
> olcDbMode: 0600
> olcDbNoSync: TRUE
> olcDbSearchStack: 20
> olcDbShmKey: 0
> olcLastMod: TRUE
> olcMaxDerefDepth: 15
> olcMonitoring: TRUE
> olcReadOnly: FALSE
>
> We have managed to collect db_stat lock information, which indicates
> the same issue with DB write locks on both DB versions.
>
> db_stat -C ol (4.8.30)
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Lock REGINFO information:
> Lock Region type
> 5 Region ID
> /apps/DECCLASA-1/data/openldap/__db.005 Region name
> 0x2ab41cdc3000 Region address
> 0x2ab41cdc3138 Region primary address
> 0 Region maximum allocation
> 0 Region allocated
> Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
> REGION_JOIN_OK Region flags
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Locks grouped by lockers:
> Locker Mode Count Status ----------------- Object ---------------
> 1 dd= 0 locks held 1 write locks 0 pid/thread 30034/47555767609232
> 1 READ 1 HELD id2entry.bdb handle 0
> 2 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232
> 3 dd= 0 locks held 1 write locks 0 pid/thread 30034/47555767609232
> 3 READ 1 HELD dn2id.bdb handle 0
> 4 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
> 5 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232
> 6 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200
> 7 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
> 8 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496
> 9 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
> 9 READ 1 HELD objectClass.bdb handle 0
> a dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
> b dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
> c dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
> c READ 1 HELD uid.bdb handle 0
> d dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
> e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
> f dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
> 10 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
> 10 READ 1 HELD member.bdb handle 0
> 11 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
> 12 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
> 13 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
> 13 READ 1 HELD mail.bdb handle 0
> 14 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
> 15 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
> 16 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
> 17 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
> 18 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
> 18 READ 1 HELD sn.bdb handle 0
> 19 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
> 1a dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
> 1b dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
> 1b READ 1 HELD cn.bdb handle 0
> 1c dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
> 1d dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
> 1e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
> 1f dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496
> 80000003 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232
> 80000004 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200
> 80000005 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
> 80000006 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496
> 80000007 dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
> 80000026 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
> 80000026 READ 1 HELD 0x5e948 len: 9 data: 0x560x0500000000000000
> 80000027 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496
> 80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000
> 80000028 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496
> 80000028 WRITE 1 WAIT mail.bdb page 3
> 80000028 READ 1 HELD mail.bdb page 3
> 80000029 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
> 80000029 WRITE 1 WAIT mail.bdb page 3
> 80000029 READ 1 HELD mail.bdb page 3
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Locks grouped by object:
> Locker Mode Count Status ----------------- Object ---------------
> 80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000
>
> 1b READ 1 HELD cn.bdb handle 0
>
> 10 READ 1 HELD member.bdb handle 0
>
> 9 READ 1 HELD objectClass.bdb handle 0
>
> c READ 1 HELD uid.bdb handle 0
>
> 80000026 READ 1 HELD 0x5e948 len: 9 data: 0x560x0500000000000000
>
> 1 READ 1 HELD id2entry.bdb handle 0
>
> 18 READ 1 HELD sn.bdb handle 0
>
> 3 READ 1 HELD dn2id.bdb handle 0
>
> 80000029 READ 1 HELD mail.bdb page 3
> 80000028 READ 1 HELD mail.bdb page 3
> 80000029 WRITE 1 WAIT mail.bdb page 3
> 80000028 WRITE 1 WAIT mail.bdb page 3
>
> 13 READ 1 HELD mail.bdb handle 0
>
> db_stat -C ol (4.7.25)
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Lock REGINFO information:
> Lock Region type
> 5 Region ID
> /apps/DECCLASA-1/data/openldap/__db.005 Region name
> 0x2b4a8a577000 Original region address
> 0x2b4a8a577000 Region address
> 0x2b4a8a577138 Region primary address
> 0 Region maximum allocation
> 0 Region allocated
> Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
> REGION_JOIN_OK Region flags
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Locks grouped by lockers:
> Locker Mode Count Status ----------------- Object ---------------
> 1 dd= 0 locks held 1 write locks 0 pid/thread 32447/47463753631632
> 1 READ 1 HELD id2entry.bdb handle 0
> 2 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632
> 3 dd= 0 locks held 1 write locks 0 pid/thread 32447/47463753631632
> 3 READ 1 HELD dn2id.bdb handle 0
> 4 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656
> 5 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632
> 6 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
> 7 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
> 7 READ 1 HELD objectClass.bdb handle 0
> 8 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
> 9 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
> a dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
> b dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
> c dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
> c READ 1 HELD cn.bdb handle 0
> d dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
> e dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
> f dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
> f READ 1 HELD uid.bdb handle 0
> 10 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
> 11 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
> 12 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
> 12 READ 1 HELD member.bdb handle 0
> 13 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
> 14 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
> 15 dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952
> 15 READ 1 HELD mail.bdb handle 0
> 16 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
> 17 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
> 18 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
> 19 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
> 1a dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952
> 1a READ 1 HELD sn.bdb handle 0
> 1b dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
> 1c dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
> 1d dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
> 1e dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
> 1f dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
> 80000003 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632
> 80000004 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
> 80000008 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
> 80000038 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656
> 800000a9 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
> 800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000
> 800000aa dd= 0 locks held 2 write locks 0 pid/thread 32447/1097341248
> 800000aa WRITE 1 WAIT mail.bdb page 3
> 800000aa READ 1 HELD mail.bdb page 1
> 800000aa READ 1 HELD mail.bdb page 3
> 800000ab dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952
> 800000ab READ 1 HELD 0x43d58 len: 9 data: 0x540x0500000000000000
> 800000ac dd= 0 locks held 2 write locks 0 pid/thread 32447/1105733952
> 800000ac WRITE 1 WAIT mail.bdb page 3
> 800000ac READ 1 HELD mail.bdb page 1
> 800000ac READ 1 HELD mail.bdb page 3
> 800000b1 dd= 0 locks held 0 write locks 0 pid/thread 32447/1122519360
> 800000b2 dd= 0 locks held 0 write locks 0 pid/thread 32447/1130912064
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Locks grouped by object:
> Locker Mode Count Status ----------------- Object ---------------
> 800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000
>
> 1 READ 1 HELD id2entry.bdb handle 0
>
> 3 READ 1 HELD dn2id.bdb handle 0
>
> 7 READ 1 HELD objectClass.bdb handle 0
>
> f READ 1 HELD uid.bdb handle 0
>
> 1a READ 1 HELD sn.bdb handle 0
>
> 800000aa READ 1 HELD mail.bdb page 3
> 800000ac READ 1 HELD mail.bdb page 3
> 800000aa WRITE 1 WAIT mail.bdb page 3
> 800000ac WRITE 1 WAIT mail.bdb page 3
>
> c READ 1 HELD cn.bdb handle 0
>
> 15 READ 1 HELD mail.bdb handle 0
>
> 800000aa READ 1 HELD mail.bdb page 1
> 800000ac READ 1 HELD mail.bdb page 1
>
> 12 READ 1 HELD member.bdb handle 0
>
> 800000ab READ 1 HELD 0x43d58 len: 9 data: 0x540x0500000000000000
>
>
> We have also collected the backtrace of threads for both DB versions
> which I have uploaded to:
>
https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.8.30_
> 20130625.txt
>
>
https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.7.25_
> 20130625.txt
>
>
--
-- Howard Chu
CTO, Symas Corp.
http://www.symas.com
Director, Highland Sun
http://highlandsun.com/hyc/
Chief Architect, OpenLDAP
http://www.openldap.org/project/
--
Christian Kratzer CK Software GmbH
Email: ck(a)cksoft.de Wildberger Weg 24/2
Phone: +49 7032 893 997 - 0 D-71126 Gaeufelden
Fax: +49 7032 893 997 - 9 HRB 245288, Amtsgericht Stuttgart
Web:
http://www.cksoft.de/ Geschaeftsfuehrer: Christian Kratzer