Hi Chu,
thanks for your quick response. We will try to contact Oracle to advice here.
However, what is your suggestion here ' try different settings for the deadlock
detector' exactly?
BTW - certainly, we are patched DB 4.7.25 (all 4 patches)...
Regards,
Dusan
-----Original Message-----
From: Howard Chu [mailto:hyc@symas.com]
Sent: 26. júna 2013 13:32
To: Fric, Dusan; openldap-its(a)openldap.org
Subject: Re: (ITS#7633) Slapd hangs on hdb write lock
dusan.fric(a)t-systems.sk wrote:
Full_Name: Dusan Fric
Version: 2.4.32
OS: RHEL 5.7 x64
URL:
ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (88.212.40.139)
We are experiencing frequent hangs in slapd. Once hung we usualy
cannot continue to connect until we kill -9 the slapd process and
restart it. The directory is used for 2 applications as user eDir and
we are using it in production over 1 month - we have noted the busier
the directory becomes the more often it hangs (now twice per week).
We have installed identical configurations on 3 environments, each has
only one single server (no replication). There are 30k entries in the
directory (production).
We are running:
RHEL 5.7 x64 (VMWare with NFS mountpoints) OpenLDAP 2.4.32 Berkeley DB
4.8.30 (4.7.25)
We were starting with DB 4.8.30, after downgrade to version 4.7.25
(according to
ITS#7378 -
https://www.openldap.org/its/index.cgi/Incoming?id=7378;selectid=7378)
we are facing the same issue.
The problem occurs when two requests simultaneously try to update
attributes of a record and in most cases of the same DN value.
We can easily reproduce it with a java test program running 2 threads
each connecting to the ldap server and updating the record for a
particular DN value.
It need not be the same DN value but DN values which reside on the
same BDB page.
Thanks for the traces, but the information you've provided shows that this is a BDB
problem, not an OpenLDAP bug. You'll have to contact Oracle for help.
The BDB deadlock detector is supposed to handle these situations.
For example in your 4.8.30 lock status there are only 2 conflicting transactions 80000028
and 80000029 and they are only contending on a single page. This is one of the most
elementary deadlock situations.
You might try different settings for the deadlock detector, but ultimately this appears to
be a BDB bug and the choice of detector shouldn't matter.
Configuration
olc cn=config (part):
dn: cn=config
objectClass: olcGlobal
cn: config
olcConcurrency: 0
olcConnMaxPending: 100
olcConnMaxPendingAuth: 1000
olcSockbufMaxIncoming: 262143
olcSockbufMaxIncomingAuth: 16777215
olcThreads: 32
olc DB hdb config (part):
dn: olcDatabase={1}hdb,cn=config
objectClass: olcHdbConfig
objectClass: olcDatabaseConfig
olcDatabase: {1}hdb
olcDbCacheSize: 80000
olcDbCheckpoint: 128 5
olcDbConfig: {0}set_cachesize 0 268435456 1
olcDbConfig: {1}set_lg_max 10485760
olcDbConfig: {2}set_lg_bsize 2097152
olcDbConfig: {3}set_lg_dir /pkg/openldap/dblog
olcDbConfig: {4}set_lg_regionmax 262144
olcDbConfig: {5}set_lk_detect DB_LOCK_EXPIRE
olcDbConfig: {6}set_flags DB_TXN_NOSYNC
olcDbDirtyRead: FALSE
olcDbDNcacheSize: 0
olcDbIDLcacheSize: 240000
olcDbIndex: default eq,sub
olcDbIndex: objectClass eq
olcDbIndex: cn eq,sub
olcDbIndex: uid eq,sub
olcDbIndex: mail pres,eq
olcDbIndex: sn eq,sub
olcDbIndex: member eq
olcDbLinearIndex: FALSE
olcDbMode: 0600
olcDbNoSync: TRUE
olcDbSearchStack: 20
olcDbShmKey: 0
olcLastMod: TRUE
olcMaxDerefDepth: 15
olcMonitoring: TRUE
olcReadOnly: FALSE
We have managed to collect db_stat lock information, which indicates
the same issue with DB write locks on both DB versions.
db_stat -C ol (4.8.30)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Lock REGINFO information:
Lock Region type
5 Region ID
/apps/DECCLASA-1/data/openldap/__db.005 Region name
0x2ab41cdc3000 Region address
0x2ab41cdc3138 Region primary address
0 Region maximum allocation
0 Region allocated
Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
REGION_JOIN_OK Region flags
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by lockers:
Locker Mode Count Status ----------------- Object ---------------
1 dd= 0 locks held 1 write locks 0 pid/thread 30034/47555767609232
1 READ 1 HELD id2entry.bdb handle 0
2 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232
3 dd= 0 locks held 1 write locks 0 pid/thread 30034/47555767609232
3 READ 1 HELD dn2id.bdb handle 0
4 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
5 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232
6 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200
7 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
8 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496
9 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
9 READ 1 HELD objectClass.bdb handle 0
a dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
b dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
c dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
c READ 1 HELD uid.bdb handle 0
d dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
f dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
10 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
10 READ 1 HELD member.bdb handle 0
11 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
12 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
13 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
13 READ 1 HELD mail.bdb handle 0
14 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
15 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
16 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
17 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
18 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
18 READ 1 HELD sn.bdb handle 0
19 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
1a dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
1b dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
1b READ 1 HELD cn.bdb handle 0
1c dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
1d dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
1e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
1f dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496
80000003 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232
80000004 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200
80000005 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
80000006 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496
80000007 dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
80000026 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
80000026 READ 1 HELD 0x5e948 len: 9 data: 0x560x0500000000000000
80000027 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496
80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000
80000028 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496
80000028 WRITE 1 WAIT mail.bdb page 3
80000028 READ 1 HELD mail.bdb page 3
80000029 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
80000029 WRITE 1 WAIT mail.bdb page 3
80000029 READ 1 HELD mail.bdb page 3
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by object:
Locker Mode Count Status ----------------- Object ---------------
80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000
1b READ 1 HELD cn.bdb handle 0
10 READ 1 HELD member.bdb handle 0
9 READ 1 HELD objectClass.bdb handle 0
c READ 1 HELD uid.bdb handle 0
80000026 READ 1 HELD 0x5e948 len: 9 data: 0x560x0500000000000000
1 READ 1 HELD id2entry.bdb handle 0
18 READ 1 HELD sn.bdb handle 0
3 READ 1 HELD dn2id.bdb handle 0
80000029 READ 1 HELD mail.bdb page 3
80000028 READ 1 HELD mail.bdb page 3
80000029 WRITE 1 WAIT mail.bdb page 3
80000028 WRITE 1 WAIT mail.bdb page 3
13 READ 1 HELD mail.bdb handle 0
db_stat -C ol (4.7.25)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Lock REGINFO information:
Lock Region type
5 Region ID
/apps/DECCLASA-1/data/openldap/__db.005 Region name
0x2b4a8a577000 Original region address
0x2b4a8a577000 Region address
0x2b4a8a577138 Region primary address
0 Region maximum allocation
0 Region allocated
Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
REGION_JOIN_OK Region flags
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by lockers:
Locker Mode Count Status ----------------- Object ---------------
1 dd= 0 locks held 1 write locks 0 pid/thread 32447/47463753631632
1 READ 1 HELD id2entry.bdb handle 0
2 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632
3 dd= 0 locks held 1 write locks 0 pid/thread 32447/47463753631632
3 READ 1 HELD dn2id.bdb handle 0
4 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656
5 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632
6 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
7 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
7 READ 1 HELD objectClass.bdb handle 0
8 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
9 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
a dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
b dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
c dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
c READ 1 HELD cn.bdb handle 0
d dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
e dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
f dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
f READ 1 HELD uid.bdb handle 0
10 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
11 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
12 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
12 READ 1 HELD member.bdb handle 0
13 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
14 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
15 dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952
15 READ 1 HELD mail.bdb handle 0
16 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
17 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
18 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
19 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
1a dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952
1a READ 1 HELD sn.bdb handle 0
1b dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
1c dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
1d dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
1e dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
1f dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
80000003 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632
80000004 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
80000008 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
80000038 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656
800000a9 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000
800000aa dd= 0 locks held 2 write locks 0 pid/thread 32447/1097341248
800000aa WRITE 1 WAIT mail.bdb page 3
800000aa READ 1 HELD mail.bdb page 1
800000aa READ 1 HELD mail.bdb page 3
800000ab dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952
800000ab READ 1 HELD 0x43d58 len: 9 data: 0x540x0500000000000000
800000ac dd= 0 locks held 2 write locks 0 pid/thread 32447/1105733952
800000ac WRITE 1 WAIT mail.bdb page 3
800000ac READ 1 HELD mail.bdb page 1
800000ac READ 1 HELD mail.bdb page 3
800000b1 dd= 0 locks held 0 write locks 0 pid/thread 32447/1122519360
800000b2 dd= 0 locks held 0 write locks 0 pid/thread 32447/1130912064
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by object:
Locker Mode Count Status ----------------- Object ---------------
800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000
1 READ 1 HELD id2entry.bdb handle 0
3 READ 1 HELD dn2id.bdb handle 0
7 READ 1 HELD objectClass.bdb handle 0
f READ 1 HELD uid.bdb handle 0
1a READ 1 HELD sn.bdb handle 0
800000aa READ 1 HELD mail.bdb page 3
800000ac READ 1 HELD mail.bdb page 3
800000aa WRITE 1 WAIT mail.bdb page 3
800000ac WRITE 1 WAIT mail.bdb page 3
c READ 1 HELD cn.bdb handle 0
15 READ 1 HELD mail.bdb handle 0
800000aa READ 1 HELD mail.bdb page 1
800000ac READ 1 HELD mail.bdb page 1
12 READ 1 HELD member.bdb handle 0
800000ab READ 1 HELD 0x43d58 len: 9 data: 0x540x0500000000000000
We have also collected the backtrace of threads for both DB versions
which I have uploaded to:
https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.8.30_
20130625.txt
https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.7.25_
20130625.txt
--
-- Howard Chu
CTO, Symas Corp.
http://www.symas.com
Director, Highland Sun
http://highlandsun.com/hyc/
Chief Architect, OpenLDAP
http://www.openldap.org/project/