Full_Name: Dusan Fric Version: 2.4.32 OS: RHEL 5.7 x64 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (88.212.40.139)
We are experiencing frequent hangs in slapd. Once hung we usualy cannot continue to connect until we kill -9 the slapd process and restart it. The directory is used for 2 applications as user eDir and we are using it in production over 1 month - we have noted the busier the directory becomes the more often it hangs (now twice per week).
We have installed identical configurations on 3 environments, each has only one single server (no replication). There are 30k entries in the directory (production).
We are running:
RHEL 5.7 x64 (VMWare with NFS mountpoints) OpenLDAP 2.4.32 Berkeley DB 4.8.30 (4.7.25)
We were starting with DB 4.8.30, after downgrade to version 4.7.25 (according to ITS#7378 - https://www.openldap.org/its/index.cgi/Incoming?id=7378;selectid=7378) we are facing the same issue.
The problem occurs when two requests simultaneously try to update attributes of a record and in most cases of the same DN value. We can easily reproduce it with a java test program running 2 threads each connecting to the ldap server and updating the record for a particular DN value. It need not be the same DN value but DN values which reside on the same BDB page.
Configuration
olc cn=config (part): dn: cn=config objectClass: olcGlobal cn: config olcConcurrency: 0 olcConnMaxPending: 100 olcConnMaxPendingAuth: 1000 olcSockbufMaxIncoming: 262143 olcSockbufMaxIncomingAuth: 16777215 olcThreads: 32
olc DB hdb config (part): dn: olcDatabase={1}hdb,cn=config objectClass: olcHdbConfig objectClass: olcDatabaseConfig olcDatabase: {1}hdb olcDbCacheSize: 80000 olcDbCheckpoint: 128 5 olcDbConfig: {0}set_cachesize 0 268435456 1 olcDbConfig: {1}set_lg_max 10485760 olcDbConfig: {2}set_lg_bsize 2097152 olcDbConfig: {3}set_lg_dir /pkg/openldap/dblog olcDbConfig: {4}set_lg_regionmax 262144 olcDbConfig: {5}set_lk_detect DB_LOCK_EXPIRE olcDbConfig: {6}set_flags DB_TXN_NOSYNC olcDbDirtyRead: FALSE olcDbDNcacheSize: 0 olcDbIDLcacheSize: 240000 olcDbIndex: default eq,sub olcDbIndex: objectClass eq olcDbIndex: cn eq,sub olcDbIndex: uid eq,sub olcDbIndex: mail pres,eq olcDbIndex: sn eq,sub olcDbIndex: member eq olcDbLinearIndex: FALSE olcDbMode: 0600 olcDbNoSync: TRUE olcDbSearchStack: 20 olcDbShmKey: 0 olcLastMod: TRUE olcMaxDerefDepth: 15 olcMonitoring: TRUE olcReadOnly: FALSE
We have managed to collect db_stat lock information, which indicates the same issue with DB write locks on both DB versions.
db_stat -C ol (4.8.30) =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Lock REGINFO information: Lock Region type 5 Region ID /apps/DECCLASA-1/data/openldap/__db.005 Region name 0x2ab41cdc3000 Region address 0x2ab41cdc3138 Region primary address 0 Region maximum allocation 0 Region allocated Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest REGION_JOIN_OK Region flags =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Locks grouped by lockers: Locker Mode Count Status ----------------- Object --------------- 1 dd= 0 locks held 1 write locks 0 pid/thread 30034/47555767609232 1 READ 1 HELD id2entry.bdb handle 0 2 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232 3 dd= 0 locks held 1 write locks 0 pid/thread 30034/47555767609232 3 READ 1 HELD dn2id.bdb handle 0 4 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 5 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232 6 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200 7 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 8 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496 9 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792 9 READ 1 HELD objectClass.bdb handle 0 a dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 b dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 c dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792 c READ 1 HELD uid.bdb handle 0 d dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 f dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 10 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904 10 READ 1 HELD member.bdb handle 0 11 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 12 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 13 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904 13 READ 1 HELD mail.bdb handle 0 14 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 15 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 16 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 17 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 18 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904 18 READ 1 HELD sn.bdb handle 0 19 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 1a dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 1b dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904 1b READ 1 HELD cn.bdb handle 0 1c dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 1d dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 1e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 1f dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496 80000003 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232 80000004 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200 80000005 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 80000006 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496 80000007 dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 80000026 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792 80000026 READ 1 HELD 0x5e948 len: 9 data: 0x560x0500000000000000 80000027 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496 80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000 80000028 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496 80000028 WRITE 1 WAIT mail.bdb page 3 80000028 READ 1 HELD mail.bdb page 3 80000029 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792 80000029 WRITE 1 WAIT mail.bdb page 3 80000029 READ 1 HELD mail.bdb page 3 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Locks grouped by object: Locker Mode Count Status ----------------- Object --------------- 80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000
1b READ 1 HELD cn.bdb handle 0
10 READ 1 HELD member.bdb handle 0
9 READ 1 HELD objectClass.bdb handle 0
c READ 1 HELD uid.bdb handle 0
80000026 READ 1 HELD 0x5e948 len: 9 data: 0x560x0500000000000000
1 READ 1 HELD id2entry.bdb handle 0
18 READ 1 HELD sn.bdb handle 0
3 READ 1 HELD dn2id.bdb handle 0
80000029 READ 1 HELD mail.bdb page 3 80000028 READ 1 HELD mail.bdb page 3 80000029 WRITE 1 WAIT mail.bdb page 3 80000028 WRITE 1 WAIT mail.bdb page 3
13 READ 1 HELD mail.bdb handle 0
db_stat -C ol (4.7.25) =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Lock REGINFO information: Lock Region type 5 Region ID /apps/DECCLASA-1/data/openldap/__db.005 Region name 0x2b4a8a577000 Original region address 0x2b4a8a577000 Region address 0x2b4a8a577138 Region primary address 0 Region maximum allocation 0 Region allocated Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest REGION_JOIN_OK Region flags =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Locks grouped by lockers: Locker Mode Count Status ----------------- Object --------------- 1 dd= 0 locks held 1 write locks 0 pid/thread 32447/47463753631632 1 READ 1 HELD id2entry.bdb handle 0 2 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632 3 dd= 0 locks held 1 write locks 0 pid/thread 32447/47463753631632 3 READ 1 HELD dn2id.bdb handle 0 4 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656 5 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632 6 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 7 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 7 READ 1 HELD objectClass.bdb handle 0 8 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 9 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 a dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 b dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 c dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 c READ 1 HELD cn.bdb handle 0 d dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 e dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 f dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 f READ 1 HELD uid.bdb handle 0 10 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 11 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 12 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 12 READ 1 HELD member.bdb handle 0 13 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 14 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 15 dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952 15 READ 1 HELD mail.bdb handle 0 16 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 17 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 18 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 19 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 1a dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952 1a READ 1 HELD sn.bdb handle 0 1b dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 1c dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 1d dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 1e dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 1f dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 80000003 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632 80000004 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 80000008 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 80000038 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656 800000a9 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000 800000aa dd= 0 locks held 2 write locks 0 pid/thread 32447/1097341248 800000aa WRITE 1 WAIT mail.bdb page 3 800000aa READ 1 HELD mail.bdb page 1 800000aa READ 1 HELD mail.bdb page 3 800000ab dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952 800000ab READ 1 HELD 0x43d58 len: 9 data: 0x540x0500000000000000 800000ac dd= 0 locks held 2 write locks 0 pid/thread 32447/1105733952 800000ac WRITE 1 WAIT mail.bdb page 3 800000ac READ 1 HELD mail.bdb page 1 800000ac READ 1 HELD mail.bdb page 3 800000b1 dd= 0 locks held 0 write locks 0 pid/thread 32447/1122519360 800000b2 dd= 0 locks held 0 write locks 0 pid/thread 32447/1130912064 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Locks grouped by object: Locker Mode Count Status ----------------- Object --------------- 800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000
1 READ 1 HELD id2entry.bdb handle 0
3 READ 1 HELD dn2id.bdb handle 0
7 READ 1 HELD objectClass.bdb handle 0
f READ 1 HELD uid.bdb handle 0
1a READ 1 HELD sn.bdb handle 0
800000aa READ 1 HELD mail.bdb page 3 800000ac READ 1 HELD mail.bdb page 3 800000aa WRITE 1 WAIT mail.bdb page 3 800000ac WRITE 1 WAIT mail.bdb page 3
c READ 1 HELD cn.bdb handle 0
15 READ 1 HELD mail.bdb handle 0
800000aa READ 1 HELD mail.bdb page 1 800000ac READ 1 HELD mail.bdb page 1
12 READ 1 HELD member.bdb handle 0
800000ab READ 1 HELD 0x43d58 len: 9 data: 0x540x0500000000000000
We have also collected the backtrace of threads for both DB versions which I have uploaded to: https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.8.30_201306...
https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.7.25_201306...