openldap-bugs September 2012

openldap-bugs@openldap.org

29 participants
107 discussions

Re: (ITS#7364) mdb: clean up POSIX semaphores on environment close.
by h.b.furuseth＠usit.uio.no 03 Sep '12

Howard Chu writes:
> h.b.furuseth(a)usit.uio.no wrote:
>> Reopening this.
>>
>> This is worse with a database which is intended to be used by
>> different users (A and B):
>
> This is pretty much never a use case we would worry about. In most
> applications, a single userID creates and operates on the DB.

I'm fine with just documenting that as a restriction on some systems.
If not:

>> (...)
>> The work-around I can think of is a "multi-uid" mode which instead
>> resets the semaphore with sem_post() if sem_getvalue() returns 0.
>> I don't know how ugly that is considered to be. Could ask
>> comp.programming.unix, or check what Berkeley DB does.
>
> That would be OK in general. It still leaves the question of how to remove the
> semaphore if the DB is being destroyed. But it's probably not worth the
> trouble to worry about this so much. Those OSes should just get their act
> together and support the POSIX process-shared mutexes.
>
>> This mode
>> should use mode 0666 for the semaphores (temporarily setting umask
>> 0, yuck),
>
> Definitely not. The caller specifies a mode; if they want 0666 they
> should configure it as such.

0666 would likely be wrong for the database file(s). But this flag
could just as well consist of specifying a mode parameter for the
semaphores. It's a threaded library doing umask() I dislike. And,
as you say, the need to remove the semaphore afterwards.

>> or it should not sem_unlink() since next user may create
>> the semaphores with a group which gives the wrong users access.
>
> Same as today where running slapadd with the wrong uid causes trouble
> for the following slapd. The answer is obvious: use the right uid when
> accessing the DB.

We're talking about the case where there is no single "right" uid.
Not relevant for slapd, only libmdb by itself.

>> Other matters with the current implementation - I'll patch these:
>>
>> mdb_env_excl_lock() need not retry getting a non-exclusive lock when
>> closing. mdb_env_close() can pass *excl = -1 to tell it not to.
>>
>> mdb_env_setup_locks() can sem_unlink both semaphores before doing
>> anything else, so that reopening a database as root will clean up.
>> Drop the error checks of sem_unlink (so both get called), instead
>> use O_EXCL in sem_open(,O_CREAT,,). Unless I'm missing something,
>> the error checks just work like an emulation of O_EXCL anyway.
>
> The sem_unlink() and sem_open() sequence isn't ideal, certainly. I would
> prefer to just use the existing semaphore.

...followed by 'if (sem_getvalue() shows 0) sem_post()' as above, then.

--
Hallvard

Re: (ITS#7378) Slapd hangs on bdb write lock
by hyc＠symas.com 03 Sep '12

michael(a)stroeder.com wrote:
> A couple of days ago I had a hang with OpenLDAP 2.4.32 / back-hdb running on
> Debian Squeeze, self-compiled against BDB 4.8.30. It seemed Database was
> locked as restarting slapd or even rebooting OS did not help. Unfortunately I
> had to bring up the system as fast as possible and could not examine the
> problem.

db_recover will always return the DB to a usable state and reset any DB locks.
(It completely deletes the lock region, so there cannot be any stale locks
after it runs.)

> The system has only 200 entries and not much load yet. I had renamed entries
> with web2ldap when all 4 masters (4-way MMR) locked up one after the other.
> So there seem to be lockup problems in 2.4.32.

The only way to know if you're seeing the same problem as the original poster
is if you provide db_stat -CA and gdb trace output, like the original poster
did.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Re: (ITS#7364) mdb: clean up POSIX semaphores on environment close.
by hyc＠symas.com 03 Sep '12

Maucci, Cyrille wrote:
> -----Original Message-----
> From: openldap-bugs-bounces(a)OpenLDAP.org [mailto:openldap-bugs-bounces@OpenLDAP.org] On Behalf Of hyc(a)symas.com
>>> Another possibility is to just use SysV semaphores instead of POSIX semaphores.
>>> Then you can use the ipcs(1) command to manually cleanup. BerkeleyDB uses
>>> SysV shared memory when you specify a shared memory environment and it
>>> appears that SysV semaphore support is actually more widespread than POSIX semaphores.
>
> Just to mention that at least on HPUX, Posix semaphores are more efficient than SysV ones.

I'm also reminded that there is no defined behavior for SysV semaphores in
threads; they are only specified for inter-process synchronization. So forget
that...

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Re: (ITS#7378) Slapd hangs on bdb write lock
by hyc＠symas.com 03 Sep '12

nikolai(a)net24.co.nz wrote:
> Full_Name: Nikolai Schupbach
> Version: 2.4.31
> OS: FreeBSD
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (202.78.158.60)
>
>
> We are experiencing frequent hangs in slapd. Once hung we can continue to
> connect, but all searches will just hang indefinitely until we kill -9 the slapd
> process and restart it. The directory is used for mail routing and we have been
> migrating to it from an existing directory server over the last 3 weeks - we
> have noted the busier the directory becomes the more often it hangs (now once
> every 2 days).
>
> We have one master and 10 syncrepl read only replicas - the master is used
> mainly for writes and has not hung yet, but most of the replicas have hung at
> least once. The replicas receive anywhere between 50 to 300 searches/sec, while
> the master would only get 1/sec. There are 45k entries in the directory.
>
> We are running:
>
> FreeBSD 8.3/9.0 x64
> OpenLDAP 2.4.31
> Berkeley DB 4.6.21
>
> The old directory we are migrating from has the same load and is also running
> OpenLDAP, but has been rock solid for 5 years. It is running Berkeley DB 4.3.29
> and OpenLDAP 2.3.27.
>
> We have managed to collect db_stat lock information, which indicates the same
> issue each time - a write lock on dn2id.bdb.

It's more than that. Your db_stat shows that a single thread has 3 active
transactions. This should never happen:

8000a85e dd= 0 locks held 2 write locks 0 pid/thread 88000/34386526336
8000a85e READ 1 HELD 0xb19a8 len: 9 data: 40xa800000000000000
8000a85e READ 1 HELD 0xb26c8 len: 9 data: 60xa800000000000000
8000a85f dd= 0 locks held 8 write locks 4 pid/thread 88000/34386526336
8000a85f READ 1 WAIT dn2id.bdb page 559
8000a85f READ 1 HELD dn2id.bdb page 768
8000a85f WRITE 2 HELD dn2id.bdb page 1362
8000a85f READ 2 HELD dn2id.bdb page 1362
8000a85f WRITE 2 HELD dn2id.bdb page 1353
8000a85f READ 2 HELD dn2id.bdb page 1353
8000a85f WRITE 2 HELD dn2id.bdb page 933
8000a85f READ 1 HELD dn2id.bdb page 933
8000a85f WRITE 4 HELD dn2id.bdb page 219
80001047 dd=28 locks held 1 write locks 1 pid/thread 88000/34386526336
80001047 WRITE 1 HELD dn2id.bdb page 559

I would first recommend changing from BDB 4.6.21 to some other version. There
are no code paths in back-bdb where we would ever return without either
committing or aborting the current transactions, so this appears to be a BDB
bug, not an OpenLDAP bug.

> We have also collected the backtrace for all the threads which I have uploaded
> to:
>
> ftp://ftp.openldap.org/incoming/nikolai-gdb-120902.txt
>
> The full db_stat output is located at:
>
> ftp://ftp.openldap.org/incoming/nikolai-dbstat-120902.txt

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
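The observation above (several lockers attributed to one pid/thread) can be checked mechanically. The following is a minimal sketch, not a tool that ships with Berkeley DB or OpenLDAP: it assumes the locker header lines look exactly like the three in the excerpt above, and the names `LOCKER_RE`, `lockers_by_thread` and `threads_with_multiple_lockers` are hypothetical.

```python
import re
from collections import defaultdict

# Locker header lines copied from the db_stat excerpt quoted in this message.
LOCKER_LINES = """\
8000a85e dd= 0 locks held 2 write locks 0 pid/thread 88000/34386526336
8000a85f dd= 0 locks held 8 write locks 4 pid/thread 88000/34386526336
80001047 dd=28 locks held 1 write locks 1 pid/thread 88000/34386526336
"""

# Assumed layout: <locker id> dd=<n> locks held <n> write locks <n> pid/thread <pid>/<tid>
LOCKER_RE = re.compile(
    r"^(?P<locker>[0-9a-f]+)\s+dd=\s*\d+\s+locks held\s+(?P<held>\d+)\s+"
    r"write locks\s+(?P<writes>\d+)\s+pid/thread\s+(?P<pid>\d+)/(?P<thread>\d+)"
)


def lockers_by_thread(text):
    """Map (pid, thread) to the locker IDs reported for that thread."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LOCKER_RE.match(line.strip())
        if match:
            groups[(match["pid"], match["thread"])].append(match["locker"])
    return dict(groups)


def threads_with_multiple_lockers(text):
    """Keep only the threads that own more than one locker."""
    return {key: ids for key, ids in lockers_by_thread(text).items() if len(ids) > 1}


suspicious = threads_with_multiple_lockers(LOCKER_LINES)
for (pid, thread), lockers in suspicious.items():
    print(f"pid {pid} thread {thread}: {len(lockers)} lockers -> {', '.join(lockers)}")
```

Lines that do not match the assumed header layout (for example the per-lock rows) are simply skipped, so the full db_stat text could be passed in unchanged if its header lines follow the same shape.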

RE: (ITS#7364) mdb: clean up POSIX semaphores on environment close.
by cyrille.maucci＠hp.com 03 Sep '12

-----Original Message-----
From: openldap-bugs-bounces(a)OpenLDAP.org [mailto:openldap-bugs-bounces@OpenLDAP.org] On Behalf Of hyc(a)symas.com

>> Another possibility is to just use SysV semaphores instead of POSIX semaphores.
>> Then you can use the ipcs(1) command to manually cleanup. BerkeleyDB uses
>> SysV shared memory when you specify a shared memory environment and it
>> appears that SysV semaphore support is actually more widespread than POSIX semaphores.

Just to mention that at least on HPUX, Posix semaphores are more efficient than SysV ones.

++Cyrille

Re: (ITS#7364) mdb: clean up POSIX semaphores on environment close.
by hyc＠symas.com 03 Sep '12

h.b.furuseth(a)usit.uio.no wrote:
> Reopening this.
>
> This is worse with a database which is intended to be used by
> different users (A and B):

This is pretty much never a use case we would worry about. In most
applications, a single userID creates and operates on the DB.

> A opens the DB and creates the semaphores with e.g. mode 0660.
> B opens it, A closes, B closes - and fails sem_unlink() which
> only A and root can do.
>
> Next, B (or C) fails mdb_env_open() because sem_unlink() fails
> again.
>
> The work-around I can think of is a "multi-uid" mode which instead
> resets the semaphore with sem_post() if sem_getvalue() returns 0.
> I don't know how ugly that is considered to be. Could ask
> comp.programming.unix, or check what Berkeley DB does.

That would be OK in general. It still leaves the question of how to remove the
semaphore if the DB is being destroyed. But it's probably not worth the
trouble to worry about this so much. Those OSes should just get their act
together and support the POSIX process-shared mutexes.

> This mode
> should use mode 0666 for the semaphores (temporarily setting umask
> 0, yuck),

Definitely not. The caller specifies a mode; if they want 0666 they should
configure it as such.

> or it should not sem_unlink() since next user may create
> the semaphores with a group which gives the wrong users access.

Same as today where running slapadd with the wrong uid causes trouble for the
following slapd. The answer is obvious: use the right uid when accessing the
DB.

> Other matters with the current implementation - I'll patch these:
>
> mdb_env_excl_lock() need not retry getting a non-exclusive lock when
> closing. mdb_env_close() can pass *excl = -1 to tell it not to.
>
> mdb_env_setup_locks() can sem_unlink both semaphores before doing
> anything else, so that reopening a database as root will clean up.
> Drop the error checks of sem_unlink (so both get called), instead
> use O_EXCL in sem_open(,O_CREAT,,). Unless I'm missing something,
> the error checks just work like an emulation of O_EXCL anyway.

The sem_unlink() and sem_open() sequence isn't ideal, certainly. I would
prefer to just use the existing semaphore.

Another possibility is to just use SysV semaphores instead of POSIX
semaphores. Then you can use the ipcs(1) command to clean up manually.
BerkeleyDB uses SysV shared memory when you specify a shared memory
environment and it appears that SysV semaphore support is actually more
widespread than POSIX semaphores.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
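The A/B sequence quoted above is easier to follow when replayed step by step. The sketch below is a toy model only: `NamedSemaphores`, `env_open` and `env_close` are hypothetical stand-ins written for this illustration, not libmdb code or a real semaphore API. It assumes what the thread states, namely that only the creating uid or root may unlink a semaphore, and that the first opener and the last closer each attempt the unlink.

```python
class NamedSemaphores:
    """Toy model of a named-semaphore namespace (not a real API)."""

    def __init__(self):
        self.owner = {}  # semaphore name -> uid that created it

    def create_excl(self, name, uid):
        # Models sem_open(name, O_CREAT | O_EXCL, ...).
        if name in self.owner:
            raise FileExistsError(name)
        self.owner[name] = uid

    def unlink(self, name, uid):
        # Models sem_unlink(): only the creator or root (uid 0) may remove it.
        if name not in self.owner:
            raise FileNotFoundError(name)
        if uid not in (0, self.owner[name]):
            raise PermissionError(name)
        del self.owner[name]


def env_open(sems, users, uid, name="/mdb_sem"):
    """Hypothetical open: the first opener removes a stale semaphore, then creates it."""
    if not users:
        if name in sems.owner:
            sems.unlink(name, uid)
        sems.create_excl(name, uid)
    users.add(uid)


def env_close(sems, users, uid, name="/mdb_sem"):
    """Hypothetical close: the last user out tries to unlink the semaphore."""
    users.discard(uid)
    if not users:
        sems.unlink(name, uid)


A, B = 1000, 1001  # two ordinary, non-root uids
sems, users, events = NamedSemaphores(), set(), []

env_open(sems, users, A)   # A creates the semaphore
env_open(sems, users, B)   # B attaches to the existing one
env_close(sems, users, A)  # B is still attached, so nothing is unlinked

# B is now the last user: its close fails, and so does its next open.
for step in (env_close, env_open):
    try:
        step(sems, users, B)
        events.append((step.__name__, "ok"))
    except PermissionError:
        events.append((step.__name__, "EACCES"))

print(events)
```

In this model a later `env_open(sems, users, 0)` succeeds, which matches the remark that reopening the database as root cleans up.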

Re: (ITS#7378) Slapd hangs on bdb write lock
by michael＠stroeder.com 02 Sep '12

A couple of days ago I had a hang with OpenLDAP 2.4.32 / back-hdb running on
Debian Squeeze, self-compiled against BDB 4.8.30. It seemed Database was
locked as restarting slapd or even rebooting OS did not help. Unfortunately I
had to bring up the system as fast as possible and could not examine the
problem.

The system has only 200 entries and not much load yet. I had renamed entries
with web2ldap when all 4 masters (4-way MMR) locked up one after the other.

So there seem to be lockup problems in 2.4.32.

Re: (ITS#7378) Slapd hangs on bdb write lock
by nikolai＠net24.co.nz 02 Sep '12

I haven't yet - I wanted to collect information before making any changes. I
did look at that fix and wasn't confident it would solve our problem. You're
right though - I need to test it to rule it out.

I will upgrade all the servers to 2.4.32 and report back.

On 2/09/2012, at 7:07 AM, Quanah Gibson-Mount wrote:

> --On Saturday, September 01, 2012 1:46 PM +0000 nikolai(a)net24.co.nz wrote:
>
>> Full_Name: Nikolai Schupbach
>> Version: 2.4.31
>> OS: FreeBSD
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (202.78.158.60)
>
> Have you confirmed this isn't the same thing ITS#7222, fixed in OpenLDAP 2.4.32?
>
> --Quanah
>
> --
>
> Quanah Gibson-Mount
> Sr. Member of Technical Staff
> Zimbra, Inc
> A Division of VMware, Inc.
> --------------------
> Zimbra :: the leader in open source messaging and collaboration

Re: (ITS#7378) Slapd hangs on bdb write lock
by quanah＠zimbra.com 01 Sep '12

--On Saturday, September 01, 2012 1:46 PM +0000 nikolai(a)net24.co.nz wrote:

> Full_Name: Nikolai Schupbach
> Version: 2.4.31
> OS: FreeBSD
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (202.78.158.60)

Have you confirmed this isn't the same thing ITS#7222, fixed in OpenLDAP 2.4.32?

--Quanah

--

Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration

(ITS#7378) Slapd hangs on bdb write lock
by nikolai＠net24.co.nz 01 Sep '12

Full_Name: Nikolai Schupbach
Version: 2.4.31
OS: FreeBSD
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (202.78.158.60)

We are experiencing frequent hangs in slapd. Once hung we can continue to
connect, but all searches will just hang indefinitely until we kill -9 the slapd
process and restart it. The directory is used for mail routing and we have been
migrating to it from an existing directory server over the last 3 weeks - we
have noted the busier the directory becomes the more often it hangs (now once
every 2 days).

We have one master and 10 syncrepl read only replicas - the master is used
mainly for writes and has not hung yet, but most of the replicas have hung at
least once. The replicas receive anywhere between 50 to 300 searches/sec, while
the master would only get 1/sec. There are 45k entries in the directory.

We are running:

FreeBSD 8.3/9.0 x64
OpenLDAP 2.4.31
Berkeley DB 4.6.21

The old directory we are migrating from has the same load and is also running
OpenLDAP, but has been rock solid for 5 years. It is running Berkeley DB 4.3.29
and OpenLDAP 2.3.27.

We have managed to collect db_stat lock information, which indicates the same
issue each time - a write lock on dn2id.bdb.

Locks grouped by object:
Locker   Mode  Count Status ----------------- Object ---------------
8000a85e READ  1 HELD 0xb26c8 len: 9 data: 60xa800000000000000
8a       READ  1 HELD id2entry.bdb handle 0
8c       READ  1 HELD dn2id.bdb handle 0
96       READ  1 HELD objectClass.bdb handle 0
93       READ  1 HELD entryCSN.bdb handle 0
90       READ  1 HELD entryUUID.bdb handle 0
8000a85f WRITE 4 HELD dn2id.bdb page 219
80000782 READ  1 HELD dn2id.bdb page 768
80000a45 READ  1 HELD dn2id.bdb page 768
80000b9e READ  1 HELD dn2id.bdb page 768
800006a0 READ  1 HELD dn2id.bdb page 768
80000771 READ  1 HELD dn2id.bdb page 768
80000534 READ  1 HELD dn2id.bdb page 768
80000a44 READ  1 HELD dn2id.bdb page 768
80000641 READ  1 HELD dn2id.bdb page 768
80001049 READ  1 HELD dn2id.bdb page 768
8000104a READ  1 HELD dn2id.bdb page 768
80001048 READ  1 HELD dn2id.bdb page 768
80000783 READ  1 HELD dn2id.bdb page 768
80000535 READ  1 HELD dn2id.bdb page 768
8000066e READ  1 HELD dn2id.bdb page 768
80000697 READ  1 HELD dn2id.bdb page 768
8000a85f READ  1 HELD dn2id.bdb page 768
8000a85e READ  1 HELD 0xb19a8 len: 9 data: 40xa800000000000000
8000a85f READ  1 HELD dn2id.bdb page 933
8000a85f WRITE 2 HELD dn2id.bdb page 933
80001047 WRITE 1 HELD dn2id.bdb page 559
80000782 READ  1 WAIT dn2id.bdb page 559
80000a45 READ  1 WAIT dn2id.bdb page 559
80000b9e READ  1 WAIT dn2id.bdb page 559
800006a0 READ  1 WAIT dn2id.bdb page 559
80000771 READ  1 WAIT dn2id.bdb page 559
80000534 READ  1 WAIT dn2id.bdb page 559
80000a44 READ  1 WAIT dn2id.bdb page 559
80000641 READ  1 WAIT dn2id.bdb page 559
80001049 READ  1 WAIT dn2id.bdb page 559
8000104a READ  1 WAIT dn2id.bdb page 559
80001048 READ  1 WAIT dn2id.bdb page 559
80000783 READ  1 WAIT dn2id.bdb page 559
80000535 READ  1 WAIT dn2id.bdb page 559
8000066e READ  1 WAIT dn2id.bdb page 559
80000697 READ  1 WAIT dn2id.bdb page 559
8000a85f READ  1 WAIT dn2id.bdb page 559
8000a85f READ  2 HELD dn2id.bdb page 1362
8000a85f WRITE 2 HELD dn2id.bdb page 1362
8000a85f READ  2 HELD dn2id.bdb page 1353
8000a85f WRITE 2 HELD dn2id.bdb page 1353
b6       READ  1 HELD uid.bdb handle 0
a5       READ  1 HELD mail.bdb handle 0
af       READ  1 HELD mailLocalAddress.bdb handle 0
9b       READ  1 HELD miLoginid.bdb handle 0
aa       READ  1 HELD mailHost.bdb handle 0
bb       READ  1 HELD miDomainName.bdb handle 0
c0       READ  1 HELD mpMailHost.bdb handle 0
a0       READ  1 HELD mpMailUserType.bdb handle 0

We have also collected the backtrace for all the threads which I have uploaded
to:

ftp://ftp.openldap.org/incoming/nikolai-gdb-120902.txt

The full db_stat output is located at:

ftp://ftp.openldap.org/incoming/nikolai-dbstat-120902.txt

Our DB_CONFIG:

# One 512MB cache
set_cachesize 0 536870912 1

# Transaction Log settings
set_lg_regionmax 1048576
set_lg_max 10485760
set_lg_bsize 2097152
set_flags DB_LOG_AUTOREMOVE

# Increase lock maximums
set_lk_max_locks 2000
set_lk_max_lockers 2000
set_lk_max_objects 2000

Our slapd.conf on our replicas:

# Load the following schema files
include /usr/local/etc/openldap/schema/core.schema
include /usr/local/etc/openldap/schema/cosine.schema
include /usr/local/etc/openldap/schema/nis.schema
include /usr/local/etc/openldap/schema/inetorgperson.schema
include /usr/local/etc/openldap/schema/misc.schema
include /usr/local/etc/openldap/schema/mirapoint.schema
include /usr/local/etc/openldap/schema/smp.schema

# Runtime settings for slapd
pidfile /var/run/openldap/slapd.pid
argsfile /var/run/openldap/slapd.args
loglevel none

# TLS security options for slapd.
TLSCipherSuite HIGH
TLSCACertificateFile /usr/local/etc/openldap/tls/ca-cert.pem
TLSCertificateFile /usr/local/etc/openldap/tls/server-cert.pem
TLSCertificateKeyFile /usr/local/etc/openldap/tls/server-key.pem

# This option configures one or more hashes to be used in generation
# of user passwords stored in the userPassword attribute during
# processing of LDAP Password Modify Extended Operations (RFC 3062).
password-hash {SSHA}

# Load dynamic backend modules:
modulepath /usr/local/libexec/openldap
moduleload back_bdb
moduleload back_monitor

# Do not limit size or time of requests.
sizelimit unlimited
timelimit unlimited

# Require authentication prior to directory operations
require authc

###############################################################################
# BDB Database Definitions
#
# The following configuration directives relate to bdb database definitions
###############################################################################

# The remaining configuration directives relate to bdb database definitions
database bdb
suffix "o=top"
rootdn "cn=root,o=top"

# Cleartext passwords, especially for the rootdn, should
# be avoid. See slappasswd(8) and slapd.conf(5) for details.
rootpw {SSHA}**********

# The database directory must exist prior to running slapd and
# should only be accessible by the slapd and slap tools.
directory /var/db/openldap-data

# Indices to maintain
index cn eq,sub,pres
index entryUUID eq
index entryCSN eq
index mail eq,sub,pres
index mailHost eq
index mailLocalAddress eq,sub,pres
index miDomainName eq,sub
index miLoginId eq,pres
index mpMailHost eq
index mpMailUserType eq
index mpSystemRole eq
index objectClass eq,pres
index uid eq,pres

# Specify the number of entries which should be held in memory
cachesize 200000

# Set transactional checkpoint
checkpoint 512 60

###############################################################################
# LDAP Sync Replication
#
# A unique replica id number is required for each replication client
###############################################################################

# LDAP sync replication settings
syncrepl rid=36
        provider=ldaps://ldapmaster/
        type=refreshAndPersist
        retry=30,+
        searchbase="o=top"
        filter="(objectClass=*)"
        scope=sub
        attrs="*"
        sizelimit=unlimited
        timelimit=unlimited
        schemachecking=off
        bindmethod=simple
        binddn="cn=replica,ou=users,ou=directory,o=top"
        credentials=**********

# Where to refer ldap updates to
updateref ldaps://ldapmaster/

###############################################################################
# LDAP Statistics
#
# The OpenLDAP server can be configured to provide real time performance
# statistics through the monitor branch.
###############################################################################

# Enable the statistics monitoring database
database monitor

# Allow access to monitoring user only
access to dn.subtree="cn=monitor"
        by dn.exact="cn=monitor,ou=users,ou=directory,o=top" read
        by * none

Sincerely,

Nikolai Schupbach
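To see at a glance which locker is holding up the waiters in a "Locks grouped by object" listing like the one in this report, the rows can be grouped per object. This is a minimal sketch under stated assumptions: each row is `locker mode count status object`, only the READ and WRITE modes seen in this report are handled, and `SAMPLE` contains just a few rows copied from the listing. The names `LOCK_RE` and `blocked_objects` are hypothetical.

```python
import re
from collections import defaultdict

# A few rows copied from the "Locks grouped by object" listing in this report.
SAMPLE = """\
80001047 WRITE 1 HELD dn2id.bdb page 559
80000782 READ 1 WAIT dn2id.bdb page 559
80000a45 READ 1 WAIT dn2id.bdb page 559
8000a85f READ 1 WAIT dn2id.bdb page 559
8000a85f WRITE 4 HELD dn2id.bdb page 219
80000782 READ 1 HELD dn2id.bdb page 768
"""

# Assumed row layout: <locker> <READ|WRITE> <count> <HELD|WAIT> <object description>
LOCK_RE = re.compile(r"^([0-9a-f]+)\s+(READ|WRITE)\s+(\d+)\s+(HELD|WAIT)\s+(.+)$")


def blocked_objects(text):
    """Return, for every object with waiters, who holds it and who waits on it."""
    holders = defaultdict(list)
    waiters = defaultdict(list)
    for line in text.splitlines():
        match = LOCK_RE.match(line.strip())
        if not match:
            continue  # header lines and anything in an unexpected format
        locker, mode, _count, status, obj = match.groups()
        target = holders if status == "HELD" else waiters
        target[obj].append((locker, mode))
    return {
        obj: {"holders": holders[obj], "waiters": waiting}
        for obj, waiting in waiters.items()
    }


report = blocked_objects(SAMPLE)
for obj, info in report.items():
    held_by = ", ".join(f"{locker} ({mode})" for locker, mode in info["holders"])
    print(f"{obj}: {len(info['waiters'])} waiting, held by {held_by}")
```

Objects that nobody is waiting on, such as page 219 and page 768 in the sample, are left out of the result, so the output stays focused on the contended page.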
