Re: (ITS#7364) mdb: clean up POSIX semaphores on environment close.
by h.b.furuseth@usit.uio.no
Howard Chu writes:
> h.b.furuseth(a)usit.uio.no wrote:
>> Reopening this.
>>
>>
>> This is worse with a database which is intended to be used by
>> different users (A and B):
>
> This is pretty much never a use case we would worry about. In most
> applications, a single userID creates and operates on the DB.
I'm fine with just documenting that as a restriction on some systems.
If not:
>> (...)
>> The work-around I can think of is a "multi-uid" mode which instead
>> resets the semaphore with sem_post() if sem_getvalue() returns 0.
>> I don't know how ugly that is considered to be. Could ask
>> comp.programming.unix, or check what Berkeley DB does.
>
> That would be OK in general. It still leaves the question of how to remove the
> semaphore if the DB is being destroyed. But it's probably not worth the
> trouble to worry about this so much. Those OSes should just get their act
> together and support the POSIX process-shared mutexes.
>
>> This mode
>> should use mode 0666 for the semaphores (temporarily setting umask
>> 0, yuck),
>
> Definitely not. The caller specifies a mode; if they want 0666 they
> should configure it as such.
0666 would likely be wrong for the database file(s). But this flag
could just as well consist of specifying a mode parameter for the
semaphores. It's a threaded library doing umask() I dislike.
And, as you say, the need to remove the semaphore afterwards.
>> or it should not sem_unlink() since next user may create
>> the semaphores with a group which gives the wrong users access.
>
> Same as today where running slapadd with the wrong uid causes trouble
> for the following slapd. The answer is obvious: use the right uid when
> accessing the DB.
We're talking about the case where there is no single "right" uid.
Not relevant for slapd, only libmdb by itself.
>> Other matters with the current implementation - I'll patch these:
>>
>> mdb_env_excl_lock() need not retry getting a non-exclusive lock when
>> closing. mdb_env_close() can pass *excl = -1 to tell it not to.
>>
>> mdb_env_setup_locks() can sem_unlink both semaphores before doing
>> anything else, so that reopening a database as root will clean up.
>> Drop the error checks of sem_unlink (so both get called), instead
>> use O_EXCL in sem_open(,O_CREAT,,). Unless I'm missing something,
>> the error checks just work like an emulation of O_EXCL anyway.
>
> The sem_unlink() and sem_open() sequence isn't ideal, certainly. I would
> prefer to just use the existing semaphore.
...followed by 'if (sem_getvalue() shows 0) sem_post()' as above, then.
--
Hallvard
9 years, 11 months
Re: (ITS#7378) Slapd hangs on bdb write lock
by hyc@symas.com
michael(a)stroeder.com wrote:
> A couple of days ago I had a hang with OpenLDAP 2.4.32 / back-hdb running on
> Debian Squeeze, self-compiled against BDB 4.8.30. It seemed the database was
> locked, as restarting slapd or even rebooting the OS did not help. Unfortunately I
> had to bring up the system as fast as possible and could not examine the
> problem.
db_recover will always return the DB to a usable state and reset any DB locks.
(It completely deletes the lock region, so there cannot be any stale locks
after it runs.)
> The system has only 200 entries and not much load yet. I had renamed entries
> with web2ldap when all 4 masters (4-way MMR) locked up one after the other.
> So there seem to be lockup problems in 2.4.32.
The only way to know if you're seeing the same problem as the original poster
is if you provide db_stat -CA and gdb trace output, like the original poster did.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
9 years, 11 months
Re: (ITS#7364) mdb: clean up POSIX semaphores on environment close.
by hyc@symas.com
Maucci, Cyrille wrote:
> -----Original Message-----
> From: openldap-bugs-bounces(a)OpenLDAP.org [mailto:openldap-bugs-bounces@OpenLDAP.org] On Behalf Of hyc(a)symas.com
>>> Another possibility is to just use SysV semaphores instead of POSIX semaphores.
>>> Then you can use the ipcs(1) command to clean up manually. BerkeleyDB uses
>>> SysV shared memory when you specify a shared memory environment and it
>>> appears that SysV semaphore support is actually more widespread than POSIX semaphores.
>
> Just to mention that at least on HP-UX, POSIX semaphores are more efficient than SysV ones.
I'm also reminded that there is no defined behavior for SysV semaphores in
threads; they are only specified for inter-process synchronization. So forget
that...
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
9 years, 11 months
Re: (ITS#7378) Slapd hangs on bdb write lock
by hyc@symas.com
nikolai(a)net24.co.nz wrote:
> Full_Name: Nikolai Schupbach
> Version: 2.4.31
> OS: FreeBSD
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (202.78.158.60)
>
>
> We are experiencing frequent hangs in slapd. Once hung we can continue to
> connect, but all searches will just hang indefinitely until we kill -9 the slapd
> process and restart it. The directory is used for mail routing and we have been
> migrating to it from an existing directory server over the last 3 weeks - we
> have noted the busier the directory becomes the more often it hangs (now once
> every 2 days).
>
> We have one master and 10 syncrepl read only replicas - the master is used
> mainly for writes and has not hung yet, but most of the replicas have hung at
> least once. The replicas receive anywhere between 50 to 300 searches/sec, while
> the master would only get 1/sec. There are 45k entries in the directory.
>
> We are running:
>
> FreeBSD 8.3/9.0 x64
> OpenLDAP 2.4.31
> Berkeley DB 4.6.21
>
> The old directory we are migrating from has the same load and is also running
> OpenLDAP, but has been rock solid for 5 years. It is running Berkeley DB 4.3.29
> and OpenLDAP 2.3.27.
>
> We have managed to collect db_stat lock information, which indicates the same
> issue each time - a write lock on dn2id.bdb.
It's more than that. Your db_stat shows that a single thread has 3 active
transactions. This should never happen:
8000a85e dd= 0 locks held 2 write locks 0 pid/thread 88000/34386526336
8000a85e READ 1 HELD 0xb19a8 len: 9 data: 40xa800000000000000
8000a85e READ 1 HELD 0xb26c8 len: 9 data: 60xa800000000000000
8000a85f dd= 0 locks held 8 write locks 4 pid/thread 88000/34386526336
8000a85f READ 1 WAIT dn2id.bdb page 559
8000a85f READ 1 HELD dn2id.bdb page 768
8000a85f WRITE 2 HELD dn2id.bdb page 1362
8000a85f READ 2 HELD dn2id.bdb page 1362
8000a85f WRITE 2 HELD dn2id.bdb page 1353
8000a85f READ 2 HELD dn2id.bdb page 1353
8000a85f WRITE 2 HELD dn2id.bdb page 933
8000a85f READ 1 HELD dn2id.bdb page 933
8000a85f WRITE 4 HELD dn2id.bdb page 219
80001047 dd=28 locks held 1 write locks 1 pid/thread 88000/34386526336
80001047 WRITE 1 HELD dn2id.bdb page 559
I would first recommend changing from BDB 4.6.21 to some other version. There
are no code paths in back-bdb where we would ever return without either
committing or aborting the current transactions, so this appears to be a BDB
bug, not an OpenLDAP bug.
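The observation that one thread owns three transactions can be checked mechanically from the locker summary lines. The C sketch below counts the transactional lockers that belong to a given pid/thread pair. It assumes the line layout shown in the excerpt above and that Berkeley DB transaction locker IDs have the high bit set (0x80000000 and up); the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count locker summary lines ("... pid/thread P/T")
 * in db_stat lock output that belong to pid_thread and whose locker ID
 * looks like a transaction ID (assumption: high bit set). */
static int count_txn_lockers_for_thread(const char *text, const char *pid_thread)
{
    static const char key[] = "pid/thread ";
    const char *line = text;
    int count = 0;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        char buf[256];
        char owner[64];
        unsigned long id;
        char *p;

        /* Copy one line into a bounded, NUL-terminated buffer. */
        if (len >= sizeof buf)
            len = sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        /* Only summary lines carry the "pid/thread" field. */
        p = strstr(buf, key);
        if (p != NULL &&
            sscanf(buf, "%lx", &id) == 1 && id >= 0x80000000UL &&
            sscanf(p + sizeof key - 1, "%63s", owner) == 1 &&
            strcmp(owner, pid_thread) == 0)
            count++;

        line = eol ? eol + 1 : line + strlen(line);
    }
    return count;
}
```

Fed the excerpt above, the function counts three transactional lockers for 88000/34386526336 (8000a85e, 8000a85f and 80001047). Any result greater than one for a single thread is the symptom described here.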
> We have also collected the backtrace for all the threads which I have uploaded
> to:
>
> ftp://ftp.openldap.org/incoming/nikolai-gdb-120902.txt
>
> The full db_stat output is located at:
>
> ftp://ftp.openldap.org/incoming/nikolai-dbstat-120902.txt
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
9 years, 11 months
RE: (ITS#7364) mdb: clean up POSIX semaphores on environment close.
by cyrille.maucci@hp.com
-----Original Message-----
From: openldap-bugs-bounces(a)OpenLDAP.org [mailto:openldap-bugs-bounces@OpenLDAP.org] On Behalf Of hyc(a)symas.com
>> Another possibility is to just use SysV semaphores instead of POSIX semaphores.
>> Then you can use the ipcs(1) command to clean up manually. BerkeleyDB uses
>> SysV shared memory when you specify a shared memory environment and it
>> appears that SysV semaphore support is actually more widespread than POSIX semaphores.
Just to mention that at least on HP-UX, POSIX semaphores are more efficient than SysV ones.
++Cyrille
9 years, 11 months
Re: (ITS#7364) mdb: clean up POSIX semaphores on environment close.
by hyc@symas.com
h.b.furuseth(a)usit.uio.no wrote:
> Reopening this.
>
>
> This is worse with a database which is intended to be used by
> different users (A and B):
This is pretty much never a use case we would worry about. In most
applications, a single userID creates and operates on the DB.
> A opens the DB and creates the semaphores with e.g. mode 0660.
> B opens it, A closes, B closes - and fails sem_unlink() which
> only A and root can do.
>
> Next, B (or C) fails mdb_env_open() because sem_unlink() fails
> again.
>
> The work-around I can think of is a "multi-uid" mode which instead
> resets the semaphore with sem_post() if sem_getvalue() returns 0.
> I don't know how ugly that is considered to be. Could ask
> comp.programming.unix, or check what Berkeley DB does.
That would be OK in general. It still leaves the question of how to remove the
semaphore if the DB is being destroyed. But it's probably not worth the
trouble to worry about this so much. Those OSes should just get their act
together and support the POSIX process-shared mutexes.
> This mode
> should use mode 0666 for the semaphores (temporarily setting umask
> 0, yuck),
Definitely not. The caller specifies a mode; if they want 0666 they should
configure it as such.
> or it should not sem_unlink() since next user may create
> the semaphores with a group which gives the wrong users access.
Same as today where running slapadd with the wrong uid causes trouble for the
following slapd. The answer is obvious: use the right uid when accessing the DB.
> Other matters with the current implementation - I'll patch these:
>
> mdb_env_excl_lock() need not retry getting a non-exclusive lock when
> closing. mdb_env_close() can pass *excl = -1 to tell it not to.
>
> mdb_env_setup_locks() can sem_unlink both semaphores before doing
> anything else, so that reopening a database as root will clean up.
> Drop the error checks of sem_unlink (so both get called), instead
> use O_EXCL in sem_open(,O_CREAT,,). Unless I'm missing something,
> the error checks just work like an emulation of O_EXCL anyway.
The sem_unlink() and sem_open() sequence isn't ideal, certainly. I would
prefer to just use the existing semaphore.
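A minimal sketch of the sequence proposed in the quoted text: unlink both semaphores without checking the result, then create them with O_CREAT | O_EXCL. The function name, its parameters, and the initial value of 1 are assumptions made for illustration; this is not the actual mdb_env_setup_locks() code.

```c
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>
#include <sys/stat.h>

/* Hypothetical helper, not part of libmdb.
 * Removes any leftover semaphores, then creates both from scratch.
 * Returns 0 on success, -1 with errno set on failure. */
static int create_fresh_sems(const char *rname, const char *wname,
                             mode_t mode, sem_t **rsem, sem_t **wsem)
{
    /* Errors are ignored on purpose so that both unlinks always run;
     * a leftover that could not be removed is caught by O_EXCL below. */
    sem_unlink(rname);
    sem_unlink(wname);

    *rsem = sem_open(rname, O_CREAT | O_EXCL, mode, 1);
    if (*rsem == SEM_FAILED)
        return -1;

    *wsem = sem_open(wname, O_CREAT | O_EXCL, mode, 1);
    if (*wsem == SEM_FAILED) {
        int saved = errno;      /* keep the sem_open() error */
        sem_close(*rsem);
        sem_unlink(rname);
        errno = saved;
        return -1;
    }
    return 0;
}
```

If a leftover semaphore belongs to another user, sem_unlink() fails silently and the exclusive sem_open() then reports EEXIST, which is the multi-user failure described earlier in this message.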
Another possibility is to just use SysV semaphores instead of POSIX
semaphores. Then you can use the ipcs(1) command to clean up manually.
BerkeleyDB uses SysV shared memory when you specify a shared memory
environment and it appears that SysV semaphore support is actually more
widespread than POSIX semaphores.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
9 years, 11 months
Re: (ITS#7378) Slapd hangs on bdb write lock
by michael@stroeder.com
A couple of days ago I had a hang with OpenLDAP 2.4.32 / back-hdb running on
Debian Squeeze, self-compiled against BDB 4.8.30. It seemed the database was
locked, as restarting slapd or even rebooting the OS did not help. Unfortunately I
had to bring up the system as fast as possible and could not examine the
problem.
The system has only 200 entries and not much load yet. I had renamed entries
with web2ldap when all 4 masters (4-way MMR) locked up one after the other.
So there seem to be lockup problems in 2.4.32.
9 years, 11 months
Re: (ITS#7378) Slapd hangs on bdb write lock
by nikolai@net24.co.nz
I haven't yet - I wanted to collect information before making any changes. I did look at that fix and wasn't confident it would solve our problem. You're right though - I need to test it to rule it out. I will upgrade all the servers to 2.4.32 and report back.
On 2/09/2012, at 7:07 AM, Quanah Gibson-Mount wrote:
> --On Saturday, September 01, 2012 1:46 PM +0000 nikolai(a)net24.co.nz wrote:
>
>> Full_Name: Nikolai Schupbach
>> Version: 2.4.31
>> OS: FreeBSD
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (202.78.158.60)
>
> Have you confirmed this isn't the same thing ITS#7222, fixed in OpenLDAP 2.4.32?
>
> --Quanah
>
>
>
> --
>
> Quanah Gibson-Mount
> Sr. Member of Technical Staff
> Zimbra, Inc
> A Division of VMware, Inc.
> --------------------
> Zimbra :: the leader in open source messaging and collaboration
9 years, 11 months
Re: (ITS#7378) Slapd hangs on bdb write lock
by quanah@zimbra.com
--On Saturday, September 01, 2012 1:46 PM +0000 nikolai(a)net24.co.nz wrote:
> Full_Name: Nikolai Schupbach
> Version: 2.4.31
> OS: FreeBSD
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (202.78.158.60)
Have you confirmed this isn't the same thing ITS#7222, fixed in OpenLDAP
2.4.32?
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
9 years, 11 months
(ITS#7378) Slapd hangs on bdb write lock
by nikolai@net24.co.nz
Full_Name: Nikolai Schupbach
Version: 2.4.31
OS: FreeBSD
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (202.78.158.60)
We are experiencing frequent hangs in slapd. Once hung we can continue to
connect, but all searches will just hang indefinitely until we kill -9 the slapd
process and restart it. The directory is used for mail routing and we have been
migrating to it from an existing directory server over the last 3 weeks - we
have noted the busier the directory becomes the more often it hangs (now once
every 2 days).
We have one master and 10 syncrepl read only replicas - the master is used
mainly for writes and has not hung yet, but most of the replicas have hung at
least once. The replicas receive anywhere between 50 to 300 searches/sec, while
the master would only get 1/sec. There are 45k entries in the directory.
We are running:
FreeBSD 8.3/9.0 x64
OpenLDAP 2.4.31
Berkeley DB 4.6.21
The old directory we are migrating from has the same load and is also running
OpenLDAP, but has been rock solid for 5 years. It is running Berkeley DB 4.3.29
and OpenLDAP 2.3.27.
We have managed to collect db_stat lock information, which indicates the same
issue each time - a write lock on dn2id.bdb.
Locks grouped by object:
Locker Mode Count Status ----------------- Object ---------------
8000a85e READ 1 HELD 0xb26c8 len: 9 data: 60xa800000000000000
8a READ 1 HELD id2entry.bdb handle 0
8c READ 1 HELD dn2id.bdb handle 0
96 READ 1 HELD objectClass.bdb handle 0
93 READ 1 HELD entryCSN.bdb handle 0
90 READ 1 HELD entryUUID.bdb handle 0
8000a85f WRITE 4 HELD dn2id.bdb page 219
80000782 READ 1 HELD dn2id.bdb page 768
80000a45 READ 1 HELD dn2id.bdb page 768
80000b9e READ 1 HELD dn2id.bdb page 768
800006a0 READ 1 HELD dn2id.bdb page 768
80000771 READ 1 HELD dn2id.bdb page 768
80000534 READ 1 HELD dn2id.bdb page 768
80000a44 READ 1 HELD dn2id.bdb page 768
80000641 READ 1 HELD dn2id.bdb page 768
80001049 READ 1 HELD dn2id.bdb page 768
8000104a READ 1 HELD dn2id.bdb page 768
80001048 READ 1 HELD dn2id.bdb page 768
80000783 READ 1 HELD dn2id.bdb page 768
80000535 READ 1 HELD dn2id.bdb page 768
8000066e READ 1 HELD dn2id.bdb page 768
80000697 READ 1 HELD dn2id.bdb page 768
8000a85f READ 1 HELD dn2id.bdb page 768
8000a85e READ 1 HELD 0xb19a8 len: 9 data: 40xa800000000000000
8000a85f READ 1 HELD dn2id.bdb page 933
8000a85f WRITE 2 HELD dn2id.bdb page 933
80001047 WRITE 1 HELD dn2id.bdb page 559
80000782 READ 1 WAIT dn2id.bdb page 559
80000a45 READ 1 WAIT dn2id.bdb page 559
80000b9e READ 1 WAIT dn2id.bdb page 559
800006a0 READ 1 WAIT dn2id.bdb page 559
80000771 READ 1 WAIT dn2id.bdb page 559
80000534 READ 1 WAIT dn2id.bdb page 559
80000a44 READ 1 WAIT dn2id.bdb page 559
80000641 READ 1 WAIT dn2id.bdb page 559
80001049 READ 1 WAIT dn2id.bdb page 559
8000104a READ 1 WAIT dn2id.bdb page 559
80001048 READ 1 WAIT dn2id.bdb page 559
80000783 READ 1 WAIT dn2id.bdb page 559
80000535 READ 1 WAIT dn2id.bdb page 559
8000066e READ 1 WAIT dn2id.bdb page 559
80000697 READ 1 WAIT dn2id.bdb page 559
8000a85f READ 1 WAIT dn2id.bdb page 559
8000a85f READ 2 HELD dn2id.bdb page 1362
8000a85f WRITE 2 HELD dn2id.bdb page 1362
8000a85f READ 2 HELD dn2id.bdb page 1353
8000a85f WRITE 2 HELD dn2id.bdb page 1353
b6 READ 1 HELD uid.bdb handle 0
a5 READ 1 HELD mail.bdb handle 0
af READ 1 HELD mailLocalAddress.bdb handle 0
9b READ 1 HELD miLoginid.bdb handle 0
aa READ 1 HELD mailHost.bdb handle 0
bb READ 1 HELD miDomainName.bdb handle 0
c0 READ 1 HELD mpMailHost.bdb handle 0
a0 READ 1 HELD mpMailUserType.bdb handle 0
We have also collected the backtrace for all the threads which I have uploaded
to:
ftp://ftp.openldap.org/incoming/nikolai-gdb-120902.txt
The full db_stat output is located at:
ftp://ftp.openldap.org/incoming/nikolai-dbstat-120902.txt
Our DB_CONFIG:
# One 512MB cache
set_cachesize 0 536870912 1
# Transaction Log settings
set_lg_regionmax 1048576
set_lg_max 10485760
set_lg_bsize 2097152
set_flags DB_LOG_AUTOREMOVE
# Increase lock maximums
set_lk_max_locks 2000
set_lk_max_lockers 2000
set_lk_max_objects 2000
Our slapd.conf on our replicas:
# Load the following schema files
include /usr/local/etc/openldap/schema/core.schema
include /usr/local/etc/openldap/schema/cosine.schema
include /usr/local/etc/openldap/schema/nis.schema
include /usr/local/etc/openldap/schema/inetorgperson.schema
include /usr/local/etc/openldap/schema/misc.schema
include /usr/local/etc/openldap/schema/mirapoint.schema
include /usr/local/etc/openldap/schema/smp.schema
# Runtime settings for slapd
pidfile /var/run/openldap/slapd.pid
argsfile /var/run/openldap/slapd.args
loglevel none
# TLS security options for slapd.
TLSCipherSuite HIGH
TLSCACertificateFile /usr/local/etc/openldap/tls/ca-cert.pem
TLSCertificateFile /usr/local/etc/openldap/tls/server-cert.pem
TLSCertificateKeyFile /usr/local/etc/openldap/tls/server-key.pem
# This option configures one or more hashes to be used in generation
# of user passwords stored in the userPassword attribute during
# processing of LDAP Password Modify Extended Operations (RFC 3062).
password-hash {SSHA}
# Load dynamic backend modules:
modulepath /usr/local/libexec/openldap
moduleload back_bdb
moduleload back_monitor
# Do not limit size or time of requests.
sizelimit unlimited
timelimit unlimited
# Require authentication prior to directory operations
require authc
###############################################################################
# BDB Database Definitions
#
# The following configuration directives relate to bdb database definitions
###############################################################################
# The remaining configuration directives relate to bdb database definitions
database bdb
suffix "o=top"
rootdn "cn=root,o=top"
# Cleartext passwords, especially for the rootdn, should
# be avoided. See slappasswd(8) and slapd.conf(5) for details.
rootpw {SSHA}**********
# The database directory must exist prior to running slapd and
# should only be accessible by the slapd and slap tools.
directory /var/db/openldap-data
# Indices to maintain
index cn eq,sub,pres
index entryUUID eq
index entryCSN eq
index mail eq,sub,pres
index mailHost eq
index mailLocalAddress eq,sub,pres
index miDomainName eq,sub
index miLoginId eq,pres
index mpMailHost eq
index mpMailUserType eq
index mpSystemRole eq
index objectClass eq,pres
index uid eq,pres
# Specify the number of entries which should be held in memory
cachesize 200000
# Set transactional checkpoint
checkpoint 512 60
###############################################################################
# LDAP Sync Replication
#
# A unique replica id number is required for each replication client
###############################################################################
# LDAP sync replication settings
syncrepl rid=36
provider=ldaps://ldapmaster/
type=refreshAndPersist
retry=30,+
searchbase="o=top"
filter="(objectClass=*)"
scope=sub
attrs="*"
sizelimit=unlimited
timelimit=unlimited
schemachecking=off
bindmethod=simple
binddn="cn=replica,ou=users,ou=directory,o=top"
credentials=**********
# Where to refer ldap updates to
updateref ldaps://ldapmaster/
###############################################################################
# LDAP Statistics
#
# The OpenLDAP server can be configured to provide real time performance
# statistics through the monitor branch.
###############################################################################
# Enable the statistics monitoring database
database monitor
# Allow access to monitoring user only
access to dn.subtree="cn=monitor"
by dn.exact="cn=monitor,ou=users,ou=directory,o=top" read
by * none
Sincerely,
Nikolai Schupbach
9 years, 11 months