Re: (ITS#6276) paused pool can deadlock if writers are waiting
by h.b.furuseth@usit.uio.no
Re-reading this, I'm not sure what either of us is talking about...
About reading a byte and putting it back if select() says the socket is
readable: I suppose this is only necessary if conn->c_writewaiter != 0,
and maybe only if the pool is paused. And remember that the next
select() will often also find the socket readable, so we must read
the next byte (from the socket, not the sockbuf) and put that on the
sockbuf stack, and so on.
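As a toy illustration of that put-back idea (purely hypothetical helpers,
not liblber's actual sockbuf API), probed bytes could be parked in a small
pushback buffer that the normal read path drains before touching the
socket again:

#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical sketch only: bytes probed straight from the socket are
 * parked here, and a sockbuf-style reader drains them before calling
 * recv() again, so repeated probes stack up instead of being lost. */
struct probe_buf {
    char   data[16];
    size_t len;
};

/* Probe one byte from the socket and park it.
 * Returns 1 = data parked, 0 = hangup, -1 = nothing readable / no room. */
static int probe_and_park(int fd, struct probe_buf *pb)
{
    char c;
    ssize_t n;

    if (pb->len == sizeof(pb->data))
        return -1;
    n = recv(fd, &c, 1, MSG_DONTWAIT);
    if (n <= 0)
        return n == 0 ? 0 : -1;
    pb->data[pb->len++] = c;    /* "put it on the sockbuf stack" */
    return 1;
}

/* The reader consumes parked bytes first, then reads from the socket. */
static ssize_t buffered_read(int fd, struct probe_buf *pb, char *out, size_t len)
{
    if (pb->len > 0) {
        size_t take = pb->len < len ? pb->len : len;
        memcpy(out, pb->data, take);
        memmove(pb->data, pb->data + take, pb->len - take);
        pb->len -= take;
        return (ssize_t)take;
    }
    return recv(fd, out, len, 0);
}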
About multiple types of pauses: The concept must be rephrased for that
to make sense. The pause mechanism is in effect a pool-wide read/write
lock which favors writers over readers. A pool thread readlocks it
while running a task. pool_pause() does unlock - writelock.
pool_resume() and pool_pausecheck() do unlock - readlock.
Now it's possible to talk about moving a few things out of this
particular lock and, if necessary, protecting them with another lock.
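To make the analogy concrete, here is a minimal standalone model of those
semantics (not slapd code; it assumes a writer-preferring rwlock, which
POSIX does not guarantee):

#include <pthread.h>

/* Model only: a pool thread holds the lock for reading while it runs a
 * task; pause trades the read lock for the write lock; resume and
 * pausecheck trade back.  Writer preference is what lets pausers win. */
static pthread_rwlock_t pause_model = PTHREAD_RWLOCK_INITIALIZER;

static void model_task_begin(void) { pthread_rwlock_rdlock(&pause_model); }
static void model_task_end(void)   { pthread_rwlock_unlock(&pause_model); }

/* pool_pause(): drop our task's read lock, then wait until every other
 * task has dropped theirs; we return holding exclusive access. */
static void model_pause(void)
{
    pthread_rwlock_unlock(&pause_model);
    pthread_rwlock_wrlock(&pause_model);
}

/* pool_resume() / pool_pausecheck(): drop whatever we hold and go back
 * to running as an ordinary task (read lock), letting any waiting
 * pauser get in first. */
static void model_unpause(void)
{
    pthread_rwlock_unlock(&pause_model);
    pthread_rwlock_rdlock(&pause_model);
}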
--
Hallvard
14 years, 3 months
Re: (ITS#6275) syncrepl taking long(not sync) when consumer not connect for a moment
by rlvcosta@yahoo.com
Quanah,
Please see answer in your previous e-mail below.
I'm also attaching the information I could collect, since it is a
small file (5 KB).
The behavior that seems strange, and could indicate a problem, is
that even when the consumer is stopped the provider keeps doing
something for a long time. This doesn't appear to be correct.
Another strange behavior is that when the system enters this state, one
provider CPU stays at around 100% usage. I made a JMeter script
to test individual bind/search operations (no "ldapsearch *"), and even
with some load (around 200 simultaneous queries) I do not see the CPU at
100%. Something doesn't look right, since I see no reason why the CPU
should stay at 100% permanently.
Regards,
Rodrigo.
Quanah Gibson-Mount wrote:
> --On Tuesday, August 25, 2009 6:16 PM +0000 rlvcosta(a)yahoo.com wrote:
>
>> The issue I'm facing, from a general user's point of view, is that when I
>> stop the secondary Provider2 (master 2) for backup purposes using slapcat,
>> Provider1 (master 1) continues to provide LDAP service, where some
>> entries can be created while the backup is running (no consumer
>> from Provider 2).
>
> Why are you stopping the provider to do a slapcat?
[Rodrigo] Faster dump of the data. And in any case, if some other situation
like a problem occurs, the secondary system could stay disconnected for other
reasons.
>
>> Even when a small number of entries are different when the consumer on
>> Provider 2 connects to Provider 1, syncrepl then enters the full DB search,
>> as expected.
>
>
> What is your sessionlog setting on each provider for the syncprov
> overlay?
[Rodrigo]
syncprov-checkpoint 10000 120
syncprov-sessionlog 100000
Same configuration in both systems.
>
>> For definition purposes I have some memory limitations where I need to
>> limit dncachesize to around 80% of the DB entries.
>
> We already went through other things you could do to reduce your
> memory footprint in other ways. You've completely ignored that
> advice. As long as your dncachesize is in this state, I don't expect
> things to behave normally.
[Rodrigo] I implemented what was possible. The result is the cache config
below, which is what the memory constraints allow:
#Cache values
#cachesize 10000
cachesize 20000
dncachesize 3000000
#dncachesize 400000
#idlcachesize 10000
idlcachesize 30000
#cachefree 10
cachefree 100
>
>> I could also note that when in this situation the monitor cache, at a
>> very slow pace, changes by a single entry. Being more
>> specific:
>>
>> dn: cn=Database 1,cn=Databases,cn=Monitor
>> structuralObjectClass: monitoredObject
>> creatorsName:
>> modifiersName:
>> createTimestamp: 20090821145848Z
>> modifyTimestamp: 20090821145848Z
>> monitoredInfo: bdb
>> monitorIsShadow: TRUE
>> namingContexts: ou=CONTENT,o=domain,c=fr
>> readOnly: FALSE
>> monitorOverlay: syncprov
>> olmBDBEntryCache: 19920
>> olmBDBDNCache: 3896287
>> olmBDBIDLCache: 2
>> olmDbDirectory: /var/openldap-data/bdb1/
>> entryDN: cn=Database 1,cn=Databases,cn=Monitor
>> subschemaSubentry: cn=Subschema
>> hasSubordinates: TRUE
>>
>> It stays alternating between the values 3896287 and 3896288. It looks like
>> the memory re-use is too limited, causing locks that take a long time and
>> prevent synchronization.
>
>
> What value did you set for "cachefree"?
[Rodrigo] cachefree 100
>
>> PS-> I could not put the file in the openldap ftp. It says device full.
>> Please let me know how can I send this file.
>
> I've let the maintainer of the system know, hopefully there'll be
> space available soon.
[Rodrigo] I'm sending the file attached, since it is very small (5 KB).
>
> --Quanah
>
> --
>
> Quanah Gibson-Mount
> Principal Software Engineer
> Zimbra, Inc
> --------------------
> Zimbra :: the leader in open source messaging and collaboration
>
>
[Attachment: replication_too_slow.zip]
14 years, 3 months
Re: (ITS#6275) syncrepl taking long(not sync) when consumer not connect for a moment
by quanah@zimbra.com
--On Tuesday, August 25, 2009 6:16 PM +0000 rlvcosta(a)yahoo.com wrote:
> The issue I'm facing, from a general user's point of view, is that when I
> stop the secondary Provider2 (master 2) for backup purposes using slapcat,
> Provider1 (master 1) continues to provide LDAP service, where some
> entries can be created while the backup is running (no consumer
> from Provider 2).
Why are you stopping the provider to do a slapcat?
> Even when a small number of entries are different when the consumer on
> Provider 2 connects to Provider 1, syncrepl then enters the full DB search,
> as expected.
What is your sessionlog setting on each provider for the syncprov overlay?
> For definition purposes I have some memory limitations where I need to
> limit dncachesize to around 80% of the DB entries.
We already went through other things you could do to reduce your memory
footprint in other ways. You've completely ignored that advice. As long
as your dncachesize is in this state, I don't expect things to behave
normally.
> I could also note that when in this situation the monitor cache, at a
> very slow pace, changes by a single entry. Being more
> specific:
>
> dn: cn=Database 1,cn=Databases,cn=Monitor
> structuralObjectClass: monitoredObject
> creatorsName:
> modifiersName:
> createTimestamp: 20090821145848Z
> modifyTimestamp: 20090821145848Z
> monitoredInfo: bdb
> monitorIsShadow: TRUE
> namingContexts: ou=CONTENT,o=domain,c=fr
> readOnly: FALSE
> monitorOverlay: syncprov
> olmBDBEntryCache: 19920
> olmBDBDNCache: 3896287
> olmBDBIDLCache: 2
> olmDbDirectory: /var/openldap-data/bdb1/
> entryDN: cn=Database 1,cn=Databases,cn=Monitor
> subschemaSubentry: cn=Subschema
> hasSubordinates: TRUE
>
> It stays alternating between the values 3896287 and 3896288. It looks like
> the memory re-use is too limited, causing locks that take a long time and
> prevent synchronization.
What value did you set for "cachefree"?
> PS-> I could not put the file in the openldap ftp. It says device full.
> Please let me know how can I send this file.
I've let the maintainer of the system know, hopefully there'll be space
available soon.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
14 years, 3 months
Re: (ITS#6276) paused pool can deadlock if writers are waiting
by h.b.furuseth@usit.uio.no
h.b.furuseth(a)usit.uio.no writes:
> Then we could add the option (c) a *small* set of (groups of?) config
> variables which each will need their own mutex or pause or something.
> Such as loglevel, since the code is doing Debug() all over the place.
I forgot: Simplest way here is to have two versions of the variable:
the one accessed by cn=config, and a copy which is the one slapd obeys.
The cn=config variable is pause-protected as usual, and slapd copies it
to the active variable at its leisure. This will be acceptable for
config changes that cannot fail once the value has been verified and
need not take effect immediately. Then the copy task needs locks both
for the configurable variable ("lock" = prevent_pause() ... "unlock" =
allow_pause()?) and for the active variable, but not at the same time.
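A rough sketch of that two-copy scheme, with hypothetical names (loglevel
chosen only as an example, and prevent_pause()/allow_pause() as proposed
above, shown as comments):

#include <pthread.h>

/* Hypothetical sketch of the two-variable scheme described above. */
static int config_loglevel;    /* the cn=config copy, pause-protected as usual */
static int active_loglevel;    /* the copy slapd actually obeys                */
static pthread_mutex_t active_loglevel_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Runs under the usual pause protection: just record the verified value. */
static void config_set_loglevel(int level)
{
    config_loglevel = level;
}

/* The copy task, run at slapd's leisure.  It needs the config-side
 * protection and the active-side mutex, but never both at once. */
static void copy_loglevel(void)
{
    int level;

    /* prevent_pause();  -- "lock" the config side, per the proposal above */
    level = config_loglevel;
    /* allow_pause();    -- "unlock" the config side */

    pthread_mutex_lock(&active_loglevel_mutex);
    active_loglevel = level;
    pthread_mutex_unlock(&active_loglevel_mutex);
}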
--
Hallvard
14 years, 3 months
Re: (ITS#6276) paused pool can deadlock if writers are waiting
by hyc@symas.com
Hallvard B Furuseth wrote:
> But I quite agree
> that this is hopeless if that "small" set cannot be kept small. For one
> thing it might introduce yet another "lock order", threatening to be
> inconsistent with the lock order of other mutexes.
>> Another possibility is to just try to read 1 byte in the listener
>> thread, to detect the hangup there when we have no other means to
>> discover it. We would have to make sure to be able to unget this byte
>> back to the bottom of the sockbuf stack if there's valid data. This
>> will affect the throughput of the listener thread, but it may not be
>> too terrible a hit.
>
> Does that help for a socket which has gotten blocked on select() due to
> full socket buffers?
A socket that's blocked on a write will eventually clear itself up - either
the client will catch up or it will go away. So there's really no reason to
worry about them at all.
> Also, if doing a read() OS call anyway, why not read() a large enough
> chunk that it would commonly be a full PDU? As long as it's read into a
> sockbuf buffer rather than into slapd's data.
Because we don't want to keep the listener thread busy for any longer than
necessary. In terms of system call overhead it might be reasonable to read as
much as one memory page (e.g. 4KB) but even that involves a lot of cycles for
memcpy, and the idea is to shunt all the time-consuming work elsewhere so that
the listener can return to listening ASAP.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
14 years, 3 months
Re: (ITS#6276) paused pool can deadlock if writers are waiting
by h.b.furuseth@usit.uio.no
Howard Chu writes:
>Hallvard B Furuseth wrote:
>> Beyond that, my immediate reaction was that pauses are implemented at
>> the wrong level and/or need to be split up in different types of pauses.
>> There is no good reason cn=config's need to have slapd config variables
>> for itself to affect network operations - nor, I hope, affect pool-level
>> operations like pool_purgekey().
>
> I suppose implementing things this way can appear to be too blunt. But the
> amount of locks required to do it at a finer level is, IMO, unmanageable.
I know. That's why I'm not suggesting to do away with pauses, only to
make them a little less crude than "1 type of pause for everything".
See below.
> configurations are changed from under them.)
>
>> E.g. cn=config could use a slapd_config_pause() call to ask for lone
>> access to the config variables instead of thread_pool_pause() call.
>> Slapd operations that must respect such pauses should thus call a
>> slapd_config_pausecheck() macro/function, not thread_pool_pausecheck()
>> or depend on the pool to respect slapd-level pauses.
>> connection_read_thread() can then go ahead and use the connection
>> independently of those pauses. If it reaches connection_operation(),
>> it'll need to check for slapd pause.
>
> Yes, I've thought about this approach too. That makes the pause mechanisms
> even messier.
It does? I was hoping that would be the cleaner solution.
Lifting them from pool level to slapd level would mean that every task
now submitted to the pool will be responsible for (a) calling pausecheck
itself or (b) avoiding all variables/features that today need pauses.
Then we could add the option (c) a *small* set of (groups of?) config
variables which each will need their own mutex or pause or something.
Such as loglevel, since the code is doing Debug() all over the place.
And presumably some network variables, for this ITS. But I quite agree
that this is hopeless if that "small" set cannot be kept small. For one
thing it might introduce yet another "lock order", threatening to be
inconsistent with the lock order of other mutexes.
> Another possibility is to just try to read 1 byte in the listener
> thread, to detect the hangup there when we have no other means to
> discover it. We would have to make sure to be able to unget this byte
> back to the bottom of the sockbuf stack if there's valid data. This
> will affect the throughput of the listener thread, but it may not be
> too terrible a hit.
Does that help for a socket which has gotten blocked on select() due to
full socket buffers?
Also, if doing a read() OS call anyway, why not read() a large enough
chunk that it would commonly be a full PDU? As long as it's read into a
sockbuf buffer rather than into slapd's data.
--
Hallvard
14 years, 3 months
Re: (ITS#6257) libldap: getopt flag to return the SASL username
by hyc@symas.com
masarati(a)aero.polimi.it wrote:
> My concern was not from an operational point of view: the simple concept
> of having a library dynamically loading something that could no longer be
> present is asking for trouble, unless handled appropriately, and probably
> there is no way to do it safely, as one could always remove a .so while
> it's in use (although I guess on any decent system the object will be
> cached or loaded somewhere for as long as it's in use).
>
> My concern is about the char* array returned by that call: if for any
> reason the library decides to refresh it, but the caller of
> ldap_get_option() is still holding a pointer to that array, this risks
> pointing to freed memory and things like that, as far as I understand.
> For this reason, returning a copy sounds wiser. Whether the contents of
> that copy are valid or not, namely whether the related mechanism is
> available or not, that's an entirely different issue.
The Cyrus mechlist is only initialized once during the life of the library.
(Subsequent init calls just increment an init counter and then return.) It
will not change in normal use. If an application loads / unloads / reloads the
library, the list may change, but any app that goes thru this trouble will
already know they have to call the init functions all over again, and re-fetch
this list.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
14 years, 3 months
Re: (ITS#6257) libldap: getopt flag to return the SASL username
by masarati@aero.polimi.it
>> On a somewhat related issue, I note that LDAP_OPT_X_SASL_MECHLIST returns
>> a pointer to an array of chars that apparently cannot be mucked with.
>>
>> Assuming my understanding is correct, I wonder if this behavior is
>> desirable or not, given the fact that if another mech is added, e.g. by
>> adding a dynamic module, I expect this list to change.
>
> These are SASL mechs with the plugin modules. Right?
>
> From an operational standpoint: If a SASL plugin module for a mech was
> added, I think it's acceptable that software which queries this option is
> restarted before this SASL mech is known to the software. Probably one has
> to add additional configuration for this SASL mech.
>
> Now the question is what happens if a SASL plugin module is removed and
> the software tries to use the removed SASL mech. Clearly removing plugin
> modules in a running system is asking for trouble anyway...
>
> Having said this, I would not care too much about this list changing...
My concern was not from an operational point of view: the simple concept
of having a library dynamically loading something that could no longer be
present is asking for trouble, unless handled appropriately, and probably
there is no way to do it safely, as one could always remove a .so while
it's in use (although I guess on any decent system the object will be
cached or loaded somewhere for as long as it's in use).
My concern is about the char* array returned by that call: if for any
reason the library decides to refresh it, but the caller of
ldap_get_option() is still holding a pointer to that array, this risks
pointing to freed memory and things like that, as far as I understand.
For this reason, returning a copy sounds wiser. Whether the contents of
that copy are valid or not, namely whether the related mechanism is
available or not, that's an entirely different issue.
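For what it's worth, a caller worried about that can duplicate the list as
soon as it is returned instead of holding the library's pointer. A minimal
sketch (error handling trimmed, and assuming, as discussed here, that
ldap_get_option() fills in a NULL-terminated char* array for
LDAP_OPT_X_SASL_MECHLIST):

#include <stdlib.h>
#include <string.h>
#include <ldap.h>

/* Fetch the SASL mechanism list and return a private, NULL-terminated
 * copy that the caller owns and frees, so a later refresh inside the
 * library cannot leave us pointing at freed memory. */
static char **sasl_mechs_copy(LDAP *ld)
{
    char **mechs = NULL, **copy;
    int i, n;

    if (ldap_get_option(ld, LDAP_OPT_X_SASL_MECHLIST, &mechs) != LDAP_OPT_SUCCESS
        || mechs == NULL)
        return NULL;

    for (n = 0; mechs[n] != NULL; n++)
        ;
    copy = calloc(n + 1, sizeof(*copy));
    if (copy == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        copy[i] = strdup(mechs[i]);   /* our own copy of each name */
    return copy;
}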
p.
14 years, 3 months
Re: (ITS#6276) paused pool can deadlock if writers are waiting
by hyc@symas.com
Hallvard B Furuseth wrote:
> This is a quick abstract answer, I haven't dived into the code to see
> what my suggestions would mean in practice. Anyway:
> It sounds to me like this problem cannot be fully prevented when network
> operations share the same pool queue as everything else, and we have
> operations that can wait for other operations. All pool threads can be
> occupied with LDAP-level operations, so that network-level operations
> which the LDAP-level operations depend on can get blocked. If slapd has
> a design which even tries to guarantee forward progress, I'm not aware
> of it.
>
> So LDAP-level operations ought to leave at least one thread free for
> network-level operations. Not necessarily a designated thread: A thread
> moving from network-level to LDAP-level operation like
> connection_read_thread() does could first check that at least one other
> thread remains available for network-level operation.
Yes...
> Beyond that, my immediate reaction was that pauses are implemented at
> the wrong level and/or need to be split up in different types of pauses.
> There is no good reason cn=config's need to have slapd config variables
> for itself to affect network operations - nor, I hope, affect pool-level
> operations like pool_purgekey().
I suppose implementing things this way can appear to be too blunt. But the
amount of locks required to do it at a finer level is, IMO, unmanageable. The
fact that you can change global slapd config parameters (such as the number of
threads, size of sockbuf buffers, etc.) makes it inherently unsafe for
*anything* else to be active while config changes are being made. And sifting
thru each variable to decide how sensitive they are, and when they are unsafe,
will inevitably lead to requiring locks on every single piece of configuration
data. (Want to look up an attribute type, or objectclass? Or a database
suffix? etc. etc. etc... There are countless things we do arbitrarily that
simply won't work if we allow arbitrary threads to continue to run while their
configurations are changed from under them.)
> E.g. cn=config could use a slapd_config_pause() call to ask for lone
> access to the config variables instead of thread_pool_pause() call.
> Slapd operations that must respect such pauses should thus call a
> slapd_config_pausecheck() macro/function, not thread_pool_pausecheck()
> or depend on the pool to respect slapd-level pauses.
> connection_read_thread() can then go ahead and use the connection
> independently of those pauses. If it reaches connection_operation(),
> it'll need to check for slapd pause.
Yes, I've thought about this approach too. That makes the pause mechanisms
even messier.
Another possibility is to just try to read 1 byte in the listener thread, to
detect the hangup there when we have no other means to discover it. We would
have to make sure to be able to unget this byte back to the bottom of the
sockbuf stack if there's valid data. This will affect the throughput of the
listener thread, but it may not be too terrible a hit.
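A rough sketch of the same detection done with a peek instead of a
consuming read, which sidesteps the unget problem (illustration only, not
the proposed sockbuf change; MSG_DONTWAIT is assumed available):

#include <sys/types.h>
#include <sys/socket.h>

/* Classify a descriptor that select() reported readable by peeking one
 * byte: nothing is consumed, so there is nothing to push back into the
 * sockbuf stack.  Returns 1 = data waiting, 0 = hangup, -1 = no news. */
static int probe_readable_fd(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n == 1)
        return 1;   /* real data; a reader thread will consume it later */
    if (n == 0)
        return 0;   /* orderly hangup detected in the listener */
    return -1;      /* EAGAIN/EWOULDBLOCK or another error: nothing to conclude */
}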
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
14 years, 3 months
Re: (ITS#6276) paused pool can deadlock if writers are waiting
by h.b.furuseth@usit.uio.no
This is a quick abstract answer, I haven't dived into the code to see
what my suggestions would mean in practice. Anyway:
It sounds to me like this problem cannot be fully prevented when network
operations share the same pool queue as everything else, and we have
operations that can wait for other operations. All pool threads can be
occupied with LDAP-level operations, so that network-level operations
which the LDAP-level operations depend on can get blocked. If slapd has
a design which even tries to guarantee forward progress, I'm not aware
of it.
So LDAP-level operations ought to leave at least one thread free for
network-level operations. Not necessarily a designated thread: A thread
moving from network-level to LDAP-level operation like
connection_read_thread() does could first check that at least one other
thread remains available for network-level operation.
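As an abstract sketch of that check (hypothetical counters, nothing
slapd-specific):

#include <pthread.h>

/* Hypothetical illustration of the check above: a thread about to move
 * from network-level to LDAP-level work first verifies that it is not
 * the last one still available for network-level operations. */
static pthread_mutex_t avail_mutex = PTHREAD_MUTEX_INITIALIZER;
static int network_free_threads;   /* threads currently free for network work */

/* Returns 1 if this thread may take on LDAP-level work now,
 * 0 if it must stay at the network level because it is the last one. */
static int may_go_ldap_level(void)
{
    int ok;

    pthread_mutex_lock(&avail_mutex);
    ok = (network_free_threads > 1);
    if (ok)
        network_free_threads--;
    pthread_mutex_unlock(&avail_mutex);
    return ok;
}

/* Called when the LDAP-level operation completes. */
static void back_to_network_level(void)
{
    pthread_mutex_lock(&avail_mutex);
    network_free_threads++;
    pthread_mutex_unlock(&avail_mutex);
}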
Beyond that, my immediate reaction was that pauses are implemented at
the wrong level and/or need to be split up in different types of pauses.
There is no good reason cn=config's need to have slapd config variables
for itself to affect network operations - nor, I hope, affect pool-level
operations like pool_purgekey().
E.g. cn=config could use a slapd_config_pause() call to ask for lone
access to the config variables instead of thread_pool_pause() call.
Slapd operations that must respect such pauses should thus call a
slapd_config_pausecheck() macro/function, not thread_pool_pausecheck()
or depend on the pool to respect slapd-level pauses.
connection_read_thread() can then go ahead and use the connection
independently of those pauses. If it reaches connection_operation(),
it'll need to check for slapd pause.
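Just to illustrate the shape of that interface: the pause/pausecheck names
are the ones proposed above, while the rwlock backing and the endcheck
counterpart are only one possible (purely illustrative) implementation.

#include <pthread.h>

/* Illustrative only: a config-level pause independent of the thread
 * pool.  cn=config takes the lock exclusively; operations that must
 * respect config pauses bracket their config access with the
 * pausecheck/endcheck pair instead of the pool-level pausecheck. */
static pthread_rwlock_t slapd_config_lock = PTHREAD_RWLOCK_INITIALIZER;

static void slapd_config_pause(void)      { pthread_rwlock_wrlock(&slapd_config_lock); }
static void slapd_config_resume(void)     { pthread_rwlock_unlock(&slapd_config_lock); }

static void slapd_config_pausecheck(void) { pthread_rwlock_rdlock(&slapd_config_lock); }
static void slapd_config_endcheck(void)   { pthread_rwlock_unlock(&slapd_config_lock); }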
Alternatively, with different types of pauses/operations: The pool could
have different sets of pauses (a pause-type bitmask?) and if one type of
pauses happens, operations that ignore that type of pause will not be
affected. This messes up the pool code further though, and I already
hate what pauses do to the otherwise clean pool code:-(
--
Hallvard
14 years, 3 months