Re: (ITS#6276) paused pool can deadlock if writers are waiting
by h.b.furuseth@usit.uio.no
Re-reading this, I'm not sure what either of us is talking about...
About reading a byte and putting it back if select() says the socket is
readable: I suppose this is only necessary if conn->c_writewaiter != 0,
and maybe only if the pool is paused. And remember that the next
select() will often also find the socket readable, so we must read
the next byte (from the socket, not the sockbuf) and put that on the
sockbuf stack, and so on.
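As a toy illustration of that put-back idea (purely hypothetical helpers,
not liblber's actual sockbuf API), probed bytes could be parked in a small
pushback buffer that the normal read path drains before touching the
socket again:

#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical sketch only: bytes probed straight from the socket are
 * parked here, and a sockbuf-style reader drains them before calling
 * recv() again, so repeated probes stack up instead of being lost. */
struct probe_buf {
    char   data[16];
    size_t len;
};

/* Probe one byte from the socket and park it.
 * Returns 1 = data parked, 0 = hangup, -1 = nothing readable / no room. */
static int probe_and_park(int fd, struct probe_buf *pb)
{
    char c;
    ssize_t n;

    if (pb->len == sizeof(pb->data))
        return -1;
    n = recv(fd, &c, 1, MSG_DONTWAIT);
    if (n <= 0)
        return n == 0 ? 0 : -1;
    pb->data[pb->len++] = c;    /* "put it on the sockbuf stack" */
    return 1;
}

/* The reader consumes parked bytes first, then reads from the socket. */
static ssize_t buffered_read(int fd, struct probe_buf *pb, char *out, size_t len)
{
    if (pb->len > 0) {
        size_t take = pb->len < len ? pb->len : len;
        memcpy(out, pb->data, take);
        memmove(pb->data, pb->data + take, pb->len - take);
        pb->len -= take;
        return (ssize_t)take;
    }
    return recv(fd, out, len, 0);
}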
About multiple types of pauses: The concept must be rephrased for that
to make sense. The pause mechanism is in effect a pool-wide read/write
lock which favors writers over readers. A pool thread readlocks it
while running a task. pool_pause() does unlock - writelock.
pool_resume() and pool_pausecheck() do unlock - readlock.
Now it's possible to talk about moving a few things out of this
particular lock and, if necessary, protecting them with another lock.
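To make the analogy concrete, here is a minimal standalone model of those
semantics (not slapd code; it assumes a writer-preferring rwlock, which
POSIX does not guarantee):

#include <pthread.h>

/* Model only: a pool thread holds the lock for reading while it runs a
 * task; pause trades the read lock for the write lock; resume and
 * pausecheck trade back.  Writer preference is what lets pausers win. */
static pthread_rwlock_t pause_model = PTHREAD_RWLOCK_INITIALIZER;

static void model_task_begin(void) { pthread_rwlock_rdlock(&pause_model); }
static void model_task_end(void)   { pthread_rwlock_unlock(&pause_model); }

/* pool_pause(): drop our task's read lock, then wait until every other
 * task has dropped theirs; we return holding exclusive access. */
static void model_pause(void)
{
    pthread_rwlock_unlock(&pause_model);
    pthread_rwlock_wrlock(&pause_model);
}

/* pool_resume() / pool_pausecheck(): drop whatever we hold and go back
 * to running as an ordinary task (read lock), letting any waiting
 * pauser get in first. */
static void model_unpause(void)
{
    pthread_rwlock_unlock(&pause_model);
    pthread_rwlock_rdlock(&pause_model);
}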
--
Hallvard
14 years, 3 months
Re: (ITS#6275) syncrepl taking long(not sync) when consumer not connect for a moment
by rlvcosta@yahoo.com
Quanah,
Please see answer in your previous e-mail below.
I'm also attaching the information I could collect, since it is a
small file (5 KB).
The behavior that seems strange, and could indicate a problem, is
that even when the consumer is stopped the provider keeps doing
something for a long time. This doesn't appear to be correct.
Another strange behavior is that when the system enters this state, one
provider CPU stays at around 100% usage. I made a JMeter script
to test individual bind/search operations (no "ldapsearch *"), and even
with some load (around 200 simultaneous queries) I do not see the CPU at
100%. Something doesn't look right, since I see no reason why the CPU
should stay at 100% permanently.
Regards,
Rodrigo.
Quanah Gibson-Mount wrote:
> --On Tuesday, August 25, 2009 6:16 PM +0000 rlvcosta(a)yahoo.com wrote:
>
>> The issue I'm facing, from a general user's point of view, is that when I
>> stop the secondary Provider2 (master 2) for backup purposes using slapcat,
>> Provider1 (master 1) continues to provide LDAP service, where some
>> entries can be created while the backup is running (no consumer
>> from Provider 2).
>
> Why are you stopping the provider to do a slapcat?
[Rodrigo] Faster dump of the data. And in any case, if some other situation
like a problem occurs, the secondary system could stay disconnected for other
reasons.
>
>> Even when a small number of entries are different when the consumer on
>> Provider 2 connects to Provider 1, syncrepl then enters the full DB search,
>> as expected.
>
>
> What is your sessionlog setting on each provider for the syncprov
> overlay?
[Rodrigo]
syncprov-checkpoint 10000 120
syncprov-sessionlog 100000
Same configuration in both systems.
>
>> For definition purposes I have some memory limitations where I need to
>> limit dncachesize to around 80% of the DB entries.
>
> We already went through other things you could do to reduce your
> memory footprint in other ways. You've completely ignored that
> advice. As long as your dncachesize is in this state, I don't expect
> things to behave normally.
[Rodrigo] I implemented what was possible. The result is the cache config
below, which is what the memory constraints allow:
#Cache values
#cachesize 10000
cachesize 20000
dncachesize 3000000
#dncachesize 400000
#idlcachesize 10000
idlcachesize 30000
#cachefree 10
cachefree 100
>
>> I could also note that when in this situation the monitor cache, at a
>> very slow pace, changes by a single entry. Being more
>> specific:
>>
>> dn: cn=Database 1,cn=Databases,cn=Monitor
>> structuralObjectClass: monitoredObject
>> creatorsName:
>> modifiersName:
>> createTimestamp: 20090821145848Z
>> modifyTimestamp: 20090821145848Z
>> monitoredInfo: bdb
>> monitorIsShadow: TRUE
>> namingContexts: ou=CONTENT,o=domain,c=fr
>> readOnly: FALSE
>> monitorOverlay: syncprov
>> olmBDBEntryCache: 19920
>> olmBDBDNCache: 3896287
>> olmBDBIDLCache: 2
>> olmDbDirectory: /var/openldap-data/bdb1/
>> entryDN: cn=Database 1,cn=Databases,cn=Monitor
>> subschemaSubentry: cn=Subschema
>> hasSubordinates: TRUE
>>
>> It stays alternating between the values 3896287 and 3896288. It looks like
>> the memory re-use is too limited, causing locks that take a long time and
>> prevent synchronization.
>
>
> What value did you set for "cachefree"?
[Rodrigo] cachefree 100
>
>> PS-> I could not put the file in the openldap ftp. It says device full.
>> Please let me know how can I send this file.
>
> I've let the maintainer of the system know, hopefully there'll be
> space available soon.
[Rodrigo] I'm sending the file attached, since it is very small (5 KB).
>
> --Quanah
>
> --
>
> Quanah Gibson-Mount
> Principal Software Engineer
> Zimbra, Inc
> --------------------
> Zimbra :: the leader in open source messaging and collaboration
>
>
[Attachment: replication_too_slow.zip]
14 years, 3 months
Re: (ITS#6275) syncrepl taking long(not sync) when consumer not connect for a moment
by quanah@zimbra.com
--On Tuesday, August 25, 2009 6:16 PM +0000 rlvcosta(a)yahoo.com wrote:
> The issue I'm facing, from a general user's point of view, is that when I
> stop the secondary Provider2 (master 2) for backup purposes using slapcat,
> Provider1 (master 1) continues to provide LDAP service, where some
> entries can be created while the backup is running (no consumer
> from Provider 2).
Why are you stopping the provider to do a slapcat?
> Even when a small number of entries are different when the consumer on
> Provider 2 connects to Provider 1, syncrepl then enters the full DB search,
> as expected.
What is your sessionlog setting on each provider for the syncprov overlay?
> For definition purposes I have some memory limitations where I need to
> limit dncachesize to around 80% of the DB entries.
We already went through other things you could do to reduce your memory
footprint in other ways. You've completely ignored that advice. As long
as your dncachesize is in this state, I don't expect things to behave
normally.
> I could also note that when in this situation the monitor cache, at a
> very slow pace, changes by a single entry. Being more
> specific:
>
> dn: cn=Database 1,cn=Databases,cn=Monitor
> structuralObjectClass: monitoredObject
> creatorsName:
> modifiersName:
> createTimestamp: 20090821145848Z
> modifyTimestamp: 20090821145848Z
> monitoredInfo: bdb
> monitorIsShadow: TRUE
> namingContexts: ou=CONTENT,o=domain,c=fr
> readOnly: FALSE
> monitorOverlay: syncprov
> olmBDBEntryCache: 19920
> olmBDBDNCache: 3896287
> olmBDBIDLCache: 2
> olmDbDirectory: /var/openldap-data/bdb1/
> entryDN: cn=Database 1,cn=Databases,cn=Monitor
> subschemaSubentry: cn=Subschema
> hasSubordinates: TRUE
>
> It stays alternating between the values 3896287 and 3896288. It looks like
> the memory re-use is too limited, causing locks that take a long time and
> prevent synchronization.
What value did you set for "cachefree"?
> PS-> I could not put the file in the openldap ftp. It says device full.
> Please let me know how can I send this file.
I've let the maintainer of the system know, hopefully there'll be space
available soon.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
14 years, 3 months
Re: (ITS#6276) paused pool can deadlock if writers are waiting
by h.b.furuseth@usit.uio.no
h.b.furuseth(a)usit.uio.no writes:
> Then we could add the option (c) a *small* set of (groups of?) config
> variables which each will need their own mutex or pause or something.
> Such as loglevel, since the code is doing Debug() all over the place.
I forgot: Simplest way here is to have two versions of the variable:
the one accessed by cn=config, and a copy which is the one slapd obeys.
The cn=config variable is pause-protected as usual, and slapd copies it
to the active variable at its leisure. This will be acceptable for
config changes that cannot fail once the value has been verified and
need not take effect immediately. Then the copy task needs locks both
for the configurable variable ("lock" = prevent_pause() ... "unlock" =
allow_pause()?) and for the active variable, but not at the same time.
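A rough sketch of that two-copy scheme, with hypothetical names (loglevel
chosen only as an example, and prevent_pause()/allow_pause() as proposed
above, shown as comments):

#include <pthread.h>

/* Hypothetical sketch of the two-variable scheme described above. */
static int config_loglevel;    /* the cn=config copy, pause-protected as usual */
static int active_loglevel;    /* the copy slapd actually obeys                */
static pthread_mutex_t active_loglevel_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Runs under the usual pause protection: just record the verified value. */
static void config_set_loglevel(int level)
{
    config_loglevel = level;
}

/* The copy task, run at slapd's leisure.  It needs the config-side
 * protection and the active-side mutex, but never both at once. */
static void copy_loglevel(void)
{
    int level;

    /* prevent_pause();  -- "lock" the config side, per the proposal above */
    level = config_loglevel;
    /* allow_pause();    -- "unlock" the config side */

    pthread_mutex_lock(&active_loglevel_mutex);
    active_loglevel = level;
    pthread_mutex_unlock(&active_loglevel_mutex);
}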
--
Hallvard
14 years, 3 months
Re: (ITS#6276) paused pool can deadlock if writers are waiting
by hyc@symas.com
Hallvard B Furuseth wrote:
> But I quite agree
> that this is hopeless if that "small" set cannot be kept small. For one
> thing it might introduce yet another "lock order", threatening to be
> inconsistent with the lock order of other mutexes.
>> Another possibility is to just try to read 1 byte in the listener
>> thread, to detect the hangup there when we have no other means to
>> discover it. We would have to make sure to be able to unget this byte
>> back to the bottom of the sockbuf stack if there's valid data. This
>> will affect the throughput of the listener thread, but it may not be
>> too terrible a hit.
>
> Does that help for a socket which has gotten blocked on select() due to
> full socket buffers?
A socket that's blocked on a write will eventually clear itself up - either
the client will catch up or it will go away. So there's really no reason to
worry about them at all.
> Also, if doing a read() OS call anyway, why not read() a large enough
> chunk that it would commonly be a full PDU? As long as it's read into a
> sockbuf buffer rather than into slapd's data.
Because we don't want to keep the listener thread busy for any longer than
necessary. In terms of system call overhead it might be reasonable to read as
much as one memory page (e.g. 4KB) but even that involves a lot of cycles for
memcpy, and the idea is to shunt all the time-consuming work elsewhere so that
the listener can return to listening ASAP.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
14 years, 3 months
Re: (ITS#6276) paused pool can deadlock if writers are waiting
by h.b.furuseth@usit.uio.no
Howard Chu writes:
>Hallvard B Furuseth wrote:
>> Beyond that, my immediate reaction was that pauses are implemented at
>> the wrong level and/or need to be split up in different types of pauses.
>> There is no good reason cn=config's need to have slapd config variables
>> for itself to affect network operations - nor, I hope, affect pool-level
>> operations like pool_purgekey().
>
> I suppose implementing things this way can appear to be too blunt. But the
> amount of locks required to do it at a finer level is, IMO, unmanageable.
I know. That's why I'm not suggesting to do away with pauses, only to
make them a little less crude than "1 type of pause for everything".
See below.
> configurations are changed from under them.)
>
>> E.g. cn=config could use a slapd_config_pause() call to ask for lone
>> access to the config variables instead of thread_pool_pause() call.
>> Slapd operations that must respect such pauses should thus call a
>> slapd_config_pausecheck() macro/function, not thread_pool_pausecheck()
>> or depend on the pool to respect slapd-level pauses.
>> connection_read_thread() can then go ahead and use the connection
>> independently of those pauses. If it reaches connection_operation(),
>> it'll need to check for slapd pause.
>
> Yes, I've thought about this approach too. That makes the pause mechanisms
> even messier.
It does? I was hoping that would be the cleaner solution.
Lifting them from pool level to slapd level would mean that every task
now submitted to the pool will be responsible for (a) calling pausecheck
itself or (b) avoiding all variables/features that today need pauses.
Then we could add the option (c) a *small* set of (groups of?) config
variables which each will need their own mutex or pause or something.
Such as loglevel, since the code is doing Debug() all over the place.
And presumably some network variables, for this ITS. But I quite agree
that this is hopeless if that "small" set cannot be kept small. For one
thing it might introduce yet another "lock order", threatening to be
inconsistent with the lock order of other mutexes.
> Another possibility is to just try to read 1 byte in the listener
> thread, to detect the hangup there when we have no other means to
> discover it. We would have to make sure to be able to unget this byte
> back to the bottom of the sockbuf stack if there's valid data. This
> will affect the throughput of the listener thread, but it may not be
> too terrible a hit.
Does that help for a socket which has gotten blocked on select() due to
full socket buffers?
Also, if doing a read() OS call anyway, why not read() a large enough
chunk that it would commonly be a full PDU? As long as it's read into a
sockbuf buffer rather than into slapd's data.
--
Hallvard
14 years, 3 months
Re: (ITS#6257) libldap: getopt flag to return the SASL username
by hyc@symas.com
masarati(a)aero.polimi.it wrote:
> My concern was not from an operational point of view: the simple concept
> of having a library dynamically loading something that could no longer be
> present is asking for trouble, unless handled appropriately, and probably
> there is no way to do it safely, as one could always remove a .so while
> it's in use (although I guess on any decent system the object will be
> cached or loaded somewhere for as long as it's in use).
>
> My concern is about the char* array returned by that call: if for any
> reason the library decides to refresh it, but the caller of
> ldap_get_option() is still holding a pointer to that array, this risks
> pointing to freed memory and things like that, as far as I understand.
> For this reason, returning a copy sounds wiser. Whether the contents of
> that copy are valid or not, namely whether the related mechanism is
> available or not, that's an entirely different issue.
The Cyrus mechlist is only initialized once during the life of the library.
(Subsequent init calls just increment an init counter and then return.) It
will not change in normal use. If an application loads / unloads / reloads the
library, the list may change, but any app that goes thru this trouble will
already know they have to call the init functions all over again, and re-fetch
this list.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
14 years, 3 months
Re: (ITS#6257) libldap: getopt flag to return the SASL username
by masarati@aero.polimi.it
>> On a somewhat related issue, I note that LDAP_OPT_X_SASL_MECHLIST returns
>> a pointer to an array of chars that apparently cannot be mucked with.
>>
>> Assuming my understanding is correct, I wonder if this behavior is
>> desirable or not, given the fact that if another mech is added, e.g. by
>> adding a dynamic module, I expect this list to change.
>
> These are SASL mechs with the plugin modules. Right?
>
> From an operational standpoint: If a SASL plugin module for a mech was
> added, I think it's acceptable that software which queries this option is
> restarted before this SASL mech is known to the software. Probably one has
> to add additional configuration for this SASL mech.
>
> Now the question is what happens if a SASL plugin module is removed and
> the software tries to use the removed SASL mech. Clearly removing plugin
> modules in a running system is asking for trouble anyway...
>
> Having said this, I would not care too much about this list changing...
My concern was not from an operational point of view: the simple concept
of having a library dynamically loading something that could no longer be
present is asking for trouble, unless handled appropriately, and probably
there is no way to do it safely, as one could always remove a .so while
it's in use (although I guess on any decent system the object will be
cached or loaded somewhere for as long as it's in use).
My concern is about the char* array returned by that call: if for any
reason the library decides to refresh it, but the caller of
ldap_get_option() is still holding a pointer to that array, this risks
pointing to freed memory and things like that, as far as I understand.
For this reason, returning a copy sounds wiser. Whether the contents of
that copy are valid or not, namely whether the related mechanism is
available or not, that's an entirely different issue.
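For what it's worth, a caller worried about that can duplicate the list as
soon as it is returned instead of holding the library's pointer. A minimal
sketch (error handling trimmed, and assuming, as discussed here, that
ldap_get_option() fills in a NULL-terminated char* array for
LDAP_OPT_X_SASL_MECHLIST):

#include <stdlib.h>
#include <string.h>
#include <ldap.h>

/* Fetch the SASL mechanism list and return a private, NULL-terminated
 * copy that the caller owns and frees, so a later refresh inside the
 * library cannot leave us pointing at freed memory. */
static char **sasl_mechs_copy(LDAP *ld)
{
    char **mechs = NULL, **copy;
    int i, n;

    if (ldap_get_option(ld, LDAP_OPT_X_SASL_MECHLIST, &mechs) != LDAP_OPT_SUCCESS
        || mechs == NULL)
        return NULL;

    for (n = 0; mechs[n] != NULL; n++)
        ;
    copy = calloc(n + 1, sizeof(*copy));
    if (copy == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        copy[i] = strdup(mechs[i]);   /* our own copy of each name */
    return copy;
}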
p.
14 years, 3 months
Re: (ITS#6276) paused pool can deadlock if writers are waiting
by hyc@symas.com
Hallvard B Furuseth wrote:
> This is a quick abstract answer, I haven't dived into the code to see
> what my suggestions would mean in practice. Anyway:
> It sounds to me like this problem cannot be fully prevented when network
> operations share the same pool queue as everything else, and we have
> operations that can wait for other operations. All pool threads can be
> occupied with LDAP-level operations, so that network-level operations
> which the LDAP-level operations depend on can get blocked. If slapd has
> a design which even tries to guarantee forward progress, I'm not aware
> of it.
>
> So LDAP-level operations ought to leave at least one thread free for
> network-level operations. Not necessarily a designated thread: A thread
> moving from network-level to LDAP-level operation like
> connection_read_thread() does could first check that at least one other
> thread remains available for network-level operation.
Yes...
> Beyond that, my immediate reaction was that pauses are implemented at
> the wrong level and/or need to be split up in different types of pauses.
> There is no good reason cn=config's need to have slapd config variables
> for itself to affect network operations - nor, I hope, affect pool-level
> operations like pool_purgekey().
I suppose implementing things this way can appear to be too blunt. But the
amount of locks required to do it at a finer level is, IMO, unmanageable. The
fact that you can change global slapd config parameters (such as the number of
threads, size of sockbuf buffers, etc.) makes it inherently unsafe for
*anything* else to be active while config changes are being made. And sifting
thru each variable to decide how sensitive they are, and when they are unsafe,
will inevitably lead to requiring locks on every single piece of configuration
data. (Want to look up an attribute type, or objectclass? Or a database
suffix? etc. etc. etc... There are countless things we do arbitrarily that
simply won't work if we allow arbitrary threads to continue to run while their
configurations are changed from under them.)
> E.g. cn=config could use a slapd_config_pause() call to ask for lone
> access to the config variables instead of thread_pool_pause() call.
> Slapd operations that must respect such pauses should thus call a
> slapd_config_pausecheck() macro/function, not thread_pool_pausecheck()
> or depend on the pool to respect slapd-level pauses.
> connection_read_thread() can then go ahead and use the connection
> independently of those pauses. If it reaches connection_operation(),
> it'll need to check for slapd pause.
Yes, I've thought about this approach too. That makes the pause mechanisms
even messier.
Another possibility is to just try to read 1 byte in the listener thread, to
detect the hangup there when we have no other means to discover it. We would
have to make sure to be able to unget this byte back to the bottom of the
sockbuf stack if there's valid data. This will affect the throughput of the
listener thread, but it may not be too terrible a hit.
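A rough sketch of the same detection done with a peek instead of a
consuming read, which sidesteps the unget problem (illustration only, not
the proposed sockbuf change; MSG_DONTWAIT is assumed available):

#include <sys/types.h>
#include <sys/socket.h>

/* Classify a descriptor that select() reported readable by peeking one
 * byte: nothing is consumed, so there is nothing to push back into the
 * sockbuf stack.  Returns 1 = data waiting, 0 = hangup, -1 = no news. */
static int probe_readable_fd(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n == 1)
        return 1;   /* real data; a reader thread will consume it later */
    if (n == 0)
        return 0;   /* orderly hangup detected in the listener */
    return -1;      /* EAGAIN/EWOULDBLOCK or another error: nothing to conclude */
}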
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
14 years, 3 months
Re: (ITS#6276) paused pool can deadlock if writers are waiting
by h.b.furuseth@usit.uio.no
This is a quick abstract answer, I haven't dived into the code to see
what my suggestions would mean in practice. Anyway:
It sounds to me like this problem cannot be fully prevented when network
operations share the same pool queue as everything else, and we have
operations that can wait for other operations. All pool threads can be
occupied with LDAP-level operations, so that network-level operations
which the LDAP-level operations depend on can get blocked. If slapd has
a design which even tries to guarantee forward progress, I'm not aware
of it.
So LDAP-level operations ought to leave at least one thread free for
network-level operations. Not necessarily a designated thread: A thread
moving from network-level to LDAP-level operation like
connection_read_thread() does could first check that at least one other
thread remains available for network-level operation.
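As an abstract sketch of that check (hypothetical counters, nothing
slapd-specific):

#include <pthread.h>

/* Hypothetical illustration of the check above: a thread about to move
 * from network-level to LDAP-level work first verifies that it is not
 * the last one still available for network-level operations. */
static pthread_mutex_t avail_mutex = PTHREAD_MUTEX_INITIALIZER;
static int network_free_threads;   /* threads currently free for network work */

/* Returns 1 if this thread may take on LDAP-level work now,
 * 0 if it must stay at the network level because it is the last one. */
static int may_go_ldap_level(void)
{
    int ok;

    pthread_mutex_lock(&avail_mutex);
    ok = (network_free_threads > 1);
    if (ok)
        network_free_threads--;
    pthread_mutex_unlock(&avail_mutex);
    return ok;
}

/* Called when the LDAP-level operation completes. */
static void back_to_network_level(void)
{
    pthread_mutex_lock(&avail_mutex);
    network_free_threads++;
    pthread_mutex_unlock(&avail_mutex);
}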
Beyond that, my immediate reaction was that pauses are implemented at
the wrong level and/or need to be split up in different types of pauses.
There is no good reason cn=config's need to have slapd config variables
for itself to affect network operations - nor, I hope, affect pool-level
operations like pool_purgekey().
E.g. cn=config could use a slapd_config_pause() call to ask for lone
access to the config variables instead of thread_pool_pause() call.
Slapd operations that must respect such pauses should thus call a
slapd_config_pausecheck() macro/function, not thread_pool_pausecheck()
or depend on the pool to respect slapd-level pauses.
connection_read_thread() can then go ahead and use the connection
independently of those pauses. If it reaches connection_operation(),
it'll need to check for slapd pause.
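Just to illustrate the shape of that interface: the pause/pausecheck names
are the ones proposed above, while the rwlock backing and the endcheck
counterpart are only one possible (purely illustrative) implementation.

#include <pthread.h>

/* Illustrative only: a config-level pause independent of the thread
 * pool.  cn=config takes the lock exclusively; operations that must
 * respect config pauses bracket their config access with the
 * pausecheck/endcheck pair instead of the pool-level pausecheck. */
static pthread_rwlock_t slapd_config_lock = PTHREAD_RWLOCK_INITIALIZER;

static void slapd_config_pause(void)      { pthread_rwlock_wrlock(&slapd_config_lock); }
static void slapd_config_resume(void)     { pthread_rwlock_unlock(&slapd_config_lock); }

static void slapd_config_pausecheck(void) { pthread_rwlock_rdlock(&slapd_config_lock); }
static void slapd_config_endcheck(void)   { pthread_rwlock_unlock(&slapd_config_lock); }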
Alternatively, with different types of pauses/operations: The pool could
have different sets of pauses (a pause-type bitmask?) and if one type of
pauses happens, operations that ignore that type of pause will not be
affected. This messes up the pool code further though, and I already
hate what pauses do to the otherwise clean pool code:-(
--
Hallvard
14 years, 3 months