openldap-bugs May 2007

openldap-bugs@openldap.org

42 participants
156 discussions

(ITS#4956) slapd cores with a SEGFAULT after a failed proxy authorization
by pturgyan＠umich.edu 11 May '07

11 May '07

Full_Name: Paul Turgyan Version: 2.3.35 OS: linux - 2.6 kernal URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (141.213.231.201) slapd cores with a SEGFAULT after a failed proxy authorization, with a core file like: (gdb) bt #0 0xb7c74be5 in *__GI___libc_free (mem=0xab5fc714) at malloc.c:3402 #1 0x0807aa96 in ch_free (ptr=0xab5fc714) at ch_malloc.c:139 #2 0x080a5886 in slap_sasl_authorize (sconn=0xacb03000, context=0xada20428, requested_user=0xacb03910 "pturgyan(a)UMICH.EDU", rlen=18, auth_identity=0xacb03a11 "pturgyan(a)UMICH.EDU", alen=18, def_realm=0xacb049e0 "UMICH.EDU", urlen=9, props=0x0) at sasl.c:673 #3 0xb7ea2ba3 in do_authorization (s_conn=0xacb03000) at server.c:1163 #4 0xb7ea2d18 in sasl_server_step (conn=0xacb03000, clientin=0xacb01dae "`?\006\t*\206H\206?\022\001\002\002\002\001\004", clientinlen=0, serverout=0xad61d114, serveroutlen=0x1) at server.c:1420 #5 0x080a6654 in slap_sasl_bind (op=0x830fad0, rs=0xad61d240) at sasl.c:1395 #6 0x0807cdaa in fe_op_bind (op=0x830fad0, rs=0xad61d240) at bind.c:276 #7 0x0807c5b3 in do_bind (op=0x830fad0, rs=0xad61d240) at bind.c:200 #8 0x0806176f in connection_operation (ctx=0x0, arg_v=0x830fad0) at connection.c:1133 #9 0x08133e28 in ldap_int_thread_pool_wrapper (xpool=0x81ff4e0) at tpool.c:478 #10 0xb7e83c6b in start_thread (arg=0xad61dbb0) at pthread_create.c:261 #11 0xb7cc9d9e in clone () from /lib/libc.so.6 We are using cyrus-sasl-2.1.21 heimdahl-0.6.2 During a proxy auth, c_sasl_dn is set at sasl.c:682. If the proxy auth fails for some reason, then c_sasl_dn is never cleared. Sometime later, when that connection block is reused for another sasl bind, slap_sasl_authorize attempts to free the memory pointed to by c_sasl_dn.bv_val at sasl.c:673 and free SEGFAULTs. This can be duplicated by looping a ldapsearch that does a sasl bind with an invalid proxy authorization. You have to loop long enough to force a reuse of a connection struct. The following patch NULL's out c_sasl_dn at sasl.c:702 when the proxy auth is disallowed. *** sasl.c- Thu Jan 25 07:42:38 2007 --- sasl.c Fri May 11 13:47:19 2007 *************** *** 699,704 **** --- 699,705 ---- "proxy authorization disallowed (%d)\n", (long) (conn ? conn->c_connid : -1), rc, 0 ); + BER_BVZERO( &conn->c_sasl_dn ); sasl_seterror( sconn, 0, "not authorized" ); ch_free( authzDN.bv_val ); return SASL_NOAUTHZ;

1 0

(ITS#4955) v3 referrals problem in libldap/request.c
by steven.rellinger＠hp.com 11 May '07

11 May '07

Full_Name: Steve Version: 2.3.35 OS: Windows 2003 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (15.227.217.75) There is a potential hang in the libldap/request.c code for chasing v3 referrals. I've narrowed down the scope of this issue to libldap/request.c in the function ldap_chase_v3referrals, specifically the loop after find_connection where it tries to see if the DN was already requested on that connection. (This is all around line 889) >From an outsider's perspective it seems like a really trivial bug was introduced into this loop in how it deals with the 'lp' variable. if ( lp == origreq ) { lp = lp->lr_child; } else { lp = lr->lr_refnext; } ... The variable lr does not change in this loop at all, it seems like a mistake to use it here as a condition on the loops flow. To me, it would seem like they meant to use something more like this which is a traditional linked list tree enumeration. if ( lp == origreq ) { lp = lp->lr_child; } else { lp = lp->lr_refnext; } I've tested this particular situation and it seems to eliviate the issue I was experiencing where it was chasing referrals from AD and never finishing this part of the function. I hope this helps others and or can be applied to the latest source code.

1 0

Re: (ITS#4943) tpool.c pause vs. finish
by h.b.furuseth＠usit.uio.no 11 May '07

11 May '07

Howard Chu writes: > Quite a lot of changes, will take some time to evaluate the overall > impact. I hope it helped to make one patch per issue, and that the comments were clear about what each patch was fixing. > The semaphore stuff probably needs to be axed/unifdef'd. It is If it is unusable, do you mean "axed/#if 0'ed"? > unusable, it would only result in deadlocks, and I think I already > removed the slapd code that would have invoked it. slapd/daemon.c does ldap_lazy_sem_init(). No other calls outside tpool.c. >> tpool.c breaks if there are multiple pools. The simplest >> solution seems to be to remove support for multiple pools. >> The problem is thread_keys[]. It is shared between pools, but: > > slapd only uses a single pool, and that's really the only use case we > care about. In fact it makes no sense to support multiple pools, > because we have no mechanism to assert scheduling priorities of one > pool over another. I'm afraid I did not think of scheduling priorities at all, only bugs. But it certainly makes things simpler if we remove multiple pools. Simple documentation of how tpool should be used may remove some of my concerns - or how it _is_ used, since it's only used in slapd. (Several probably only are problems if it is used as a library feature. tpool depends on being used the way slapd is using it. I've wondered what it's doing in libldap_r/ instead of in slapd/.) I wrote: >> - ltp_max_count (max #threads) is pool-specific, but is used to >> protect from array bounds violation on thread_keys[LDAP_MAXTHR]. >> (New simple code from me, previously there was no limit.) sorry, s/array bounds violation/eternal loop looking for free slot/. >> Other thread_keys[] strangeness: >> >> - setkey does not free the old data with ltk_free. Possibly that's >> intentional, but if so that requires weird code elsewhere. > > Yes, that's intentional. ltk_free is only intended for cleanup if a key > is still set by the time a thread exits. All of the keys we set are live > for the entire life of a thread, they don't get re-assigned. So if I call setkey and the key already exists and has an ltk_free function, it's my responsibility to know the old data was there, and free it or take back ownership? (A clarifying comment would be useful.) >> - maybe related to how getkey can return the ltk_free function - which >> seems crazy since a caller who uses it must then explicitly reset the >> key afterwards (or better, before), to be sure purgekey doesn't >> see the data and free it again. > > Nothing actually depends on returning the ltk_free function, that was > just included for completeness' sake. Then I suggest the getkey(,,,kfree) output parameter is removed, to avoid inviting to future duplicate frees. If anyone later needs it, a cleaner mechanism can be included - e.g. a setkey(data=kfree=0) variant which does call ltk_free before removing the key, or maybe a setkey which returns any old data and free function. >> - tpool.c uses LDAP_FREE(), can call ldap_pvt_thread_pool_context(), >> which uses thread_keys[]. If one thread does that when another thread >> has paused the pool, that can break since thread_keys[] is reserved >> for ldap_pvt_thread_pool_purgekey() during pauses. I'm wrong, thread_keys[] is _read-only_ during pauses. pool_context() is safe since it does not write thread_keys, just like pool_purgekey(). > Since there is only one thread pool in use, it is impossible for another > thread to be active. No. Though it's true that my concern above was wrong. It is "ltp_active_count--" which triggers the start of a pause. So "active" here means "included in ltp_active_count". A pool_wrapper() thread is only "active" when handling a context, not when adding itself to or removing itself from thread_keys[]. That's what my new pause waits were for. Also I don't think the main thread is included in the "active" threads - so it's only impossible if the main thread doesn't interfere. (Again, I suppose a comment may be the "fix".) But this reminds me: How much should an ltk_free() function be allowed to do? - pool_context_reset() calls ltk_free(). Should pool_wrapper() go active (wait for !pause, increment ltp_active_count, unlock ltp_mutex) before calling pool_context_reset()? - Should ltk_free() be allowed to change the contents of ltu_key[], so pool_purgekey() and pool_context_reset() should be prepared for that? >> - ltp_pending_count includes threads sleeping in pool_pause() - should >> it? In effect, pool_pause() temporarily reduces ltp_max_pending. > > I don't see why any special consideration is needed here. Nothing else > in the thread pool is going to move while a pause is in effect. That single thread could do ldap_pvt_thread_pool_submit() - which could fail due to the reduced effectinve max_pending. Again, if slapd won't, even with weird overlays and whatnot in effect, a comment will fix that. >> - In ldap_pvt_thread_pool_submit(): >> >> This comment: >> /* there is another open thread, so this >> * context will be handled eventually. >> * continue on and signal that the context >> * is waiting. >> */ >> is wrong if the previous if (pool->ltp_open_count == 0) was >> taken but the no thread was found inside that. > > The comment may be incomplete... "and signal that..." should be "we have signalled that...". Otherwise it's correct if we remove the remove the semaphore and unlock like you say below. With the current code, I think the comment should be /* another open thread took or eventually will * take care of this context. continue on, we * have signalled that the context is waiting. */ >> Also if ctx is invalid (another thread handled it) and a new >> pending request has gotten the now-unused ctx, then the >> 'if (pool->ltp_open_count == 0)' code will free that new >> pending request instead of the old one. > > Most likely this is due to unlocking and relocking the mutex around the > semaphore code. Axe that, remove the gratuitous unlock/relock, and then > that problem goes away. Yes, that should fix it. >> - Note, I haven't looked at the new semaphore stuff yet. >> >> Will get back to this later, but could use review so far. In particular >> I'd like to know if we'll keep or kill multi-pool support. > > Thus far we've only had to worry about a single pool. If anyone can come > up with a reason to need multiple pools, I guess we need to hear it. > Given the lack of scheduling control, I don't see how it would be of any > benefit. At the same time, I don't see a compelling reason to perturb > the code at the moment. Well, there are still bugs left. And can't say I like to have a feature (multiple pools) which only a not-bug because it isn't used. -- Regards, Hallvard

1 0

(ITS#4954) must clear c_sasl_dn after error
by donn＠u.washington.edu 10 May '07

10 May '07

Full_Name: Donn Cave Version: 2.4.4 OS: Red Hat RHEL 3 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (128.95.135.150) After SASL bind failure, c_sasl_dn is not cleared, and eventually causes a crash when it is encountered in a subsequent bind attempt, in ch_free, slap_sasl_authorize ca. line 676. (Depending on platform malloc - NetBSD complains here but doesn't crash, Linux/glibc may or may not complain but does corrupt heap and eventually crashes.) Duplicate: Attempt SASL PLAIN bind as "" with password "" (for example) to get SASL "User not found" error. Then make one or more SASL EXTERNAL binds, until server crashes - shouldn't take but one or two. I make a supportSASLMechanisms search before the PLAIN bind, because that's what our user did when he crashed our service, but this is probably irrelevant. I do not specify a bind name in the EXTERNAL bind. Fix: in slap_sasl_bind, ca. line 1713, BER_BVZERO(&op->o_conn->c_sasl_dn) after bind fails (sc neither SASL_OK nor SASL_CONTINUE) Problem initially encountered in 2.3.24, also observed in 2.3.21 and 2.4.4. Fix tested on 2.3.21 and 2.3.24.

1 0

(ITS#4953) test043-delta-syncrepl failed (exit 1): producer and consumer databases differ
by michael＠stroeder.com 10 May '07

10 May '07

Full_Name: Michael Ströder Version: HEAD OS: openSUSE 10.2 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (84.163.125.59) michael@nb2:/usr/src/openldap-HEAD/ldap/tests> ./run -b bdb test043 Cleaning up test run directory leftover from previous run. Running ./scripts/test043-delta-syncrepl... running defines.sh Starting producer slapd on TCP/IP port 9011... Using ldapsearch to check that producer slapd is running... Using ldapadd to create the context prefix entries in the producer... Starting consumer slapd on TCP/IP port 9012... Using ldapsearch to check that consumer slapd is running... Using ldapadd to populate the producer directory... Waiting 15 seconds for syncrepl to receive changes... Stopping the provider, sleeping 10 seconds and restarting it... Using ldapsearch to check that producer slapd is running... Using ldapmodify to modify producer directory... Waiting 15 seconds for syncrepl to receive changes... Stopping consumer to test recovery... Modifying more entries on the producer... Restarting consumer... Waiting 25 seconds for syncrepl to receive changes... Try updating the consumer slapd... Waiting 15 seconds for syncrepl to receive changes... Using ldapsearch to read all the entries from the producer... Using ldapsearch to read all the entries from the consumer... Filtering producer results... Filtering consumer results... Comparing retrieved entries from producer and consumer... test failed - producer and consumer databases differ

1 0

Re: (ITS#4943) tpool.c pause vs. finish
by hyc＠symas.com 09 May '07

09 May '07

h.b.furuseth(a)usit.uio.no wrote: > OK, I've fixed some of it in HEAD. See comments in the commits for > tpool.c 1.63 and outwards. Quite a lot of changes, will take some time to evaluate the overall impact. > > RE23 has the same problems. Nearly identical code sans the new > semaphore stuff and a few variable names, so fixes should be easy > to import when they've been tested a bit. The semaphore stuff probably needs to be axed/unifdef'd. It is unusable, it would only result in deadlocks, and I think I already removed the slapd code that would have invoked it. > Remaining problems: > > tpool.c breaks if there are multiple pools. The simplest > solution seems to be to remove support for multiple pools. > The problem is thread_keys[]. It is shared between pools, but: slapd only uses a single pool, and that's really the only use case we care about. In fact it makes no sense to support multiple pools, because we have no mechanism to assert scheduling priorities of one pool over another. > Other thread_keys[] strangeness: > > - setkey does not free the old data with ltk_free. Possibly that's > intentional, but if so that requires weird code elsewhere. Yes, that's intentional. ltk_free is only intended for cleanup if a key is still set by the time a thread exits. All of the keys we set are live for the entire life of a thread, they don't get re-assigned. > - maybe related to how getkey can return the ltk_free function - which > seems crazy since a caller who uses it must then explicitly reset the > key afterwards (or better, before), to be sure purgekey doesn't > see the data and free it again. Nothing actually depends on returning the ltk_free function, that was just included for completeness' sake. > Other notes: > > - I've inserted new, possibly too aggressive, waits for ltp_pause in > pool_wrapper. See the tpool.c 1.65 comment. Yet another alternative > than splitting the ltp_pause state might be to remove the requirement > that purgekey() can only be used by a paused thread. (Could loop > until the key is _really_ gone instead, or something.) Will look at this later. > > - tpool.c uses LDAP_FREE(), can call ldap_pvt_thread_pool_context(), > which uses thread_keys[]. If one thread does that when another thread > has paused the pool, that can break since thread_keys[] is reserved > for ldap_pvt_thread_pool_purgekey() during pauses. Since there is only one thread pool in use, it is impossible for another thread to be active. > - ltp_pending_count includes threads sleeping in pool_pause() - should > it? In effect, pool_pause() temporarily reduces ltp_max_pending. I don't see why any special consideration is needed here. Nothing else in the thread pool is going to move while a pause is in effect. > - In ldap_pvt_thread_pool_submit(): > > This comment: > /* there is another open thread, so this > * context will be handled eventually. > * continue on and signal that the context > * is waiting. > */ > is wrong if the previous if (pool->ltp_open_count == 0) was > taken but the no thread was found inside that. The comment may be incomplete... > Also if ctx is invalid (another thread handled it) and a new > pending request has gotten the now-unused ctx, then the > 'if (pool->ltp_open_count == 0)' code will free that new > pending request instead of the old one. Most likely this is due to unlocking and relocking the mutex around the semaphore code. Axe that, remove the gratuitous unlock/relock, and then that problem goes away. > - Note, I haven't looked at the new semaphore stuff yet. > > Will get back to this later, but could use review so far. In particular > I'd like to know if we'll keep or kill multi-pool support. Thus far we've only had to worry about a single pool. If anyone can come up with a reason to need multiple pools, I guess we need to hear it. Given the lack of scheduling control, I don't see how it would be of any benefit. At the same time, I don't see a compelling reason to perturb the code at the moment. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

1 0

Re: (ITS#4943) tpool.c pause vs. finish
by h.b.furuseth＠usit.uio.no 09 May '07

09 May '07

OK, I've fixed some of it in HEAD. See comments in the commits for tpool.c 1.63 and outwards. RE23 has the same problems. Nearly identical code sans the new semaphore stuff and a few variable names, so fixes should be easy to import when they've been tested a bit. Remaining problems: tpool.c breaks if there are multiple pools. The simplest solution seems to be to remove support for multiple pools. The problem is thread_keys[]. It is shared between pools, but: - ldap_pvt_thread_pool_purgekey() needs _all_ pools to be paused, but ldap_pvt_thread_pool_pause() only pauses one pool. If we keep multi-pool support, maybe it only should destroy keys from the current pool? - ltp_max_count (max #threads) is pool-specific, but is used to protect from array bounds violation on thread_keys[LDAP_MAXTHR]. (New simple code from me, previously there was no limit.) - This '#if 0'ed comment is also wrong with multiple pools: /* start up one thread, just so there is one. no need to * lock the mutex right now, since no threads are running. */ Time to remove the whole #if 0? Other thread_keys[] strangeness: - setkey does not free the old data with ltk_free. Possibly that's intentional, but if so that requires weird code elsewhere. - maybe related to how getkey can return the ltk_free function - which seems crazy since a caller who uses it must then explicitly reset the key afterwards (or better, before), to be sure purgekey doesn't see the data and free it again. Other notes: - I've inserted new, possibly too aggressive, waits for ltp_pause in pool_wrapper. See the tpool.c 1.65 comment. Yet another alternative than splitting the ltp_pause state might be to remove the requirement that purgekey() can only be used by a paused thread. (Could loop until the key is _really_ gone instead, or something.) - tpool.c uses LDAP_FREE(), can call ldap_pvt_thread_pool_context(), which uses thread_keys[]. If one thread does that when another thread has paused the pool, that can break since thread_keys[] is reserved for ldap_pvt_thread_pool_purgekey() during pauses. Not sure if that can happen, haven't checked. If LDAP_MALLOC/LDAP_CALLOC also can do call pool_context(), we do have problem in pool_submit(). One fix is to use malloc/calloc/free instead. Another, in the case of LDAP_FREE(ctx) in pool_submit, is to put the ctx on the free list instead. - ltp_pending_count includes threads sleeping in pool_pause() - should it? In effect, pool_pause() temporarily reduces ltp_max_pending. - In ldap_pvt_thread_pool_submit(): This comment: /* there is another open thread, so this * context will be handled eventually. * continue on and signal that the context * is waiting. */ is wrong if the previous if (pool->ltp_open_count == 0) was taken but the no thread was found inside that. Also if ctx is invalid (another thread handled it) and a new pending request has gotten the now-unused ctx, then the 'if (pool->ltp_open_count == 0)' code will free that new pending request instead of the old one. One fix is to add an is_pending flag=1 in submit() and int *ltc_pendingp; in ldap_int_thread_ctx_t. In submit, if need_thread, set ltc_pendingp = &is_pending. In if(pool->ltp_open_count == 0) check is_pending instead of searching for ctx. Elsewhere set *ltc_pendingp = 0 and/or ltc_pendingp = NULL when appropriate. - Note, I haven't looked at the new semaphore stuff yet. Will get back to this later, but could use review so far. In particular I'd like to know if we'll keep or kill multi-pool support. -- Regards, Hallvard

1 0

(ITS#4952) null CSN due to CSN length mismatch in updateCookie
by donn＠u.washington.edu 09 May '07

09 May '07

Full_Name: Donn Cave Version: 2.4.4 OS: Red Hat RHEL 3 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (128.95.135.150) syncrepl_updateCookie finds CSN values in syncCookie and si->si_cookieState. Numerically, the syncCookie CSN is lesser, but it compares greater because it's longer: syncCookie->ctxcsn: 20070508130333Z#000000#00#000000 si->si_cookieState->cs_vals[0]: 20070507215402.502403Z#000000#000#000000 Eventually slap_queue_csn() gets a null CSN, and slapd aborts. I am pretty sure the former explains the latter. The only thing that doesn't fit is that gdb reports "j" == 20936000, where I expect 0, but I think everything else adds up - numcsns is 1, cs_num is 1, the two sids are the same (0). Evidently this format difference is entryCSN vs. contextCSN. The unusual order may be due to entries modified in the master with explicit entryCSN values. I guess that's not supported, but it seems to have exposed a bug here - I assume you want a comparison between dates, not format lengths. To reproduce (I'm guessing): modify master directory with explicit entryCSN values that are later than its contextCSN.

1 0

Re: (ITS#4943) tpool.c pause vs. finish
by h.b.furuseth＠usit.uio.no 09 May '07

09 May '07

Actually, tpool.c is quite buggy. I'm fixing some of them, and will summarize the rest later. -- Regards, Hallvard

1 0

Re: (ITS#4948) #if/#elif argument in .../include/ac/time.h
by h.b.furuseth＠usit.uio.no 09 May '07

09 May '07

I wrote: > That said, it's indeed a reasonable cleanup. I'll do that and some > others shortly. Fixed in the cvs HEAD version. There are a few cases left though - in particular, do anyone know if #if DOS/MACOS/HAVE_WINSOCK can be replaced with #ifdef? I'm not touching them without word from someone who knows OpenLDAP on these platforms. Besides, I still don't know if the "issues" you refer to are just compiler warnings, or errors - or if the one macro you reported was a special case. So while allowing -Wundef to catch macro typos is worthwhile, I likely won't do anything more unless I hear a reason to. And I'd like to know what Plan 9 C says about this program: > BTW, will 9 C accept this (when you do not define FOO/BAR)? What > error/warning messages, if any? > > #if defined(FOO) && FOO /* Guard #if FOO with defined(FOO) */ > int main() { return 2; } > #elif BAR > int main() { return 1; } > #else > int main() { return 0; } > #endif -- Regards, Hallvard

1 0

Jump to page:

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

openldap-bugs May 2007