Re: (ITS#6158) syncprov: assert causing slapd to core dump
by jonathan@phillipoux.net
On 02.06.2009 18:23, Howard Chu wrote:
> jonathan(a)phillipoux.net wrote:
>> OK, I've spent some more time trying to understand this part of
>> syncprov.c. From what I understand:
>>
>> - the assert failure in ldap_pvt_runqueue_resched is caused by the fact
>> syncprov_qstart is trying to "reschedule" a task that is no longer in
>> the task_list
>> - the only time the task is removed from the task_list (via
>> ldap_pvt_runqueue_remove) is when the task is being run, in
>> syncprov_qtask, if syncprov_qplay returns !=0
>> - the next time syncprov_qstart is called, it finds "so->s_qtask" is not
>> NULL, and tries to reschedule the task, but it's no longer in the
>> task_list.
>>
>> I've written a patch that sets "so->s_qtask" to NULL in syncprov_qtask,
>> just after removing the task from the task_list. So that when
>> syncprov_qstart is called again, it goes into
>> ldap_pvt_runqueue_insert... The patch is attached.
>>
>> Unfortunately, I can't confirm it fixes the bug since I can't reproduce
>> it... For those who understand the logic behind this, does this make any
>> sense? :)
>
> Ah, you want rev 1.249 of syncprov.c. Closing this as a dup of ITS#5776.
Indeed, that's great. Thanks a lot!
> Of course, all of this code has been removed from RE24 as of 1.265.
Will this patch make it into RE23 for a possible maintenance release of 2.3?
Regards,
Jonathan
--
--------------------------------------------------------------
Jonathan Clarke - jonathan(a)phillipoux.net
--------------------------------------------------------------
Ldap Synchronization Connector (LSC) - http://lsc-project.org
--------------------------------------------------------------
Re: (ITS#6138) Bad Cancel/Abandon/"internal abandon"/Syncprov interactions
by h.b.furuseth@usit.uio.no
back-ldap:extended.c also does "suppress response, it has been sent",
but does it by returning and setting rs->sr_err = SLAPD_ABANDON. Might
break assumptions somewhere that SLAPD_ABANDON implies o_abandon was
set. And I guess the hack fails if the operation gets cancelled.
========================================================================
I think these are the Operation states related to Cancel and Abandon:
op->o_abandon is set for these - could extend to multiple values:
A) Operation Abandoned/Cancelled by client.
B) Operation implicitly abandoned by client. (Bind or lost connection)
C) Operation abandoned by server. (It wants to close the connection)
D) Suppress response - a duplicate of the operation will proceed. (syncprov)
E) Suppress response - final send_ldap_response() was done. (retcode overlay)
rs->sr_err == SLAPD_ABANDON if:
F) The backend obeyed o_abandon. (Cancel op, if any, will succeed)
G=E) Suppress response - final send_ldap_response() was done. (back-ldap)
op->o_cancel packs these states/values:
H) The o_abandon is due to a Cancel.
I) Cancel operation wants a result, cancelled op must set it and wait.
J) Result is available to the Cancel operation.
K) Result. (LDAP result code, or SLAP_CANCEL_ACK for success)
L) Cancel operation has fetched result, cancelled operation can proceed.
States that fit in none of the above, or poorly so:
M) Operation must not be waited for, e.g. by Cancel.
Operation is itself waiting for others, e.g. cn=config update.
N) Operation invisible to Abandon/Cancel/internal abandon.
msgID reusable due to result sent to client. Also case D (syncprov)?
Fix by removing the op from op->o_conn->c_ops? Or does that just
move the problem around? Would need to do something to o_conn to
prevent connection_close() from doing connection_destroy().
O) Operation result has been committed, do not abandon. ITS#6059.
But o_abandon can be set while trying to commit, unless this flag is
set before trying - in which case we can't abandon an operation which
is failing to commit, which may be when it's most relevant.
Could reset o_abandon, if anyone can keep straight the consequences.
Or replace the 'if ( op->o_abandon )' tests with some macro call.
Still, interactions with other states could be a problem.
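The multi-valued o_abandon suggested above could be sketched roughly as
follows. This is only an illustration: the SLAP_ABANDON_* names and the
helper are invented here, not OpenLDAP API (slapd currently keeps
o_abandon as a single flag):

```c
#include <assert.h>

/* Hypothetical multi-valued replacement for the o_abandon flag,
 * mirroring cases A-E above.  Names are invented for illustration. */
typedef enum {
	SLAP_ABANDON_NONE = 0,
	SLAP_ABANDON_CLIENT,		/* A) Abandoned/Cancelled by client */
	SLAP_ABANDON_IMPLICIT,		/* B) Bind or lost connection */
	SLAP_ABANDON_SERVER,		/* C) server closing the connection */
	SLAP_ABANDON_SUPPRESS_DUP,	/* D) duplicate op proceeds (syncprov) */
	SLAP_ABANDON_SUPPRESS_SENT	/* E) final response already sent */
} slap_abandon_t;

/* Only A-C correspond to an abandon the backends were actually told
 * about; D-E merely suppress the response even though the backends'
 * cancel/abandon handlers were never called. */
static int
abandon_reached_backend( slap_abandon_t a )
{
	return a == SLAP_ABANDON_CLIENT || a == SLAP_ABANDON_IMPLICIT
		|| a == SLAP_ABANDON_SERVER;
}
```

Code testing 'if ( op->o_abandon )' would then distinguish a real
abandon from mere response suppression instead of conflating them.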
About the o_abandon values above:
B can be treated like A, I think.
C differs in that Cancel/Abandon(operation) should not say "already
abandoned" since the client doesn't know about the abandon.
Could be solved with a vague error message.
D-E differ in that o_abandon gets set even though the backends'
cancel/abandon handlers were not called. Unsure of the effects of that.
D Syncprov duplicating a Persistent Search operation.
Handled similar to a server-initiated abandon? Except that if the
operation cannot be made "invisible to Abandon/Cancel" as above, it
must remain possible to Abandon/Cancel it.
E Suppress response - response has been sent:
Set when exiting slap_send_ldap_result() & co?
Handled similar to a server-initiated abandon?
At the time slap_send_ldap_result() is called again, the operation
may have set up things which need to be cleaned up in the normal way.
Yet it has already gone through that function once, doing callbacks
etc. Must "final response" code be prepared to be called twice?
Beyond that, the main problem would be code which transitions from one
state to another: it needs to handle the other cases.
--
Hallvard
Re: (ITS#6152) proxycache enhancements
by mhardin@symas.com
It might be less intrusive codewise and more flexible if we left the
behavior of cache expiration the same and added a parameter to each
template called "Time to Refresh" (TTR). Then you set long or
unlimited cache expirations, which are always in effect, but set a
shorter TTR that would trigger an asynchronous refresh when the TTR
expired. If the db is not available these refreshes will simply fail,
but the data will remain in the cache at least until it's expired by
the usual means.
This gives the solution designer the option of deciding how long a
system can run disconnected while still being able to separately
determine how stale the contents of the cache will get when connected.
It also means that pcache itself doesn't need to switch modes based on
whether it thinks it's connected to a db or not, and in fact may not
even need to know whether it is connected.
There is still room in this design for a flag that controls whether
pcache should behave as if it's disconnected or connected, but I'm not
sure how useful that is given the changes described above.
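The serve/refresh decision described above could be sketched like this.
The struct layout, field names, and action names are hypothetical, not
the actual pcache code; the point is just that TTR and TTL are checked
independently, with TTR only ever triggering a background refresh:

```c
#include <time.h>

/* Hypothetical cached-query record; real pcache structures differ. */
typedef struct cached_query {
	time_t	cq_cached_at;	/* when the answer was cached */
	time_t	cq_ttl;		/* hard expiration; 0 = unlimited */
	time_t	cq_ttr;		/* time-to-refresh; 0 = disabled */
} cached_query;

enum cache_action {
	CACHE_MISS,		/* expired: must query the remote db */
	CACHE_ANSWER,		/* fresh: answer from cache */
	CACHE_ANSWER_REFRESH	/* stale but valid: answer from cache and
				 * kick off an async refresh; if the db is
				 * down the refresh simply fails and the
				 * entry stays cached until its TTL */
};

static enum cache_action
cache_check( const cached_query *cq, time_t now )
{
	time_t age = now - cq->cq_cached_at;

	if ( cq->cq_ttl && age >= cq->cq_ttl )
		return CACHE_MISS;
	if ( cq->cq_ttr && age >= cq->cq_ttr )
		return CACHE_ANSWER_REFRESH;
	return CACHE_ANSWER;
}
```

With an unlimited TTL (cq_ttl == 0) the entry never becomes a miss, so
a disconnected system keeps serving stale answers indefinitely while a
connected one refreshes every TTR seconds.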
Cheers,
-Matt
Matthew Hardin
Symas Corporation - The LDAP Guys
http://www.symas.com
Re: (ITS#6158) syncprov: assert causing slapd to core dump
by hyc@symas.com
jonathan(a)phillipoux.net wrote:
> OK, I've spent some more time trying to understand this part of
> syncprov.c. From what I understand:
>
> - the assert failure in ldap_pvt_runqueue_resched is caused by the fact
> syncprov_qstart is trying to "reschedule" a task that is no longer in
> the task_list
> - the only time the task is removed from the task_list (via
> ldap_pvt_runqueue_remove) is when the task is being run, in
> syncprov_qtask, if syncprov_qplay returns !=0
> - the next time syncprov_qstart is called, it finds "so->s_qtask" is not
> NULL, and tries to reschedule the task, but it's no longer in the task_list.
>
> I've written a patch that sets "so->s_qtask" to NULL in syncprov_qtask,
> just after removing the task from the task_list. So that when
> syncprov_qstart is called again, it goes into
> ldap_pvt_runqueue_insert... The patch is attached.
>
> Unfortunately, I can't confirm it fixes the bug since I can't reproduce
> it... For those who understand the logic behind this, does this make any
> sense? :)
Ah, you want rev 1.249 of syncprov.c. Closing this as a dup of ITS#5776.
Of course, all of this code has been removed from RE24 as of 1.265.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: (ITS#6158) syncprov: assert causing slapd to core dump
by jonathan@phillipoux.net
On 02.06.2009 12:28, jonathan(a)phillipoux.net wrote:
> Full_Name: Jonathan Clarke
> Version: 2.3.43
> OS: Solaris
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (213.41.243.192)
>
>
> Hi,
>
> I have a 2.3.43 running on a Solaris Sparc server, which crashes occasionally -
> once every week or two, always during the night. At this particular time a large
> number of operations are performed, including mass deletes and adds. I haven't
> been able to reproduce this bug, just watch it happen on the production server
> every now and again...
>
> I managed to obtain a coredump, and a backtrace (at the end of this message). I
> realize this isn't much to go on, but I'm rather unfamiliar with this part of
> the code, so I wondered if anyone has an idea what's going on here?
>
> FWIW, the dynlist and chain overlays are in use on the server, and the database
> is bdb, with a syncrepl consumer as well as syncprov overlay.
>
>
> Backtrace follows:
> 8<-------------------------------------------------------------
> Thread 1 (process 1054014 ):
> #0 0xfee4aa58 in _lwp_kill () from /lib/libc.so.1
> #1 0xfede5a64 in raise () from /lib/libc.so.1
> #2 0xfedc1954 in abort () from /lib/libc.so.1
> #3 0xfedc1b90 in _assert () from /lib/libc.so.1
> #4 0xff30ef44 in ldap_pvt_runqueue_resched (rq=0x16c630, entry=0xee6c0a0,
> defer=0) at rq.c:165
> #5 0xfe7f4a94 in syncprov_qstart (so=0x10acb540) at syncprov.c:933
> #6 0xfe7f4d6c in syncprov_qresp (opc=0x1b1bfaf8, so=0x10acb540, mode=2) at
> syncprov.c:982
> #7 0xfe7f5aa4 in syncprov_matchops (op=0xf6bffa50, opc=0x1b1bfaf8, saveit=0) at
> syncprov.c:1175
> #8 0xfe7f7490 in syncprov_op_response (op=0xf6bffa50, rs=0xf6bff644) at
> syncprov.c:1561
> #9 0x000575cc in ?? ()
> #10 0x000575cc in ?? ()
> 8<-------------------------------------------------------------
>
> Thanks in advance for any pointers!
>
OK, I've spent some more time trying to understand this part of
syncprov.c. From what I understand:
- the assert failure in ldap_pvt_runqueue_resched is caused by the fact
syncprov_qstart is trying to "reschedule" a task that is no longer in
the task_list
- the only time the task is removed from the task_list (via
ldap_pvt_runqueue_remove) is when the task is being run, in
syncprov_qtask, if syncprov_qplay returns !=0
- the next time syncprov_qstart is called, it finds "so->s_qtask" is not
NULL, and tries to reschedule the task, but it's no longer in the task_list.
I've written a patch that sets "so->s_qtask" to NULL in syncprov_qtask,
just after removing the task from the task_list. So that when
syncprov_qstart is called again, it goes into
ldap_pvt_runqueue_insert... The patch is attached.
Unfortunately, I can't confirm it fixes the bug since I can't reproduce
it... For those who understand the logic behind this, does this make any
sense? :)
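The lifecycle the patch targets can be modeled in miniature. Everything
below is a simplified toy, not the real runqueue code, but it captures
why clearing the cached task pointer matters: resched() asserts
membership, just as ldap_pvt_runqueue_resched does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy runqueue with the same membership invariant as slapd's. */
struct task { struct task *next; };
struct queue { struct task *head; };

static void rq_insert( struct queue *q, struct task *t )
{
	t->next = q->head; q->head = t;
}

static int rq_member( struct queue *q, struct task *t )
{
	struct task *p;
	for ( p = q->head; p; p = p->next )
		if ( p == t ) return 1;
	return 0;
}

static void rq_remove( struct queue *q, struct task *t )
{
	struct task **pp;
	for ( pp = &q->head; *pp; pp = &(*pp)->next )
		if ( *pp == t ) { *pp = t->next; return; }
}

static void rq_resched( struct queue *q, struct task *t )
{
	assert( rq_member( q, t ));	/* aborts on a stale handle */
}

/* qstart analogue: s_qtask caches "my task is already queued". */
static void qstart( struct queue *q, struct task **s_qtask, struct task *t )
{
	if ( *s_qtask )
		rq_resched( q, *s_qtask );
	else {
		rq_insert( q, t );
		*s_qtask = t;
	}
}
```

Without the patch, the error path does rq_remove() but leaves s_qtask
pointing at the removed task, so the next qstart() takes the resched
branch and trips the assert; with s_qtask cleared after the remove, it
takes the insert branch instead.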
Regards,
Jonathan
--
--------------------------------------------------------------
Jonathan Clarke - jonathan(a)phillipoux.net
--------------------------------------------------------------
Ldap Synchronization Connector (LSC) - http://lsc-project.org
--------------------------------------------------------------
Attachment: patch-syncprov-20090602.patch
Index: servers/slapd/overlays/syncprov.c
===================================================================
RCS file: /repo/OpenLDAP/pkg/ldap/servers/slapd/overlays/syncprov.c,v
retrieving revision 1.56.2.51
diff -u -p -r1.56.2.51 syncprov.c
--- servers/slapd/overlays/syncprov.c 9 Jul 2008 20:53:13 -0000 1.56.2.51
+++ servers/slapd/overlays/syncprov.c 2 Jun 2009 15:57:21 -0000
@@ -908,6 +908,7 @@ syncprov_qtask( void *ctx, void *arg )
} else {
/* bail out on any error */
ldap_pvt_runqueue_remove( &slapd_rq, rtask );
+ if ( so ) so->s_qtask = NULL;
}
ldap_pvt_thread_mutex_unlock( &slapd_rq.rq_mutex );
Re: (ITS#6153) Segfault during Heimdal's kadmin -l init Realm
by hyc@symas.com
dewayne_freebsd(a)yahoo.com wrote:
> Full_Name: Dewayne Geraghty
> Version: 2.4.16
> OS: FreeBSD 7.2R
> URL: http://www.consciuminternational.com.au/ldap
> Submission from: (NULL) (58.172.112.108)
>
>
> FreeBSD version 7.2; Heimdal V1.2.1; OpenLDAP 2.4.16
> Heimdal and OpenLDAP are built for heimdal to use OpenLDAP as backend.
> Segmentation fault during
> kadmin -l
> init HS
Use Heimdal 1.2.2.
https://roundup.it.su.se/jira/browse/HEIMDAL-220
>
> slapd and heimdal work correctly, independently.
>
> slapd is running at debug 1019, logs are at enclosed URL along with the full gdb
> trace, and configuration files. If I can assist please advise.
>
> This is a single Pentium CPU, and gcc flags
> CFLAGS= -pipe -g3 -ggdb3 -O0 -march=pentium4 -mtune=pentium4 -DDO_KRB5
> -DDO_SAMBA -DHAVE_OPENSSL
>
> #0 0x286447b6 in memmove () from /lib/libc.so.7
> #1 0x282a10e8 in ber_write () from /usr/local/lib/liblber-2.4.so.6
> #2 0x2829ebf7 in ber_put_ostring () from /usr/local/lib/liblber-2.4.so.6
> #3 0x2829ed14 in ber_put_berval () from /usr/local/lib/liblber-2.4.so.6
> #4 0x2829faca in ber_printf () from /usr/local/lib/liblber-2.4.so.6
> #5 0x2821a0ee in ldap_add_ext () from /usr/local/lib/libldap-2.4.so.6
> #6 0x2821a378 in ldap_add_ext_s () from /usr/local/lib/libldap-2.4.so.6
> #7 0x280ad493 in LDAP_store (context=0x287050b0, db=0x2870e040, flags=0,
> entry=0xbfbfe790) at hdb-ldap.c:1600
> #8 0x2809875b in kadm5_s_create_principal (server_handle=0x2871b0c0,
> princ=0xbfbfea3c, mask=17, password=0xbfbfe830 "bQdxg9drKf")
> at create_s.c:182
> #9 0x2808da1c in kadm5_create_principal (server_handle=0x2871b0c0,
> princ=0xbfbfea3c, mask=17, password=0xbfbfe830 "bQdxg9drKf")
> at common_glue.c:64
> #10 0x0804e496 in ?? ()
> #11 0x2871b0c0 in ?? ()
> #12 0xbfbfea3c in ?? ()
> #13 0x00000011 in ?? ()
> #14 0xbfbfe830 in ?? ()
> #15 0x28084000 in ?? ()
> #16 0x28084200 in ?? ()
> #17 0x28084400 in ?? ()
> #18 0x285b3c8d in _pthread_mutex_init_calloc_cb () from /lib/libc.so.7
> #19 0x0804e8c9 in ?? ()
> #20 0x2870c0e0 in ?? ()
> #21 0x00000000 in ?? ()
> #22 0x00000000 in ?? ()
> #23 0x00000000 in ?? ()
> #24 0x2870c085 in ?? ()
> #25 0x00000000 in ?? ()
> #26 0x28650030 in ?? () from /lib/libc.so.7
>
> I have spent weeks trying to get this to work. (Because I'm using and modifying
> the FreeBSD ports system to build and use the latest version of LDAP and
> Heimdal.)
>
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
(ITS#6158) syncprov: assert causing slapd to core dump
by jonathan@phillipoux.net
Full_Name: Jonathan Clarke
Version: 2.3.43
OS: Solaris
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (213.41.243.192)
Hi,
I have a 2.3.43 running on a Solaris Sparc server, which crashes occasionally -
once every week or two, always during the night. At this particular time a large
number of operations are performed, including mass deletes and adds. I haven't
been able to reproduce this bug, just watch it happen on the production server
every now and again...
I managed to obtain a coredump, and a backtrace (at the end of this message). I
realize this isn't much to go on, but I'm rather unfamiliar with this part of
the code, so I wondered if anyone has an idea what's going on here?
FWIW, the dynlist and chain overlays are in use on the server, and the database
is bdb, with a syncrepl consumer as well as syncprov overlay.
Backtrace follows:
8<-------------------------------------------------------------
Thread 1 (process 1054014 ):
#0 0xfee4aa58 in _lwp_kill () from /lib/libc.so.1
#1 0xfede5a64 in raise () from /lib/libc.so.1
#2 0xfedc1954 in abort () from /lib/libc.so.1
#3 0xfedc1b90 in _assert () from /lib/libc.so.1
#4 0xff30ef44 in ldap_pvt_runqueue_resched (rq=0x16c630, entry=0xee6c0a0,
defer=0) at rq.c:165
#5 0xfe7f4a94 in syncprov_qstart (so=0x10acb540) at syncprov.c:933
#6 0xfe7f4d6c in syncprov_qresp (opc=0x1b1bfaf8, so=0x10acb540, mode=2) at
syncprov.c:982
#7 0xfe7f5aa4 in syncprov_matchops (op=0xf6bffa50, opc=0x1b1bfaf8, saveit=0) at
syncprov.c:1175
#8 0xfe7f7490 in syncprov_op_response (op=0xf6bffa50, rs=0xf6bff644) at
syncprov.c:1561
#9 0x000575cc in ?? ()
#10 0x000575cc in ?? ()
8<-------------------------------------------------------------
Thanks in advance for any pointers!
Regards,
Jonathan
Re: (ITS#6133) back-relay issues
by h.b.furuseth@usit.uio.no
Questions:
* relay_back_operational() sets up callbacks. Should it?
Looks harmless, but as far as I can tell, be->be_operational()
functions do not use them, since they (should) send no response.
* There is no relay_back_chk_controls(). Should there be?
Though I think DNs would then be rewritten four times the same way
for each operation:-( Already operational, has_subordinates and
finally the operation itself does. And possibly for access controls.
I've factored op.c code out to table-driven handlers and a macro,
and cleaned away those '#if 0's.
Fixed more problems:
* Search referrals should have a scope.
* relay_back_op_extended() was (still) broken.
The handler should return a result which caller should send, so it
must set sr->sr_ref without freeing it. Setting REP_REF_MUSTBEFREED
instead, and dropping the RB_SEND requirement in fail_mode.
* For readability, fixed return values from relay_back_chk_referrals()
and other unused handlers. (chk_referrals may be unfixable.)
* relay_back_entry_<get/release>_rw() returned operationsError for
failure. Failing with noSuchObject/unwillingToPerform instead.
* relay_back_entry_release_rw() leaked entries when bd->be_release==0.
For paranoia, fixed it only when the entry's e_private == NULL.
> * The handlers for Abandon, Cancel and connection-init/destroy
> should not exist, as far as I can tell. They forward the call to the
> underlying backend database, but that means it receives the call twice:
So does Unbind. Removed the handler.
> * back-relay can be configured to cause infinite recursion. (...)
> Anyway, recursion can now be properly caught with op->o_extra. (...)
Needed a unique key per <operation type, relay database> combination.
Otherwise things like backend_group() called from another operation
failed when looking up a relayed DN via relay_back_entry_get_rw().
That fixed relay_back_operational() and relay_back_has_subordinates(),
or at least I assume that's what their FIXMEs were about.
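The recursion check keyed on <operation type, relay database> could look
roughly like this. The names and structure are invented for illustration
(the real code hangs a node off op->o_extra); the point is that matching
on the pair, not the database alone, is what lets e.g. backend_group()
relay the same database for a different operation type without being
misdiagnosed as recursion.

```c
#include <stddef.h>

/* Hypothetical guard node; slapd's real OpExtra list differs in detail. */
typedef struct relay_guard {
	struct relay_guard	*rg_next;
	const void		*rg_db;		/* relay database instance */
	int			rg_optype;	/* operation type */
} relay_guard;

/* Push a <optype, db> frame onto the per-operation stack.  Returns
 * nonzero (and pushes nothing) if the pair is already on the stack,
 * i.e. the relay would recurse into itself. */
static int
relay_guard_enter( relay_guard **stack, relay_guard *node,
	const void *db, int optype )
{
	relay_guard *g;

	for ( g = *stack; g; g = g->rg_next )
		if ( g->rg_db == db && g->rg_optype == optype )
			return 1;	/* recursion detected */

	node->rg_db = db;
	node->rg_optype = optype;
	node->rg_next = *stack;
	*stack = node;
	return 0;
}

/* Pop the top frame when the relayed call returns. */
static void
relay_guard_leave( relay_guard **stack )
{
	*stack = (*stack)->rg_next;
}
```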
--
Hallvard
Re: (ITS#6131) "TLSVerifyClient try" not working with GNU TLS
by subbarao@computer.org
Howard Chu wrote:
>> I was just looking around for a possible explanation to the problem that
>> I'm encountering.
>>
>> I double-checked the version that I was running and it's actually
>> 2.4.15, not 2.4.16. Would there be a significant difference between
>> these two versions with respect to TLS certificate handling?
>
> Yes. Read the 2.4.16 CHANGES.
Ok, I see the following:
Fixed libldap GnuTLS TLSVerifyCilent try (ITS#5981)
Looking at ITS #5981, that seems to be exactly the same problem that I'm
having. I tried searching the openldap.org site before for similar
keywords, but I guess I missed this.
Thanks,
-Kartik