Re: (ITS#5470) Sporadic failures with RE24
by raphael.ouazana@linagora.com
Hi,
On Fri, 2 May 2008 11:01, hyc(a)symas.com wrote:
> luca(a)OpenLDAP.org wrote:
>> luca(a)OpenLDAP.org wrote:
>>>
>>> Howard Chu wrote:
>>>
>>>> Thanks. Please try HEAD again.
>>>>
>>> No way.
>>> new testrun directory in
>>> ftp://ftp.sys-net.it/luca_scamoni_its5470_20080430-new.tgz
>>>
>>> backtrace attached
>>>
>> recent commits seem to have fixed it (at least, right now I'm not able
>> to reproduce it anymore...)
>
> Right. Confirmed here too; I (temporarily) added an assert(0) to the
> offending
> branch of code to make sure the patch was actually getting hit. It takes a
> very particular timing to trigger that code path.
>
> I'm not sure how we can reliably test for this down the road. Perhaps we
> should add a "disabled" config keyword for backends and syncrepl
> consumers, so
> that we can start up the individual servers, (which takes an unpredictable
> amount of time for each) and then enable various parts in a fixed sequence
> (e.g. 1 second sleeps between ldapmodify/enable requests). Even that's hit
> or
> miss, because our test database is so small it's unlikely that we can hit
> the
> window of time on demand.
I'm testing the latest RE24 tag. After 201 successful runs of test050, I got
a failure :/
Cleaning up test run directory leftover from previous run.
Running ./scripts/test050-syncrepl-multimaster...
running defines.sh
Initializing server configurations...
Starting producer slapd on TCP/IP port 9011...
Using ldapsearch to check that producer slapd is running...
Inserting syncprov overlay on producer...
Starting consumer slapd on TCP/IP port 9012...
Using ldapsearch to check that consumer slapd is running...
Configuring syncrepl on consumer...
Starting consumer2 slapd on TCP/IP port 9013...
Using ldapsearch to check that consumer2 slapd is running...
Configuring syncrepl on consumer2...
Adding schema and databases on producer...
Using ldapadd to populate producer...
Waiting 20 seconds for syncrepl to receive changes...
Using ldapadd to populate consumer...
Waiting 20 seconds for syncrepl to receive changes...
Using ldapsearch to check that syncrepl received database changes...
Waiting 5 seconds for syncrepl to receive changes...
Waiting 5 seconds for syncrepl to receive changes...
Waiting 5 seconds for syncrepl to receive changes...
Waiting 5 seconds for syncrepl to receive changes...
Waiting 5 seconds for syncrepl to receive changes...
Waiting 5 seconds for syncrepl to receive changes...
ldapsearch failed (32)!
testrun uploaded in
ftp://ftp.openldap.org/incoming/raphael-ouazana-testrun-080505.tgz
Regards,
Raphaël Ouazana.
Re: (ITS#5488) syncrepl received contextCSN not passed on to syncprov consumers
by hyc@symas.com
Rein Tollevik wrote:
> On Wed, 30 Apr 2008, Howard Chu wrote:
>> rein(a)OpenLDAP.org wrote:
>>> My first attempt at fixing this was to change syncprov to fetch the
>>> queued csn values from the glue backend where it was used. But that
>>> failed as other modules queue the csn values in their own backend when
>>> they change things.
>> What other modules? Generally there cannot be any other sources of changes.
>
> Sorry, I should have written other configurations. The CSNs get queued
> in the subordinate database when syncrepl is used there, or not at all
> (i.e., for regular updates that come in through the frontend).
OK, but that's again quite a special case. I.e., that's multi-master; in the
default (single-master) case there cannot be regular updates arriving through
the frontend. When a single-master syncrepl consumer is configured, that is
the only possible source of updates. Let's be sure we've solved this question
for the single-master case first, before addressing the multi-master case.
While it's expected that the software will be able to handle multiple glued
DBs and multi-master across them, I seriously doubt that anyone out there
actually knows how to configure and maintain such a setup yet.
>>> Instead I changed ctxcsn.c so that it always
>>> queues them in the glue backend where syncprov is used. But I don't
>>> feel that my understanding of this stuff is good enough to be sure that
>>> this is the optimal solution..
>> I definitely don't like references to the syncprov overlay appearing in main
>> slapd code like that. We need a different solution.
> To me it makes sense to have a single queue of CSN values in a glued
> configuration, no matter if or where syncprov is used.
Yes, I can probably go along with that. The downside is that it may reduce
write concurrency a bit, compared to a glued configuration where each glued DB
is otherwise independent.
> Another approach could be to have syncprov look in the glue database if
> it fails to find any queued CSN in a subordinate db. I haven't tested
> it, but that should work in both configurations. It should also remove
> the need to always look for the glue db which my patch requires. Would
> that be better?
That sounds like a decent alternative.
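For illustration only, a rough sketch of that fallback approach (this is not
the committed fix; get_queued_csn() and find_glue_db() are hypothetical
placeholders standing in for the real pending-CSN list access and for
locating the superior glue BackendDB):

#include "slap.h"

/* hypothetical helpers, not existing slapd functions */
static int get_queued_csn( BackendDB *be, struct berval *csn );
static BackendDB *find_glue_db( BackendDB *be );

static int
csn_lookup_with_glue_fallback( Operation *op, struct berval *csn )
{
	/* use the real BackendDB, not a per-operation overlay copy */
	BackendDB *be = op->o_bd->bd_self;

	/* first look in the database where the change was made */
	if ( get_queued_csn( be, csn ) == 0 )
		return 0;

	/* nothing queued here: if this is a glue subordinate, fall back
	 * to the superior database where syncrepl queued the CSN */
	if ( SLAP_GLUE_SUBORDINATE( be )) {
		BackendDB *glue = find_glue_db( be );
		if ( glue != NULL )
			return get_queued_csn( glue, csn );
	}

	return -1;	/* no queued CSN found anywhere */
}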
>>> Btw, in syncprov_checkpoint() there is a similar SLAP_GLUE_SUBORDINATE
>>> test, should that have included an overlay_is_inst() clause as well?
>> Perhaps. You would have to use op->o_bd->bd_self instead of op->o_bd on
>> that call.
> The current test (introduced to fix ITS#5433) causes the contextCSN to
> be written to the glue database when syncprov is used on a subordinate
> db, which appears wrong to me.
Understood.
Again, the question is whether the admin intended to configure a single
syncprov over an entire glued DB, or individual syncprovs over each component
of the glued tree. The distinction is vital, and it's detected based on
whether the syncprov overlay is above the glue overlay in the overlay stack,
or below it, on the topmost DB.
> Could you elaborate on when op->o_bd->bd_self must be used instead of
> op->o_bd? I understand that op->o_bd may be a copy of the original
> structure that op->o_bd->bd_self refers to, but I'm not sure when it
> must be used. Btw, could op->o_bd->bd_self->bd_info be used to fetch
> the BackendInfo that can be used to call the top-most bd_search (and
> similar) also in overlays?
If you read the code for overlay_is_inst() it should be obvious - that
function only works when used with a real BackendDB structure. The local copy
structure has had its bd_info replaced with whatever on_inst structure
corresponds to the current overlay.
Yes, the bd_self points to the topmost structure, so you can use it for
be_search. Much of what's happening in these overlays was intended to avoid
starting over at the top though, because the code is already running in the
desired overlay context.
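In code terms the distinction is roughly the following (a sketch, not code
from the tree; it only assumes the constructs named above - bd_self,
SLAP_GLUE_SUBORDINATE, overlay_is_inst and be_search - and that op is a
search operation whose parameters can be reused):

#include "slap.h"

static int
example_from_inside_an_overlay( Operation *op )
{
	SlapReply frs = { REP_RESULT };
	Operation fop;
	/* op->o_bd is a per-operation copy whose bd_info points into the
	 * overlay chain; tests that need the real database go via bd_self */
	BackendDB *real_db = op->o_bd->bd_self;

	if ( SLAP_GLUE_SUBORDINATE( real_db )
		&& overlay_is_inst( real_db, "syncprov" )) {
		/* syncprov is configured directly on this subordinate DB */
	}

	/* to re-enter the full database + overlay stack from the top,
	 * point the copied operation at the real (topmost) BackendDB;
	 * the overlay infrastructure makes its own local copy */
	fop = *op;
	fop.o_bd = real_db;
	return fop.o_bd->be_search( &fop, &frs );
}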
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: (ITS#5487) syncprov_findbase must search the backend from the syncrepl search
by hyc@symas.com
Rein Tollevik wrote:
> On Wed, 30 Apr 2008, Howard Chu wrote:
>
>> rein(a)OpenLDAP.org wrote:
>
>>> syncprov_findbase() must search the backend saved with the syncrepl
>>> operation,
>>> not the one from the operation passed as argument. The backend in the op
>>> argument can be a subordinate database, in which case the search for the
>>> base in
>>> the superior database will fail, and syncrepl consumers will be forced to
>>> do an unnecessary full refresh of the database.
>> OK.
>>
>>> The patch at the end should fix
>>> this. Note that both fop.o_bd and fop.o_bd->bd_info can be changed by the
>>> overlay_op_walk() call, which is the reason for the long pointer traversal
>>> to
>>> find the correct bd_info to save and restore.
>
>> But the overlay_op_walk call is only appropriate when the DB to be searched
>> is the current database, and the current DB is an overlay DB structure.
>
> Ah, the changing of the BackendDB->bd_info that takes place when
> overlays are called feels like an open pit I manage to fall into every
> time I get close to it... I wish it could be replaced in a future
> version.
Agreed, it would have been safer as an Op-specific field, but that would have
caused quite a lot of disruption to all existing backend code.
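Purely as an illustration of that alternative (a hypothetical sketch; none
of these names exist in slapd today):

/* hypothetical: carry the overlay dispatch position in the Operation,
 * so the shared BackendDB never has its bd_info rewritten in place */
typedef struct HypotheticalOpOverlayState {
	BackendInfo	*oos_info;	/* which bi_op_* table to dispatch to next */
	int		oos_depth;	/* position within the overlay chain */
} HypotheticalOpOverlayState;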
> A new patch that I hope fixes this is at the end. It always uses
> be_search, after putting back the original bd_info if needed. I feel
> that using the generic be_search is better than interfering directly
> with the overlay code as overlay_op_walk does. I also tested for
> SLAP_ISOVERLAY rather than PS_IS_REFRESHING, as that appeared more
> generic to me. But again, I may be totally wrong here. Does this patch
> look better?
SLAP_ISOVERLAY will never be true here. That flag is only set when the
BackendDB being tested is a local copy of a real BackendDB structure. The
structure referenced in s_op is always a real BackendDB.
In fact, if you're always going to use s_op and be_search, there's no further
work needed, because the regular overlay infrastructure will always make a new
local BackendDB copy itself. (And of course, some of that would be wasted
effort, which is why the original code uses overlay_op_walk. Since op->o_bd is
already an overlay DB, there's no need to make yet another copy for the
first-search case.)
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: (ITS#5488) syncrepl received contextCSN not passed on to syncprov consumers
by rein@OpenLDAP.org
On Wed, 30 Apr 2008, Howard Chu wrote:
> rein(a)OpenLDAP.org wrote:
>> When syncrepl and syncprov are both used on a glue database, the
>> contextCSN values received from the syncrepl producers are not passed on to the
>> syncprov consumers when changes in subordinate databases are received.
>> The reason is that syncrepl queues the CSNs in the glue backend, while
>> syncprov fetches them from the backend where the changes are made. As a
>> consequence, the consumers will be passed a cookie without any csn
>> value.
>>
>> My first attempt at fixing this was to change syncprov to fetch the
>> queued csn values from the glue backend where it was used. But that
>> failed as other modules queue the csn values in their own backend when
>> they change things.
>
> What other modules? Generally there cannot be any other sources of changes.
Sorry, I should have written other configurations. The CSNs get queued
in the subordinate database when syncrepl is used there, or not at all
(i.e., for regular updates that come in through the frontend).
>> Instead I changed ctxcsn.c so that it always
>> queues them in the glue backend where syncprov is used. But I don't
>> feel that my understanding of this stuff is good enough to be sure that
>> this is the optimal solution..
>
> I definitely don't like references to the syncprov overlay appearing in main
> slapd code like that. We need a different solution.
That's reasonable, but the test for syncrepl is probably not needed if
this solution is kept. The test was more or less a copy and
paste from syncrepl where it finds out which backend to write through.
To me it makes sense to have a single queue of CSN values in a glued
configuration, no matter if or where syncprov is used.
> At one point in the past, I had changed syncrepl.c to queue the CSNs in
> both places, but that seemed rather sloppy. Still, it may work best here.
I don't like duplicating information; sooner or later it tends to end up
with wrong info in one of the places.
Another approach could be to have syncprov look in the glue database if
it fails to find any queued CSN in a subordinate db. I haven't tested
it, but that should work in both configurations. It should also remove
the need to always look for the glue db which my patch requires. Would
that be better?
>> Btw, in syncprov_checkpoint() there is a similar SLAP_GLUE_SUBORDINATE
>> test, should that have included an overlay_is_inst() clause as well?
>
> Perhaps. You would have to use op->o_bd->bd_self instead of op->o_bd on
> that call.
The current test (introduced to fix ITS#5433) causes the contextCSN to
be written to the glue database when syncprov is used on a subordinate
db, which appears wrong to me.
Could you elaborate on when op->o_bd->bd_self must be used instead of
op->o_bd? I understand that op->o_bd may be a copy of the original
structure that op->o_bd->bd_self refers to, but I'm not sure when it
must be used. Btw, could op->o_bd->bd_self->bd_info be used to fetch
the BackendInfo that can be used to call the top-most bd_search (and
similar) also in overlays?
Rein
(ITS#5494) slapd crashed when accessed by multiple threads
by adejong@debian.org
Full_Name: Arthur de Jong
Version: 2.4.7
OS: Debian unstable
URL: http://arthurenhella.demon.nl/nss-ldapd/adejong-slapd-crash.log
Submission from: (NULL) (83.160.165.27)
This has also been submitted as a Debian bug:
http://bugs.debian.org/479237
My test slapd consistently crashes when doing multiple simultaneous
requests in different threads. Each thread has its own LDAP *ld
connection to the LDAP server, which is supposed to be supported [1]. In
any case this shouldn't crash the LDAP server.
[1] http://www.openldap.org/lists/openldap-software/200606/msg00252.html
This problem arises in my test suite for nss-ldapd. Source can be
checked out at http://arthurenhella.demon.nl/svn/nss-ldapd/ (svn), and
the test file is test/test_myldap.c. It uses a wrapper module (myldap)
around calls to OpenLDAP to simplify memory management. The function
that triggers the crash is test_threads().
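For reference, the per-thread connection pattern being described is roughly
the following (an illustrative sketch, not the actual test_myldap.c code;
the URI, base and thread count are made up; build with -lldap -lpthread):

#include <pthread.h>
#include <ldap.h>

/* each thread opens and uses its own LDAP *ld handle */
static void *worker( void *arg )
{
	LDAP *ld = NULL;
	LDAPMessage *res = NULL;
	int version = LDAP_VERSION3;
	(void)arg;

	if ( ldap_initialize( &ld, "ldap://localhost" ) != LDAP_SUCCESS )
		return NULL;
	ldap_set_option( ld, LDAP_OPT_PROTOCOL_VERSION, &version );

	/* one anonymous search per thread, all running concurrently */
	if ( ldap_search_ext_s( ld, "dc=test,dc=tld", LDAP_SCOPE_SUBTREE,
			"(objectClass=*)", NULL, 0, NULL, NULL, NULL,
			LDAP_NO_LIMIT, &res ) == LDAP_SUCCESS )
		ldap_msgfree( res );

	ldap_unbind_ext_s( ld, NULL, NULL );
	return NULL;
}

int main( void )
{
	pthread_t tids[5];
	int i;

	for ( i = 0; i < 5; i++ )
		pthread_create( &tids[i], NULL, worker, NULL );
	for ( i = 0; i < 5; i++ )
		pthread_join( tids[i], NULL );
	return 0;
}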
I have captured the crash in gdb:
# gdb /usr/sbin/slapd
GNU gdb 6.8-debian
[...]
This GDB was configured as "i486-linux-gnu"...
(gdb) r -d 1 -h ldap:/// ldaps:/// ldapi:/// -g openldap -u openldap -f
/etc/ldap/slapd.conf
Starting program: /usr/sbin/slapd -d 1 -h ldap:/// ldaps:/// ldapi:/// -g
openldap -u openldap -f /etc/ldap/slapd.conf
[Thread debugging using libthread_db enabled]
[New Thread 0xb7b3a930 (LWP 1542)]
@(#) $OpenLDAP: slapd 2.4.7 (Apr 16 2008 08:13:31) $
@minerva.hungry.com:/home/pere/src/debiancvs/initscripts-ng-svn/trunk/src/insserv/openldap2.3-2.4.7/debian/build/servers/slapd
ldap_pvt_gethostbyname_a: host=sorbet, r=0
daemon_init: listen on ldap:///
daemon_init: 1 listeners to open...
[...]
<= send_search_entry: conn 2 exit.
entry_decode: "cn=Zaka Eddins+uid=zeddins,ou=lotsofpeople,dc=test,dc=tld"
<= entry_decode(cn=Zaka Eddins+uid=zeddins,ou=lotsofpeople,dc=test,dc=tld)
=> send_search_entry: conn 2 dn="cn=Zaka
Eddins+uid=zeddins,ou=lotsofpeople,dc=test,dc=tld"
ber_flush2: 107 bytes to sd 18
<= send_search_entry: conn 2 exit.
entry_decode: "uid=wvakil,ou=lotsofpeople,dc=test,dc=tld"
<= entry_decode(uid=wvakil,ou=lotsofpeople,dc=test,dc=tld)
=> send_search_entry: conn 2 dn="uid=wvakil,ou=lotsofpeople,dc=test,dc=tld"
ber_flush2: 90 bytes to sd 18
<= send_search_entry: conn 2 exit.
entry_decode: "uid=zmeeker,ou=lotsofpeople,dc=test,dc=tld"
<= entry_decode(uid=zmeeker,ou=lotsofpeople,dc=test,dc=tld)
=> send_search_entry: conn 2 dn="uid=zmeeker,ou=lotsofpeople,dc=test,dc=tld"
ber_flush2: 92 bytes to sd 18
<= send_search_entry: conn 2 exit.
bdb_search: 1104 scope not okay
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb5f18b90 (LWP 5017)]
0xb7cef160 in pthread_mutex_lock () from /lib/libpthread.so.0
(gdb) bt
#0 0xb7cef160 in pthread_mutex_lock () from /lib/libpthread.so.0
#1 0xb7f4351d in ldap_pvt_thread_mutex_lock () from
/usr/lib/libldap_r-2.4.so.2
#2 0xb783883d in bdb_cache_return_entry_rw (bdb=0x81ea358, e=0x820922c, rw=0,
lock=0xb5f16fd4)
at /home/pere/src/debiancvs/initscripts-ng-svn/trunk/src/insserv/openldap2.3-2.4.7/servers/slapd/back-bdb/cache.c:256
#3 0xb782ce12 in bdb_search (op=0x8299b10, rs=0xb5f18168)
at /home/pere/src/debiancvs/initscripts-ng-svn/trunk/src/insserv/openldap2.3-2.4.7/servers/slapd/back-bdb/search.c:909
#4 0x08077d13 in fe_op_search (op=0x8299b10, rs=0xb5f18168)
at /home/pere/src/debiancvs/initscripts-ng-svn/trunk/src/insserv/openldap2.3-2.4.7/servers/slapd/search.c:368
#5 0x0807853c in do_search (op=0x8299b10, rs=0xb5f18168)
at /home/pere/src/debiancvs/initscripts-ng-svn/trunk/src/insserv/openldap2.3-2.4.7/servers/slapd/search.c:217
#6 0x080757c6 in connection_operation (ctx=0xb5f18248, arg_v=0x8299b10)
at /home/pere/src/debiancvs/initscripts-ng-svn/trunk/src/insserv/openldap2.3-2.4.7/servers/slapd/connection.c:1083
#7 0x08075ed6 in connection_read_thread (ctx=0xb5f18248, argv=0x13)
at /home/pere/src/debiancvs/initscripts-ng-svn/trunk/src/insserv/openldap2.3-2.4.7/servers/slapd/connection.c:1210
#8 0xb7f42a44 in ?? () from /usr/lib/libldap_r-2.4.so.2
#9 0xb5f18248 in ?? ()
#10 0x00000013 in ?? ()
#11 0x00000000 in ?? ()
A more detailed backtrace is available at the URL specified above.
Re: (ITS#5487) syncprov_findbase must search the backend from the syncrepl search
by rein@OpenLDAP.org
On Wed, 30 Apr 2008, Howard Chu wrote:
> rein(a)OpenLDAP.org wrote:
>> syncprov_findbase() must search the backend saved with the syncrepl
>> operation,
>> not the one from the operation passed as argument. The backend in the op
>> argument can be a subordinate database, in which case the search for the
>> base in
>> the superior database will fail, and syncrepl consumers will be forced to
>> do an unnecessary full refresh of the database.
>
> OK.
>
>> The patch at the end should fix
>> this. Note that both fop.o_bd and fop.o_bd->bd_info can be changed by the
>> overlay_op_walk() call, which is the reason for the long pointer traversal
>> to
>> find the correct bd_info to save and restore.
> But the overlay_op_walk call is only appropriate when the DB to be searched
> is the current database, and the current DB is an overlay DB structure.
Ah, the changing of the BackendDB->bd_info that takes place when
overlays are called feels like an open pit I manage to fall into every
time I get close to it... I wish it could be replaced in a future
version.
> Your patch causes fc->fss->s_op->o_bd's bd_info pointer to change, which is
> not allowed. That's in the original backendDB, which must be treated as
> read-only since multiple threads may be accessing it. The correct approach
> here is to use a new local backendDB variable, copy the s_op->o_bd into it,
> and then just do a regular be_search invocation instead of using
> overlay_op_walk.
>
> But, this patch must not take effect on the first call to syncprov_findbase
> (which occurred in syncprov_op_search) - in that case, the current code is
> correct. So, you need to tweak things based on whether (s_flags &
> PS_IS_REFRESHING) is true or not - if true, this is the first search, and it
> should use the original code. Else, it must use be_search.
A new patch that I hope fixes this is at the end. It always uses
be_search, after putting back the original bd_info if needed. I feel
that using the generic be_search is better than interfering directly
with the overlay code as overlay_op_walk does. I also tested for
SLAP_ISOVERLAY rather than PS_IS_REFRESHING, as that appeared more
generic to me. But again, I may be totally wrong here. Does this patch
look better?
Rein
Index: OpenLDAP/servers/slapd/overlays/syncprov.c
===================================================================
RCS file: /f/CVSROOT/drift/OpenLDAP/servers/slapd/overlays/syncprov.c,v
retrieving revision 1.1.1.18
diff -u -u -r1.1.1.18 syncprov.c
--- OpenLDAP/servers/slapd/overlays/syncprov.c 30 Apr 2008 11:17:58 -0000 1.1.1.18
+++ OpenLDAP/servers/slapd/overlays/syncprov.c 2 May 2008 11:19:46 -0000
@@ -404,7 +404,7 @@
slap_callback cb = {0};
Operation fop;
SlapReply frs = { REP_RESULT };
- BackendInfo *bi;
+ BackendDB be;
int rc;
fc->fss->s_flags ^= PS_FIND_BASE;
@@ -413,10 +413,15 @@
fop = *fc->fss->s_op;
fop.o_hdr = op->o_hdr;
- fop.o_bd = op->o_bd;
fop.o_time = op->o_time;
fop.o_tincr = op->o_tincr;
- bi = op->o_bd->bd_info;
+
+ if ( SLAP_ISOVERLAY( fop.o_bd )) {
+ slap_overinst *on = (slap_overinst *)fop.o_bd->bd_info;
+ be = *fop.o_bd;
+ be.bd_info = (BackendInfo *)on->on_info;
+ fop.o_bd = &be;
+ }
cb.sc_response = findbase_cb;
cb.sc_private = fc;
@@ -434,8 +439,7 @@
fop.ors_filter = &generic_filter;
fop.ors_filterstr = generic_filterstr;
- rc = overlay_op_walk( &fop, &frs, op_search, on->on_info, on );
- op->o_bd->bd_info = bi;
+ rc = fop.o_bd->be_search( &fop, &frs );
} else {
ldap_pvt_thread_mutex_unlock( &fc->fss->s_mutex );
fc->fbase = 1;