openldap-bugs March 2008

openldap-bugs@openldap.org

34 participants
162 discussions

Re: (ITS#5437) Fix to servers/slapd/overlays/syncprov.c
by hyc＠symas.com 21 Mar '08

21 Mar '08

c.hoover(a)its.utexas.edu wrote: > Full_Name: Clyde Hoover > Version: 2.4.8 > OS: Solaris 10 SPARC > URL: > Submission from: (NULL) (128.83.93.7) > > > The bug is around line 2015 of servers/slapd/overlays/syncprov.c > > } else if ( rs->sr_type == REP_RESULT&& rs->sr_err == LDAP_SUCCESS ) { > struct berval cookie; > > // The cookie is never initialized so if SS_CHANGED is not set > // this can cause an internal assertion failure downstream from > // syncprov_done_ctrl() (depending upon what is on the stack) > // Nulling out the cookie fixes this > memset((void *)&cookie, 0, sizeof(cookie)); // Add this > if ( ss->ss_flags& SS_CHANGED ) { > slap_compose_sync_cookie( op,&cookie, ss->ss_ctxcsn, > srs->sr_state.rid, > srs->sr_state.srs->sr_state.sid ); Please provide a stack trace from such a failure. The syncprov_done_ctrl() function should never be invoked at that point without SS_CHANGED being set, since in a normal Refresh the search will hit the shortcut and never reach that code. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

1 0

(ITS#5437) Fix to servers/slapd/overlays/syncprov.c
by c.hoover＠its.utexas.edu 21 Mar '08

21 Mar '08

Full_Name: Clyde Hoover Version: 2.4.8 OS: Solaris 10 SPARC URL: Submission from: (NULL) (128.83.93.7) The bug is around line 2015 of servers/slapd/overlays/syncprov.c } else if ( rs->sr_type == REP_RESULT && rs->sr_err == LDAP_SUCCESS ) { struct berval cookie; // The cookie is never initialized so if SS_CHANGED is not set // this can cause an internal assertion failure downstream from // syncprov_done_ctrl() (depending upon what is on the stack) // Nulling out the cookie fixes this memset((void *)&cookie, 0, sizeof(cookie)); // Add this if ( ss->ss_flags & SS_CHANGED ) { slap_compose_sync_cookie( op, &cookie, ss->ss_ctxcsn, srs->sr_state.rid, srs->sr_state.srs->sr_state.sid );

1 0

Re: (ITS#5432) syncrepl seg. faults if received a sync cookie with an empty csn= value
by hyc＠symas.com 21 Mar '08

21 Mar '08

Rein Tollevik wrote: > On Wed, 19 Mar 2008, Howard Chu wrote: > >>> syncprov_playlog() in syncprov.c generates such invalid cookies when >>> replaying a >>> sessionlog where no entries has been deleted. The patch also fixes this >>> bug. >> This should never happen, since the cookie is not generated if there were no >> deletes recorded. > > Well, it do happen anyhow. Using the 2.4.8 release this can easily be > reproduced with a configuration where the syncrepl provider (with > sessionlogs enabled) is on a glue backend, the consumer replicates the > entire glue subtree. Assuming both servers are in sync and stopped, > start the producer, modify an entry in the subordinate db and then start > the consumer. The producer will send a cookie with an empty csn= value > as in this sync debug output: > slapd starting > slap_queue_csn: queing 0x418001a0 20080320125116.487494Z#000000#000#000000 > slap_graduate_commit_csn: removing 0x8d6ae0 20080320125116.487494Z#000000#000#000000 > srs csn 20080320124421.642686Z#000000#000#000000 > log csn 20080320125116.487494Z#000000#000#000000 > syncprov_playlog: cookie=rid=007,csn= > syncprov_search_response: cookie=rid=007,csn=20080320125116.487494Z#000000#000#000000 > > I have also tested it using the cvs head source, and the good news is > that this was accidentally fixed (at least partly) by the ITS#5434 > change. I.e it was caused by the be_search() in syncprov_playlog() not > being propagated to the subordinate database and thereby not finding the > entry that was modified. > > The bad news is that there is still a race condition that would have the > same effect. If someone deletes the entry that was modified just before > be_search() is done it will return the same empty cookie. I was able to > reproduce this in a debugger by setting a break point before be_search(), > deleting the entry with ldapdelete and in the debugger run next over the > be_search() call before continuing. > > So, I still think it is best to keep delcsn[0].bv_val NULL until an > actual cookie has been written to it, as slap_compose_sync_cookie() will > create a cookie with an empty csn= if passed a csn array where bv_len is > zero but bv_val is non-NULL. Although an alternative and maybe more > correct fix could be to change slap_compose_sync_cookie() instead? No. Thanks for your explanation of what you observed. The actual bug is in the mods processing, when it decides that a modified entry should be treated as a delete instead. It increments ndel then but doesn't set delcsn in that case. That's what actually needs to be fixed. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

1 0

Re: Segfault in slapd 2.4.8 - syncrepl and glue (ITS#5430)
by ghenry＠suretecsystems.com 20 Mar '08

20 Mar '08

<quote who="hyc(a)symas.com"> > Duncan Gibb wrote: >> Howard Chu wrote: >> HYC> Most notably, in template 1, you explicitly configured >> HYC> syncprov on top of glue instead of taking the default >> HYC> behavior of syncprov under glue. As documented in >> HYC> slapd.conf(5), you should only do that when you want a >> HYC> single syncprov overlay to be master of the entire tree. >> HYC> In this case, you clearly want syncprov to only master >> HYC> one portion of the tree, with the subordinate being >> HYC> taken care of by its own syncprov overlay. >> >> That's a legacy of several previous iterations of the real project. >> Ideally each box would have at most two backends - one holding >> everything for which it's a consumer, and on the relevant machines a >> second backend holding the data for which it's master, then glue, then a >> single syncprov making the whole tree available. My colleague who >> implemented the first iteration did it that way and it should have >> worked, but the syncprov-glue interactions in 2.3.x were such that that >> resulted in sections of the tree not replicating and/or disappearing in >> various ill-defined situations. > > The whole-tree approach may have some snags. > 1) it's intended to work with a single syncprov over the entire glued > tree > 2) it requires a unique serverID for each master > 3) it probably won't work to start from a completely empty DB - the > root > entry needs to exist on each server. It's probably worth while for us and others to start a new thread on -software or -technical? Gavin.

1 0

Re: Segfault in slapd 2.4.8 - syncrepl and glue (ITS#5430)
by hyc＠symas.com 20 Mar '08

20 Mar '08

Duncan Gibb wrote: > Howard Chu wrote: > HYC> Most notably, in template 1, you explicitly configured > HYC> syncprov on top of glue instead of taking the default > HYC> behavior of syncprov under glue. As documented in > HYC> slapd.conf(5), you should only do that when you want a > HYC> single syncprov overlay to be master of the entire tree. > HYC> In this case, you clearly want syncprov to only master > HYC> one portion of the tree, with the subordinate being > HYC> taken care of by its own syncprov overlay. > > That's a legacy of several previous iterations of the real project. > Ideally each box would have at most two backends - one holding > everything for which it's a consumer, and on the relevant machines a > second backend holding the data for which it's master, then glue, then a > single syncprov making the whole tree available. My colleague who > implemented the first iteration did it that way and it should have > worked, but the syncprov-glue interactions in 2.3.x were such that that > resulted in sections of the tree not replicating and/or disappearing in > various ill-defined situations. The whole-tree approach may have some snags. 1) it's intended to work with a single syncprov over the entire glued tree 2) it requires a unique serverID for each master 3) it probably won't work to start from a completely empty DB - the root entry needs to exist on each server. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

1 0

Re: Segfault in slapd 2.4.8 - syncrepl and glue (ITS#5430)
by duncan.gibb＠siriusit.co.uk 20 Mar '08

20 Mar '08

Howard Chu wrote: HYC> I modified your test case to make it easier to run. HYC> The new test is in ftp.openldap.org:incoming/its5430.tgz.1 Looks good; thank you very much for this. HYC> There were also some errors in your config file templates, HYC> which are corrected in the above archive. HYC> Most notably, in template 1, you explicitly configured HYC> syncprov on top of glue instead of taking the default HYC> behavior of syncprov under glue. As documented in HYC> slapd.conf(5), you should only do that when you want a HYC> single syncprov overlay to be master of the entire tree. HYC> In this case, you clearly want syncprov to only master HYC> one portion of the tree, with the subordinate being HYC> taken care of by its own syncprov overlay. That's a legacy of several previous iterations of the real project. Ideally each box would have at most two backends - one holding everything for which it's a consumer, and on the relevant machines a second backend holding the data for which it's master, then glue, then a single syncprov making the whole tree available. My colleague who implemented the first iteration did it that way and it should have worked, but the syncprov-glue interactions in 2.3.x were such that that resulted in sections of the tree not replicating and/or disappearing in various ill-defined situations. Replicating each subordinate separately and gluing them together per-server was a hack; the default overlay ordering gave us visible entries of objectclass glue (and consequent missing subtrees) on some consumers, so we reversed them. It's imperfect but less imperfect than it was before ;-) We will re-architect in the light of the recent changes to syncprov and glue. It has already been pointed out that big-picture discussion is noise on this list... HYC> Not an error, but you used unique rids across all of the HYC> config files. rids only need to be unique within a single server. I know. That's expediency in the config generator - it just increments a global counter each time it processes a consumer<-->provider relation. I'll make it more elegant next time round ;-) Cheers Duncan

1 0

Re: Segfault in slapd 2.4.8 - syncrepl and glue (ITS#5430)
by ghenry＠suretecsystems.com 20 Mar '08

20 Mar '08

<quote who="hyc(a)symas.com"> > Duncan Gibb wrote: >> hyc(a)symas.com wrote: >> HYC> syncrepl.c is now updated in HEAD to allow this replication to >> occur. >> >> HYC> Using your test, I don't see any crash. >> >> Thank you. We'll re-test with current HEAD and post the outcome. > > I modified your test case to make it easier to run. The new test is in > ftp.openldap.org:incoming/its5430.tgz.1 Above tests pass here with latest HEAD (repository revision: 1.390 /repo/OpenLDAP/pkg/ldap/servers/slapd/syncrepl.c,v > > You simply extract it inside the tests/ subdirectory of the build tree. > After > you have built the OpenLDAP source and run "make test" you can cd into the > its5430 directory and run the test.sh from there. > > There were also some errors in your config file templates, which are > corrected > in the above archive. > > Most notably, in template 1, you explicitly configured syncprov on top of > glue > instead of taking the default behavior of syncprov under glue. As > documented > in slapd.conf(5), you should only do that when you want a single syncprov > overlay to be master of the entire tree. In this case, you clearly want > syncprov to only master one portion of the tree, with the subordinate > being > taken care of by its own syncprov overlay. With your incorrect > configuration, > server3 would receive redundant updates because the subordinate branch > would > be propagated twice. server3 also doesn't need syncprov module loaded, as it's a pure consumer. Thanks, Gavin.

1 0

Re: (ITS#5432) syncrepl seg. faults if received a sync cookie with an empty csn= value
by rein＠basefarm.no 20 Mar '08

20 Mar '08

On Wed, 19 Mar 2008, Howard Chu wrote: >> syncprov_playlog() in syncprov.c generates such invalid cookies when >> replaying a >> sessionlog where no entries has been deleted. The patch also fixes this >> bug. > > This should never happen, since the cookie is not generated if there were no > deletes recorded. Well, it do happen anyhow. Using the 2.4.8 release this can easily be reproduced with a configuration where the syncrepl provider (with sessionlogs enabled) is on a glue backend, the consumer replicates the entire glue subtree. Assuming both servers are in sync and stopped, start the producer, modify an entry in the subordinate db and then start the consumer. The producer will send a cookie with an empty csn= value as in this sync debug output: slapd starting slap_queue_csn: queing 0x418001a0 20080320125116.487494Z#000000#000#000000 slap_graduate_commit_csn: removing 0x8d6ae0 20080320125116.487494Z#000000#000#000000 srs csn 20080320124421.642686Z#000000#000#000000 log csn 20080320125116.487494Z#000000#000#000000 syncprov_playlog: cookie=rid=007,csn= syncprov_search_response: cookie=rid=007,csn=20080320125116.487494Z#000000#000#000000 I have also tested it using the cvs head source, and the good news is that this was accidentally fixed (at least partly) by the ITS#5434 change. I.e it was caused by the be_search() in syncprov_playlog() not being propagated to the subordinate database and thereby not finding the entry that was modified. The bad news is that there is still a race condition that would have the same effect. If someone deletes the entry that was modified just before be_search() is done it will return the same empty cookie. I was able to reproduce this in a debugger by setting a break point before be_search(), deleting the entry with ldapdelete and in the debugger run next over the be_search() call before continuing. So, I still think it is best to keep delcsn[0].bv_val NULL until an actual cookie has been written to it, as slap_compose_sync_cookie() will create a cookie with an empty csn= if passed a csn array where bv_len is zero but bv_val is non-NULL. Although an alternative and maybe more correct fix could be to change slap_compose_sync_cookie() instead? Rein

1 0

Re: Segfault in slapd 2.4.8 - syncrepl and glue (ITS#5430)
by hyc＠symas.com 20 Mar '08

20 Mar '08

Duncan Gibb wrote: > hyc(a)symas.com wrote: > HYC> syncrepl.c is now updated in HEAD to allow this replication to occur. > > HYC> Using your test, I don't see any crash. > > Thank you. We'll re-test with current HEAD and post the outcome. I modified your test case to make it easier to run. The new test is in ftp.openldap.org:incoming/its5430.tgz.1 You simply extract it inside the tests/ subdirectory of the build tree. After you have built the OpenLDAP source and run "make test" you can cd into the its5430 directory and run the test.sh from there. There were also some errors in your config file templates, which are corrected in the above archive. Most notably, in template 1, you explicitly configured syncprov on top of glue instead of taking the default behavior of syncprov under glue. As documented in slapd.conf(5), you should only do that when you want a single syncprov overlay to be master of the entire tree. In this case, you clearly want syncprov to only master one portion of the tree, with the subordinate being taken care of by its own syncprov overlay. With your incorrect configuration, server3 would receive redundant updates because the subordinate branch would be propagated twice. You also inserted the overlay directives in the middle of your database clauses, which is poor practice (and caused crashes in older OpenLDAP releases). Again, as documented, overlays are meant to be configured at the end of their respective databases. Not an error, but you used unique rids across all of the config files. rids only need to be unique within a single server. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

1 0

Re: Segfault in slapd 2.4.8 - syncrepl and glue (ITS#5430)
by Duncan.Gibb＠SiriusIT.co.uk 20 Mar '08

20 Mar '08

hyc(a)symas.com wrote: DG> We've scripted a test which works for me against 2.4.8 and did work DG> against CVS HEAD until Howard committed Apologies for the vagueness of last night's email. For a script whose aim is to demo a crash, "works" means "reproduces the crash with no more intervention than invoking it and pressing enter when prompted". I was trying to explain that in cases where replication doesn't occur, the test can't be done, but on re-reading in daylight perhaps I failed on that score ;-) There were two commits five minutes apart. I could reproduce the crash with syncrepl.c 1.387; replication 2-->1 did not occur with 1.389; I was just about to test with 1.388 when I read your second post. HYC> syncrepl.c is now updated in HEAD to allow this replication to occur. HYC> Using your test, I don't see any crash. Thank you. We'll re-test with current HEAD and post the outcome. Cheers Duncan

1 0

← Newer
1
2
3
4
5
6
7
8
...
17
Older →

Jump to page:

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

openldap-bugs March 2008