Re: (ITS#5444) slapd seg. fault using mulitmirrored mode
by ando@sys-net.it
mohel(a)web.de wrote:
> The above log was generated using the test050 script just adding a line which
> enters information also to a second server. Perhaps this test would also make
> sense for further releases.
Can you please post the modification to the script?
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
---------------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Email: pierangelo.masarati(a)sys-net.it
---------------------------------------
15 years, 1 month
(ITS#5445) syncprov double-free bugfix.
by rein@basefarm.no
Full_Name: Rein Tollevik
Version: CVS head
OS: linux and solaris
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (81.93.160.250)
The patch at the end adds the missing parentheses around a negated flag-bit test in
syncprov.c. Without them the test always fails, the entry is never duplicated,
and a double free occurs when a_nvals is freed in the next statement if the
same entry is sent to more than one recipient simultaneously.
Rein Tollevik
Basefarm AS
Index: OpenLDAP/servers/slapd/overlays/syncprov.c
diff -u OpenLDAP/servers/slapd/overlays/syncprov.c:1.9 OpenLDAP/servers/slapd/overlays/syncprov.c:1.10
--- OpenLDAP/servers/slapd/overlays/syncprov.c:1.9 Sun Mar 23 14:06:03 2008
+++ OpenLDAP/servers/slapd/overlays/syncprov.c Mon Mar 31 15:43:31 2008
@@ -2385,7 +2385,7 @@
}
if ( !ap ) {
- if ( !rs->sr_flags & REP_ENTRY_MODIFIABLE ) {
+ if ( !(rs->sr_flags & REP_ENTRY_MODIFIABLE) ) {
rs->sr_entry = entry_dup( rs->sr_entry );
rs->sr_flags |=
REP_ENTRY_MODIFIABLE|REP_ENTRY_MUSTBEFREED;
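The precedence issue fixed by the patch can be reproduced in isolation. The sketch below is only an illustration: the DEMO_* constants and their values are hypothetical stand-ins, not the definitions from slapd's headers, and the exact misbehaviour of the unparenthesized test depends on the real constant's value. Because unary `!` binds tighter than binary `&`, the buggy form is parsed as `(!flags) & BIT`.

```c
/* Hypothetical flag values, chosen for illustration only. */
#define DEMO_ENTRY_MODIFIABLE  0x0001U
#define DEMO_ENTRY_MUSTBEFREED 0x0002U

/* Buggy form: parsed as (!flags) & DEMO_ENTRY_MODIFIABLE.
 * !flags is 0 whenever any bit at all is set in flags. */
static int needs_copy_buggy(unsigned flags)
{
    return !flags & DEMO_ENTRY_MODIFIABLE;
}

/* Fixed form: nonzero exactly when the MODIFIABLE bit is clear. */
static int needs_copy_fixed(unsigned flags)
{
    return !(flags & DEMO_ENTRY_MODIFIABLE);
}
```

With these assumed values, an entry carrying only DEMO_ENTRY_MUSTBEFREED is not modifiable, yet the buggy test reports that no copy is needed, while the fixed test asks for one.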
15 years, 1 month
(ITS#5444) slapd seg. fault using mulitmirrored mode
by mohel@web.de
Full_Name: Markus
Version: 2.4.8
OS: SLES 10
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (212.185.43.218)
When using multi-master replication and starting the tree from scratch, I'm able to
ldapadd to one of the masters and everything is replicated. But when I
afterwards ldapadd a new entry to one of the other servers, that server crashes
with a segmentation fault.
The following log output is generated:
----------------------------------------------
conn=11 op=1 SEARCH RESULT tag=101 err=0 nentries=20 text=
connection_get(19)
connection_get(19): got connid=12
connection_read(19): checking for input on id=12
ber_get_next
ber_get_next: tag 0x30 len 162 contents:
ber_get_next
conn=12 op=1 do_search
ber_scanf fmt ({miiiib) ber:
>>> dnPrettyNormal: <dc=example,dc=com>
=> ldap_bv2dn(dc=example,dc=com,0)
<= ldap_bv2dn(dc=example,dc=com)=0
=> ldap_dn2bv(272)
<= ldap_dn2bv(dc=example,dc=com)=0
=> ldap_dn2bv(272)
<= ldap_dn2bv(dc=example,dc=com)=0
<<< dnPrettyNormal: <dc=example,dc=com>, <dc=example,dc=com>
SRCH "dc=example,dc=com" 2 0 0 0 0
ber_scanf fmt (m) ber:
filter: (objectClass=*)
ber_scanf fmt ({M}}) ber:
=> get_ctrls
ber_scanf fmt ({m) ber:
ber_scanf fmt (m) ber:
=> get_ctrls: oid="1.3.6.1.4.1.4203.1.9.1.1" (noncritical)
ber_scanf fmt ({i) ber:
ber_scanf fmt (m) ber:
ber_scanf fmt (b) ber:
ber_scanf fmt (}) ber:
<= get_ctrls: n=1 rc=0 err=""
attrs: * +
conn=12 op=1 SRCH base="dc=example,dc=com" scope=2 deref=0
filter="(objectClass=*)"
conn=12 op=1 SRCH attr=* +
=> bdb_search
bdb_dn2entry("dc=example,dc=com")
search_candidates: base="dc=example,dc=com" (0x00000001) scope=2
=> bdb_dn2idl("dc=example,dc=com")
=> bdb_equality_candidates (entryCSN)
<= bdb_equality_candidates: (entryCSN) not indexed
bdb_search_candidates: id=-1 first=1 last=20
bdb_search: 1 does not match filter
bdb_search: 2 does not match filter
bdb_search: 3 does not match filter
bdb_search: 4 does not match filter
bdb_search: 5 does not match filter
bdb_search: 6 does not match filter
bdb_search: 7 does not match filter
bdb_search: 8 does not match filter
bdb_search: 9 does not match filter
bdb_search: 10 does not match filter
bdb_search: 11 does not match filter
bdb_search: 12 does not match filter
bdb_search: 13 does not match filter
bdb_search: 14 does not match filter
bdb_search: 15 does not match filter
bdb_search: 16 does not match filter
bdb_search: 17 does not match filter
bdb_search: 18 does not match filter
bdb_search: 20 does not match filter
send_ldap_result: conn=12 op=1 p=3
send_ldap_result: err=0 matched="" text=""
=> bdb_search
bdb_dn2entry("dc=example,dc=com")
search_candidates: base="dc=example,dc=com" (0x00000001) scope=2
=> bdb_dn2idl("dc=example,dc=com")
=> bdb_presence_candidates (objectClass)
bdb_search_candidates: id=-1 first=1 last=20
send_ldap_result: conn=12 op=1 p=3
send_ldap_result: err=0 matched="" text=""
send_ldap_intermediate: err=0 oid=1.3.6.1.4.1.4203.1.9.1.4 len=368
send_ldap_response: msgid=2 tag=121 err=0
ber_flush2: 409 bytes to sd 19
send_ldap_result: conn=12 op=1 p=3
send_ldap_result: err=0 matched="" text=""
slap_sl_malloc of 136867984 bytes failed, using ch_malloc
----------------------------------------------
It seems that the malloc request of almost 130 MB is simply too much.
The above log was generated using the test050 script just adding a line which
enters information also to a second server. Perhaps this test would also make
sense for further releases.
15 years, 1 month
Re: (ITS#5442) slapd_rq not locked before use bugfix
by hyc@symas.com
rein(a)tollevik.no wrote:
> On Sat, 29 Mar 2008, ando(a)sys-net.it wrote:
>> rein(a)basefarm.no wrote:
>>> I was seeing random failures of the test050-syncrepl-multimaster test. One of
>>> the failures was that it went into a tight loop traversing a circular runqueue
>>> it had managed to create in slapd_rq.task_list. It seems as this was caused by
>>> missing mutex locks around accesses to slapd_rq, which the patch uploaded to
>>> ftp://ftp.openldap.org/incoming/slapd_rq_lock.patch fixes.
>>>
>>> Before I applied this patch the test failed after being run a few times, with it
>>> it has now passed 100 times and is still counting.
>> locks in back-bdb/config.c should be pointless, as modifications to the
>> configuration should only occur while all threads are paused. The rest
>> makes sort of sense, but I'd leave it to Howard.
Ignoring the ITS#5403 changes, I don't see anything here that isn't
config-related, therefore it's all running single-threaded.
Of the "relevant" changes in syncrepl.c, I note that three out of the four
chunks of the patch are in code that is only run when using cn=config to
delete a syncrepl configuration, and test050 never performs that operation.
The remaining chunk only takes effect when adding syncrepl config, and again,
slapd is single-threaded for that.
I've also run test050 thru hundreds of iterations without any issue, without
these patches. If there's a problem in test050, I don't believe it's in this code.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
15 years, 2 months
Re: (ITS#5442) slapd_rq not locked before use bugfix
by rein@tollevik.no
On Sat, 29 Mar 2008, ando(a)sys-net.it wrote:
> rein(a)basefarm.no wrote:
>
>> I was seeing random failures of the test050-syncrepl-multimaster test. One of
>> the failures was that it went into a tight loop traversing a circular runqueue
>> it had managed to create in slapd_rq.task_list. It seems as this was caused by
>> missing mutex locks around accesses to slapd_rq, which the patch uploaded to
>> ftp://ftp.openldap.org/incoming/slapd_rq_lock.patch fixes.
>>
>> Before I applied this patch the test failed after being run a few times, with it
>> it has now passed 100 times and is still counting.
>
> locks in back-bdb/config.c should be pointless, as modifications to the
> configuration should only occur while all threads are paused. The rest
> makes sort of sense, but I'd leave it to Howard.
That is probably true; it looks as if the places in config.c where locks
really are required already had them. My patch adds locks everywhere
slapd_rq was used without them, as I don't have enough knowledge of the
code to know which functions are guaranteed to be used only when threads
are not running.
I believe that the important patch is to syncrepl.c, but I found it best
to add locks everywhere just to be on the safe side.
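As a generic illustration of the kind of locking discussed here (not the contents of slapd_rq_lock.patch, which is not reproduced in this thread), a queue whose every access takes the queue's own mutex might look like the sketch below. It uses plain POSIX threads and hypothetical demo_* names rather than slapd's runqueue API, so treat it as an assumption-laden sketch only.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical task and queue types; not slapd's runqueue structures. */
typedef struct demo_task {
    struct demo_task *next;
    int id;
} demo_task;

typedef struct demo_runqueue {
    pthread_mutex_t mutex;   /* guards head and every task's next pointer */
    demo_task *head;
} demo_runqueue;

static void demo_rq_init(demo_runqueue *rq)
{
    pthread_mutex_init(&rq->mutex, NULL);
    rq->head = NULL;
}

/* Link a task at the head of the list while holding the mutex, so two
 * concurrent inserts cannot interleave their pointer updates. */
static void demo_rq_insert(demo_runqueue *rq, demo_task *t)
{
    pthread_mutex_lock(&rq->mutex);
    t->next = rq->head;
    rq->head = t;
    pthread_mutex_unlock(&rq->mutex);
}

/* Traverse under the same mutex so the list cannot change mid-walk. */
static size_t demo_rq_length(demo_runqueue *rq)
{
    size_t n = 0;
    pthread_mutex_lock(&rq->mutex);
    for (demo_task *t = rq->head; t != NULL; t = t->next)
        n++;
    pthread_mutex_unlock(&rq->mutex);
    return n;
}
```

Whether such locks are actually needed at a given call site depends on whether other threads can be running there, which is the point under discussion in this thread.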
Looking at my copy of the patch, it appears that another syncrepl.c patch
which I was sure I had edited out has slipped through anyhow :-(. It is
the first two modifications related to ldap_get_option; please disregard
them in this bug report. They have been reported in ITS#5403.
Rein
15 years, 2 months
Re: (ITS#5442) slapd_rq not locked before use bugfix
by ando@sys-net.it
rein(a)basefarm.no wrote:
> I was seeing random failures of the test050-syncrepl-multimaster test. One of
> the failures was that it went into a tight loop traversing a circular runqueue
> it had managed to create in slapd_rq.task_list. It seems as this was caused by
> missing mutex locks around accesses to slapd_rq, which the patch uploaded to
> ftp://ftp.openldap.org/incoming/slapd_rq_lock.patch fixes.
>
> Before I applied this patch the test failed after being run a few times, with it
> it has now passed 100 times and is still counting.
locks in back-bdb/config.c should be pointless, as modifications to the
configuration should only occur while all threads are paused. The rest
makes sort of sense, but I'd leave it to Howard.
p.
Ing. Pierangelo Masarati
15 years, 2 months
Re: (ITS#5340) REP_ENTRY_MODIFIABLE bug in dynlist
by ando@sys-net.it
Hallvard B Furuseth wrote:
> ando(a)sys-net.it wrote:
>
> [about REP_ENTRY_MUSTRELEASE]
>> it is not
>> clear what happens when a callback chain is interrupted by
>> slap_null_cb() or similar, without getting to slap_send_search_entry().
>> This seems to indicate that callbacks should always provide a last
>> resort means to release the resources they set; if read-only, by keeping
>> track of what they sent; if modifiable, by freeing the contents of
>> rs->sr_* if not NULL, setting it to NULL to prevent further cleanup.
>
> That sounds cumbersome, I hope slapd could take care of that somehow.
Yes, it could: for example, by providing a helper like
slap_entry2modifiable()
that takes a SlapReply, checks the flags and does nothing if the entry
is already modifiable, or copies it if not, releasing the original one
if REP_ENTRY_MUSTRELEASE was set.
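A minimal sketch of what such a helper could look like, under stated assumptions: the demo_* types, flag values and functions below are mock stand-ins invented for this illustration, not slapd's Entry, SlapReply or be_release(). The `released` counter merely simulates handing the original entry back to its backend.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag values and types, for illustration only. */
#define DEMO_ENTRY_MODIFIABLE  0x0001U
#define DEMO_ENTRY_MUSTBEFREED 0x0002U
#define DEMO_ENTRY_MUSTRELEASE 0x0004U

typedef struct demo_entry { char *dn; } demo_entry;

typedef struct demo_reply {
    demo_entry *entry;
    unsigned flags;
    int released;   /* counts simulated releases to the backend */
} demo_reply;

static void demo_entry_free(demo_entry *e)
{
    if (e != NULL) {
        free(e->dn);
        free(e);
    }
}

static demo_entry *demo_entry_dup(const demo_entry *e)
{
    size_t n = strlen(e->dn) + 1;
    demo_entry *copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return NULL;
    copy->dn = malloc(n);
    if (copy->dn == NULL) {
        free(copy);
        return NULL;
    }
    memcpy(copy->dn, e->dn, n);
    return copy;
}

/* Returns 0 if the entry was already modifiable, 1 if a private copy
 * was installed, -1 if the copy could not be allocated. */
static int demo_entry2modifiable(demo_reply *rs)
{
    assert(rs != NULL && rs->entry != NULL);
    if (rs->flags & DEMO_ENTRY_MODIFIABLE)
        return 0;

    demo_entry *copy = demo_entry_dup(rs->entry);
    if (copy == NULL)
        return -1;

    if (rs->flags & DEMO_ENTRY_MUSTRELEASE) {
        rs->released++;                      /* simulated release of the original */
        rs->flags &= ~DEMO_ENTRY_MUSTRELEASE;
    } else if (rs->flags & DEMO_ENTRY_MUSTBEFREED) {
        demo_entry_free(rs->entry);          /* assumption beyond the text above */
    }

    rs->entry = copy;
    rs->flags |= DEMO_ENTRY_MODIFIABLE | DEMO_ENTRY_MUSTBEFREED;
    return 1;
}
```

Centralizing the check, the copy and the release in one function keeps the flag test in a single place, so callers cannot get the parenthesization or the flag bookkeeping subtly wrong.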
> But I don't see how the be_release() code can work now. It sounds like
> be->be_release() functions must check (how?) that the entry was created
> by 'be', and otherwise pass it on to the next overlay/backend or
> otherwise to entry_free(). Might involve mucking with op->o_bd and
> sr_entry->e_private, I suppose. Except maybe I'm missing some existing
> magic since slapd doesn't regularly crash...
Yes, but that's trivial: e_private must be NULL for temporary entries,
and copying the entry loses it (no one is supposed to muck with it
except the entry's creator). And the appropriate o_bd of a
(non-modifiable) entry can be easily computed from the entry's DN.
> be->be_release() does receive entries that were not created by 'be' or
> at least not with be->be_fetch(), see openldap-devel thread 'slapd API'
> in mar 2008.
Might be a bug, but I'm not familiar with that code.
>> Similarly, the existence of REP_ENTRY_MUSTBEFREED is not totally clear:
>> in principle as soon as REP_ENTRY_MODIFIABLE is set, it should imply
>> REP_ENTRY_MUSTBEFREED; the only difference in the semantics of the two
>> is that REP_ENTRY_MODIFIABLE without REP_ENTRY_MUSTBEFREED implies that
>> the callback that set the former will take care of freeing the entry;
>> however, other callbacks may further modify it, so freeing temporary
>> data should probably be left to the final handler.
>
> That's not my impression. MODIFIABLE would be that other modules than
> the creator can modify the entry - but the creator might still be the
> one who will free it. MUSTBEFREED is that the entry must be
> entry_free()ed - the creator will not do it (or not do it unless that
> flag is set).
They clearly mean different things; my point is that as soon as an entry
is modifiable it is not read-only and thus is a temporary, and thus will
eventually need to be freed. So the point is who is actually going to
free it. A copy might be created by an overlay after receiving a
read-only entry, but the same overlay might not actually perform the
copy if it receives the entry from another overlay that already copied
it, or from a proxy backend. However, after the entry is copied the
overlay will have no means to determine who actually created the copy.
This might be an issue depending on the order cleanup handlers are
called (didn't check what order they're called). My point is that
temporary entries need to be freed at some point; who frees them should
not be relevant...
> So, if I'm getting this right...
>
> A backend must expect an entry to change or be freed if it sends the
> entry with REP_ENTRY_<MUSTRELEASE, MUSTBEFREED or MODIFIABLE>, or if
> it passes through a slap_callback.sc_response.
Right now, a backend expects the entry to change if sent with
MODIFIABLE; to be released if sent with MUSTRELEASE; to be freed if
sent with MUSTBEFREED. Modifications could occur to a MODIFIABLE entry
when passing through a sc_callback(). Otherwise, if the entry is not
MODIFIABLE, a copy will be created if the callback needs to modify the
entry; after copying, the entry will be released if MUSTRELEASE was
set. When a copy is created, the callback that creates the copy must
also clear the MUSTRELEASE flag. Optionally, it may set the
MUSTBEFREED flag if it doesn't intend to take care of cleaning the entry
up after it's done.
> back-ldif does not: it uses e_name/e_nname _after_ sending with
> REP_ENTRY_MODIFIABLE.
That should be considered a bug.
> Nor overlay retcode, it calls retcode_entry_response(,,,rs->sr_entry)
> which makes pointers into sr_entry without checking if those flags are
> set. If I'm getting that code correctly. I haven't tested.
I haven't checked, but that might be a bug as well.
> Other apparent problems (also not tested, I've just browsed the code):
>
> Overlays that obey and reset MUSTBEFREED or MUSTRELEASE, do not
> necessarily clear or set MODIFIABLE when setting a new entry.
> translucent does not, even though it is a careful one which
> obeys both MUSTBEFREED and MUSTRELEASE.
>
>
> I'm not sure when the code must clear the entry_related flags.
> Some code which sets new entries seem to assume they are zero,
> but some code which sets sr_entry=NULL does not clear the flags.
>
> There are sr_flags bits which are not about the entry, in particular
> REP_MATCHED_MUSTBEFREED and REP_REF_MUSTBEFREED. Looks like these
> flags can get lost when some code sets sr_flags = <something>
> rather than sr_flags |= <something> or sr_flags &= ~<something>.
It seems to me that we should provide "smart" handlers to deal with
preparing sr_entry for modification, and to take care of cleaning things
up as appropriate. Those helpers should then be consistently used in
the code.
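The concern quoted above about `sr_flags = <something>` versus `sr_flags |= <something>` can be shown with a few lines of C. This is a sketch with hypothetical DEMO_* constants and made-up values, not slapd's definitions: plain assignment discards bits that are unrelated to the entry, while the masked updates leave them alone.

```c
/* Hypothetical flag values, for illustration only. */
#define DEMO_ENTRY_MODIFIABLE    0x0001U
#define DEMO_ENTRY_MUSTBEFREED   0x0002U
#define DEMO_ENTRY_MUSTRELEASE   0x0004U
#define DEMO_MATCHED_MUSTBEFREED 0x0010U

#define DEMO_ENTRY_MASK \
    (DEMO_ENTRY_MODIFIABLE | DEMO_ENTRY_MUSTBEFREED | DEMO_ENTRY_MUSTRELEASE)

/* Problematic: overwrites every bit, including non-entry ones. */
static unsigned demo_mark_copied_clobbering(unsigned flags)
{
    flags = DEMO_ENTRY_MODIFIABLE | DEMO_ENTRY_MUSTBEFREED;
    return flags;
}

/* Safer: only touches the two bits it means to set. */
static unsigned demo_mark_copied(unsigned flags)
{
    flags |= DEMO_ENTRY_MODIFIABLE | DEMO_ENTRY_MUSTBEFREED;
    return flags;
}

/* Clears all entry-related bits and nothing else. */
static unsigned demo_clear_entry_flags(unsigned flags)
{
    flags &= ~DEMO_ENTRY_MASK;
    return flags;
}
```

A pair of helpers like the last two is one way the "smart" handlers mentioned above could keep flag updates consistent across overlays.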
p.
Ing. Pierangelo Masarati
15 years, 2 months
Re: (ITS#5443) Multiple identical attibutes break syncrepl process fatally?
by ando@sys-net.it
marian.eichholz(a)freenet.ag wrote:
> We use openldap as a mail service directory with some 8 million objects on several
> replicas.
> For openldap 2.4.x we have to migrate from slurpd to syncrepl.
> We got a working syncrepl provider as a slurpd consumer (slapadd -q, 36 hrs)
>
> So I try to get a blank DB up by syncrepl only (yes, it is not at all
> performant, but informative)
>
> The process kind of breaks after a couple of minutes and some 44,000 objects
> (8,000,000 expected). Tracing it on the consumer side (-d 16384), I see
> something like this after an entry:
>
> syncrepl_message_to_entry: rid=001 mods check (forwardto: value #4 provided more
> than once)
>
> Indeed, the entry to come has three "forwardto:" Attributes with the same value
> (and other forwardto-attributes, too). This makes no great sense at the
> application level, but until now it has been perfectly OK for the directory, and
> the LDAP-API did not complain about the attribute modification, neither did the
> slurpd.
It is a violation of RFC 4512, section 2.2, which OpenLDAP 2.4 conforms to.
> This leads to some questions and suggestions:
>
> - the provider does not log anything with -d 16384, no error, no nothing. Could
> it do some useful logging about successful and failing replication sessions?
What's -16384? Since OpenLDAP 2.3 you can use strings to identify each
log subsystem (16384 == 0x4000 == "sync").
The error occurs when the consumer tries to manipulate the data it
receives. The producer has nothing to do with it, since it assumes that
data contained in it already passed sanity checks when they were stored.
How incorrect data got stored into the producer is a totally different
business, and the producer-side replication process should not muck with it.
> - the consumer does not log anything that can explain, why the remaining objects
> are not read, either. A bit of warning/logging could help the hopeful admin,
> probably.
A sync error occurred, which prevented syncing from continuing. This
error is logged by the "sync" subsystem. As far as I understand from
reading the code, the error (at least, a replication error) should be
logged also by the "any" subsystem, which means that as soon as any
logging is enabled, you get a message logged.
> - why is one problematic object lethal for the whole rest of the objects, since
> future modifications continue to be incorporated? Is this lack of robustness more a
> bug or a feature?
If inconsistent data is received, synchronization is supposed to stop.
In fact, continuing may result in an inconsistent state. The fact that
the stop is caused by a real error, and the fact that fixing the error
allows synchronization to recover doesn't sound like lack of robustness
to me. It sounds more like wisdom.
> - are identical attributes really forbidden with LDAP?
RFC 4512, Section 2.2
> - what could one do, to prevent unskillful "editors" of the master node to kill
> the replication processes for the whole replication cluster? Besides adding a
> checking/filtering API layer, of course.
Slapd has sanity checks for this. Slapadd doesn't, since it is supposed
to be operated only with consistent data, as resulting from slapcat.
You might have slapadd'ed inconsistent data to the producer.
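For illustration, a check for repeated values of one attribute could be sketched as below. This is not slapd's own validation code: the function names are hypothetical, and comparing values case-insensitively is a simplifying assumption standing in for whatever equality matching rule the attribute's schema actually defines.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive string equality (assumed matching rule). */
static int demo_equal_ci(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Returns the zero-based index of the first value that repeats an
 * earlier one, or -1 if all n values are distinct. */
static int demo_first_duplicate(const char *const *vals, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (demo_equal_ci(vals[i], vals[j]))
                return (int)i;
        }
    }
    return -1;
}
```

Running a check of this kind over data before it is loaded offline is one way to catch such entries early; the quadratic comparison is acceptable for the handful of values a typical attribute holds.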
In the end, I don't see how this ITS involves a bug in synchronization
software. The fact your producer got corrupted by inconsistent data
might have been caused by a bug in the software, however your analysis
does not give a clear indication of how it happened. If it happened by
slapadd, then it's a known (and desired, and documented) limitation of
the software. Unless you can reproduce it, I'd consider this ITS closed.
p.
Ing. Pierangelo Masarati
15 years, 2 months
(ITS#5443) Multiple identical attibutes break syncrepl process fatally?
by marian.eichholz@freenet.ag
Full_Name: Marian Eichholz
Version: 2.4.8
OS: Linux
URL:
Submission from: (NULL) (194.97.7.65)
We use openldap as a mail service directory with some 8 million objects on several
replicas.
For openldap 2.4.x we have to migrate from slurpd to syncrepl.
We got a working syncrepl provider as a slurpd consumer (slapadd -q, 36 hrs)
So I try to get a blank DB up by syncrepl only (yes, it is not at all
performant, but informative)
The process kind of breaks after a couple of minutes and some 44,000 objects
(8,000,000 expected). Tracing it on the consumer side (-d 16384), I see
something like this after an entry:
syncrepl_message_to_entry: rid=001 mods check (forwardto: value #4 provided more
than once)
Indeed, the entry to come has three "forwardto:" Attributes with the same value
(and other forwardto-attributes, too). This makes no great sense at the
application level, but until now it has been perfectly OK for the directory, and
the LDAP-API did not complain about the attribute modification, neither did the
slurpd.
This leads to some questions and suggestions:
- the provider does not log anything with -d 16384, no error, no nothing. Could
it do some useful logging about successful and failing replication sessions?
- the consumer does not log anything that can explain, why the remaining objects
are not read, either. A bit of warning/logging could help the hopeful admin,
probably.
- why is one problematic object lethal for the whole rest of the objects, since
future modifications continue to be incorporated? Is this lack of robustness more a
bug or a feature?
- are identical attributes really forbidden with LDAP?
- what could one do, to prevent unskillful "editors" of the master node to kill
the replication processes for the whole replication cluster? Besides adding a
checking/filtering API layer, of course.
Thank you in advance. Please let me know if I can provide you with any
useful information about our issue(s).
15 years, 2 months