Re: (ITS#4860) Sets' enhancement
by jclarke@linagora.com
jclarke(a)linagora.com wrote:
> Pierangelo Masarati wrote:
>> Should be fixed now in HEAD/re24/re23. Please test. p.
>
> I've been testing (at last, sorry for the delay), and I've come across
> another memory problem. Backtrace is below, and valgrind output is attached.
Got this one: it was a double free in sets.c occurring after a
slap_set_join() with lset or rset empty. The non-empty set was
returned and then freed, causing a double-free error or segfault.
The patch attached corrects this problem on RE23 and HEAD for me and
doesn't have any side effects on our test set. However, it may not be
the "right" way - please correct if necessary!
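For illustration only, here is a minimal sketch of the ownership idea behind the patch as I understand it. The names (toy_set, toy_set_make, toy_join_with_empty, the borrowed field) are made up for this example and are not the types or flags used in servers/slapd/sets.c; the borrowed field merely plays a role similar in spirit to the "don't free" reference flags.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a set; not the representation used in sets.c. */
typedef struct {
    char **vals;    /* heap array of heap-allocated strings */
    size_t n;       /* number of entries in vals */
    int borrowed;   /* nonzero: another set owns vals, so do not free it */
} toy_set;

static int toy_frees = 0;   /* counts real array frees, for the demonstration */

/* Build a one-element set that owns its storage. */
static toy_set toy_set_make(const char *value)
{
    toy_set s = { NULL, 0, 0 };
    size_t len = strlen(value) + 1;

    s.vals = malloc(sizeof *s.vals);
    assert(s.vals != NULL);
    s.vals[0] = malloc(len);
    assert(s.vals[0] != NULL);
    memcpy(s.vals[0], value, len);
    s.n = 1;
    return s;
}

/* Release the storage unless it is only borrowed. */
static void toy_set_free(toy_set *s)
{
    if (s->vals != NULL && !s->borrowed) {
        for (size_t i = 0; i < s->n; i++)
            free(s->vals[i]);
        free(s->vals);
        toy_frees++;
    }
    s->vals = NULL;
    s->n = 0;
}

/* Join with an empty other side: the result takes over the array, and the
 * operand is marked as borrowed so that the caller's later cleanup of the
 * operand cannot free the same array a second time. */
static toy_set toy_join_with_empty(toy_set *nonempty)
{
    toy_set result = { nonempty->vals, nonempty->n, 0 };
    nonempty->borrowed = 1;
    return result;
}
```

With this marking, freeing the operand and then the result releases the array exactly once; without it, both calls would free the same pointer.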
Your recent fixes have solved all the issues from our test cases we were
encountering. Thank you very much for them.
Jon
--
Jonathan Clarke
Cellule OSSA - Groupe LINAGORA
27 rue de Berri, 75008 Paris
Tél: 01 58 18 68 28, fax: 01 58 18 68 29
http://www.linagora.com - http://www.08000linux.com
[Attachment: jonathan-clarke-071008.patch]
--- servers/slapd/sets.c.orig 2007-10-08 18:20:08.000000000 +0200
+++ servers/slapd/sets.c 2007-10-08 18:22:29.000000000 +0200
@@ -261,11 +261,15 @@
} else {
set = set_dup( cp, lset, SLAP_SET_LREF2REF( op_flags ) );
+ /* set array reference has been copied - don't free */
+ op_flags |= SLAP_SET_LREFVAL | SLAP_SET_LREFARR;
break;
}
} else if ( j == 0 ) {
set = set_dup( cp, rset, SLAP_SET_RREF2REF( op_flags ) );
+ /* set array reference has been copied - don't free */
+ op_flags |= SLAP_SET_RREFVAL | SLAP_SET_RREFARR;
break;
}
14 years, 9 months
Re: (ITS#4940) libldap doesn't wait for server's TLS close_notify
by guenther+ldapdev@sendmail.com
On Mon, 8 Oct 2007, Howard Chu wrote:
...
>> There are a number of ways this can be handled:
>> 1) change the client to wait until it sees the server's close_notify
>> alert by replacing "SSL_shutdown( p->ssl );" in tls.c with the two lines:
>> if (SSL_shutdown( p->ssl ) == 0)
>> SSL_shutdown( p->ssl );
>> (I have confirmed that this works. As documented, the first call
>> will return 1 if the server's close_notify has already been
>> received, if not, the second call will block until it is received.)
>
> So if the server doesn't send one, the client will be stuck waiting forever?
It would also unblock if the server closed the connection. One downside
of this option that I thought of later is that it shifts the TCP
CLOSE_WAIT state from the client to the server. Fixing that would add
more complexity to the sockbuf layer than this entire change is worth.
Having chatted with Kurt about this at the last IETF meeting and pondered
failure modes, I'm no longer in favor of this option.
>> 2) change the client to not bother to send a close_notify alert when
>> it's just going to close() the connection; change the server to not
>> send a close_notify if it didn't get one. <...>
>
> Sounds like a change in the SSL library, not something for us to worry
> about.
Since "send a close_notify alert" == "call SSL_shutdown() for the first
time", it would be a change in how the SSL library was used by libldap.
Anyway, this issue isn't worth keeping an ITS open for, as it doesn't
actually cause failures or visible errors. I might someday chase down a
clean way of implementing this second option, but only after the much more
useful work of coming up with a reasonable API to let event-driven apps do
STARTTLS without blocking. Someday.
Philip Guenther
14 years, 9 months
Re: ITS#5174 openldap.schema entry not valid per RFC 4512
by ando@sys-net.it
> The reason I reported it was that I wrote a parser for LDAP schema
> based directly on the formal grammar, downloaded all the schema from
> OpenLDAP as a test, and that was the only schema that broke, and only
> on the one element. I did not have any trouble with the OpenLDAP
> software accepting the schema. If you use a grammar builder (I used
> ANTLR) rather than a hand-coded parser, being permissive makes it a
> bit more complex. Allowing the elements to appear in any order would
> be easy to parse for the RFC 4512 elements, because they happen to be
> defined with markers ("NAME", "DESC" and so on) that make them easy
> to differentiate, but then enforcing at-most-once semantics on the
> elements requires either an additional pass or some hand coded
> predicates to check for duplicates in a data structure being built
> on-the-fly.
Thanks for the feedback. It's now fixed in HEAD/re24.
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
---------------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Email: pierangelo.masarati(a)sys-net.it
---------------------------------------
14 years, 9 months
Re: ITS#5174 openldap.schema entry not valid per RFC 4512
by bhanafee@gmail.com
p.,
Thanks for looking at it. I think the RFC 4234 parse of the elements
in the RFC 4512 definition would be as a 'concatenation' (based on
rule->elements->alternation->concatenation). To get "any order"
behavior under RFC 4234, I think the elements in the definition would
be separated by '/' characters, and all the elements would be enclosed
in a group with a repetition element. There's an example in the
"Generalized Time" definition (RFC 4517, section 3.3.13) that shows
why the optional members must be in sequence by default. In that
particular case, the [fraction] can't be allowed to appear before the
[minute ...] part, or it would be ambiguous whether the final "3015"
in "2007100807.53015" was part of the fraction or a minute and second
that appears (for no good reason) after the fraction ".5".
The reason I reported it was that I wrote a parser for LDAP schema
based directly on the formal grammar and downloaded all the schema from
OpenLDAP as a test. That was the only schema that broke, and only on
the one element. I did not have any trouble with the OpenLDAP
software accepting the schema. If you use a grammar builder (I used
ANTLR) rather than a hand-coded parser, being permissive makes it a
bit more complex. Allowing the elements to appear in any order would
be easy to parse for the RFC 4512 elements, because they happen to be
defined with markers ("NAME", "DESC" and so on) that make them easy
to differentiate, but then enforcing at-most-once semantics on the
elements requires either an additional pass or some hand coded
predicates to check for duplicates in a data structure being built
on-the-fly.
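As a rough sketch of the "check for duplicates on the fly" approach, the following C fragment tracks which markers have been seen in a bitmask. It assumes the input is already tokenized and that quoted values keep their quotes, so a value can never be mistaken for a marker. The marker list is only an illustrative subset, and toy_first_duplicate is a made-up name, not something from OpenLDAP or ANTLR.

```c
#include <assert.h>
#include <string.h>

/* Illustrative subset of keyword markers; real definitions have more. */
static const char *const toy_markers[] = { "NAME", "DESC", "OBSOLETE", "SUP" };
enum { TOY_MARKER_COUNT = sizeof toy_markers / sizeof toy_markers[0] };

/* Returns 0 if every recognized marker in tokens[0..n) appears at most once,
 * otherwise 1 + the index of the first repeated marker token.
 * Tokens that are not markers (values, OIDs) are skipped, so the markers
 * may appear in any order. */
static int toy_first_duplicate(const char *const *tokens, int n)
{
    unsigned seen = 0;   /* bit m is set once toy_markers[m] has been seen */

    assert(tokens != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        for (int m = 0; m < TOY_MARKER_COUNT; m++) {
            if (strcmp(tokens[i], toy_markers[m]) == 0) {
                unsigned bit = 1u << m;
                if (seen & bit)
                    return i + 1;   /* second occurrence of this marker */
                seen |= bit;
                break;
            }
        }
    }
    return 0;
}
```

The check runs in the same pass as the parse, at the cost of one extra word of state per definition.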
-- Brian
On 10/8/07, Pierangelo Masarati <ando(a)sys-net.it> wrote:
> Not sure if ordering of optional sequence members is required by RFC
> 4234, but the change you suggest sounds harmless. OpenLDAP software, in
> this sense, is usually permissive in what is accepted and strict in what
> is emitted.
>
> Thanks, p.
>
>
>
> Ing. Pierangelo Masarati
> OpenLDAP Core Team
>
> SysNet s.r.l.
> via Dossi, 8 - 27100 Pavia - ITALIA
> http://www.sys-net.it
> ---------------------------------------
> Office: +39 02 23998309
> Mobile: +39 333 4963172
> Email: pierangelo.masarati(a)sys-net.it
> ---------------------------------------
>
>
>
14 years, 9 months
Re: (ITS#4940) libldap doesn't wait for server's TLS close_notify
by hyc@symas.com
guenther+ldapdev(a)sendmail.com wrote:
> Full_Name: Philip Guenther
> Version: 2.3.27
> OS: Linux and Solaris
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (64.58.1.252)
>
>
> [I vaguely recall seeing a report of this issue in the archives of one of the
> mailing lists, but I can no longer find the original.]
>
> If you trace the packets sent when you use, for example, ldapsearch against a
> server on a different host, using either the -Z option to do TLS or using an
> ldaps URI, you'll discover that the TCP connection is actually reset instead of
> being closed cleanly: the client sends TCP RSTs in response to the server's
> final packets.
>
> This is because libldap uses the following sequence when unbinding a TLS or SSL
> connection:
> 1) send the unbind request (over the TLS or SSL layer)
> 2) call SSL_shutdown(), sending the TLS close_notify alert
> 3) call close()
>
> After receiving the close_notify alert from step (2), the server sends back its
> own close_notify alert and then calls close(). However, because the client
> didn't wait for the server's response before calling close() on its end, the
> client's TCP stack considers the TCP connection to already be gone and responds
> with the RST packets. This occurs with Linux and Solaris clients and probably
> most other unices: the response to packets after a close() doesn't vary in my
> experience.
>
> There are a number of ways this can be handled:
> 1) change the client to wait until it sees the server's close_notify alert by
> replacing "SSL_shutdown( p->ssl );" in tls.c with the two lines:
> if (SSL_shutdown( p->ssl ) == 0)
> SSL_shutdown( p->ssl );
> (I have confirmed that this works. As documented, the first call will return
> 1 if the server's close_notify has already been received, if not, the second
> call will block until it is received.)
So if the server doesn't send one, the client will be stuck waiting forever?
> 2) change the client to not bother to send a close_notify alert when it's just
> going to close() the connection; change the server to not send a
> close_notify if it didn't get one. This probably violates the TLS spec, but
> the fact that TLS/1.1 permits resumption of sessions without close_notify
> having been sent indicates that the violation is not a major issue,
> particularly given that LDAP's unbind request prevents truncation attacks.
> Close_notifies are, of course, required if the client just wants to terminate
> the TLS layer and resume unprotected LDAP operations.
Sounds like a change in the SSL library, not something for us to worry about.
>
> 3) ignore the issue: it only causes one or two extra packets to be sent. While
> it also eliminates the TIME_WAIT state, LDAP's application-level close (the
> unbind request) means it doesn't need reliable full-duplex closure, so the
> only concern would be random connection issues from reincarnations of the
> TCP tuple, which is unlikely for an LDAP connection.
> Personally, I like the simplicity and cleanliness of solution (1).
(1) has the possibility of an indefinite hang. As such, I think it best to
leave it with the current behavior.
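For what it's worth, the hang could in principle be bounded by waiting on the socket with a timeout before making any second, blocking call. The sketch below shows only that idea at the plain socket level using POSIX poll(). It does not touch libldap, the sockbuf layer, or OpenSSL, and toy_wait_for_peer and toy_bounded_close are hypothetical helper names.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of libldap or OpenSSL.
 * Waits at most timeout_ms for the peer to send something or to close.
 * Returns 1 if fd became readable or hung up, 0 on timeout, -1 on error
 * (including an interrupted poll or an invalid descriptor). */
static int toy_wait_for_peer(int fd, int timeout_ms)
{
    struct pollfd pfd;

    /* A negative timeout would make poll() wait forever, which is the
     * very hang this helper is meant to avoid. */
    assert(timeout_ms >= 0);

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0 || (rc > 0 && (pfd.revents & POLLNVAL)))
        return -1;
    return rc > 0 ? 1 : 0;
}

/* Half-close our sending side, give the peer a bounded amount of time to
 * answer or close, then close regardless of the outcome.
 * Returns what toy_wait_for_peer() reported. */
static int toy_bounded_close(int fd, int timeout_ms)
{
    shutdown(fd, SHUT_WR);
    int seen = toy_wait_for_peer(fd, timeout_ms);
    close(fd);
    return seen;
}
```

This only shows that the wait need not be unbounded; it does nothing about which side ends up holding the lingering TCP state.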
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
14 years, 9 months
Re: (ITS#5171) hdb txn_checkpoint failures
by richton@nbcs.rutgers.edu
> One more thing to check is just using "ls -l" to see if the actual size of
> the log files corresponds with the db_stat offsets. E.g. if slave6 base1's
> log.0000001 is really 8MB but the LSN is only 233KB, then we have to look for
> a weird in-memory corruption. If not, then somebody reset your logs.
No, it looks like those sizes all match. Actually, the "reset logs" theory
may well be the case. I still can't imagine how, but I'm willing to chalk
this whole thing up to user error (of course, the logs show that the user
was me, which is a shame :). It is hard to disprove with only one log file
active, with the exception of base2. base2 has multiple log files going
back:
[slave4]
-rw------- 1 root root 9999986 Sep 6 18:03 log.0000000001
-rw------- 1 root root 9999967 Sep 10 14:03 log.0000000002
-rw------- 1 root root 9999983 Sep 18 16:33 log.0000000003
-rw------- 1 root root 9429761 Oct 8 05:33 log.0000000004
[slave6]
-rw------- 1 root root 9999986 Sep 6 18:03 log.0000000001
-rw------- 1 root root 9999967 Sep 10 14:03 log.0000000002
-rw------- 1 root root 9999983 Sep 18 16:33 log.0000000003
-rw------- 1 root root 9429761 Oct 8 05:33 log.0000000004
which of course match the db_stat -l, but also extend back prior to
September 24 according to the filesystem timestamps. I guess the argument
could be made that log 4 was truncated on September 24...would that be
detected/come up sane/come up bad in the db_stat?
14 years, 9 months
Sync replication failure during startup.
by Stelios Grigoriadis
OpenLDAP v. 2.3.32
Berkeley DB 4.6
gcc 4.1.0
Replication doesn't work if the master server is started after
the replica servers and a large number of simultaneous updates
are performed while the server is starting up.
The entries that didn't get replicated to the replicas will not
be replicated even after a restart of both master and replicas.
The contextCSN is set to a value larger than the entryCSN of the
"lost" entries.
This is what I think happens during a master server startup with
simultaneous updates ongoing (and replicas trying to sync in the
initial phase).
Suppose that two clients (Client1 and Client2) are adding the entries
a and b respectively. If that happens between t1 and t2 (one second
apart), they will get the same entryCSN (same timestamp). If entry a is
committed at tc1 and b at tc2, any replica search in between will only
get the entry a. The entry b will be lost.
Client1 entry=a, csn=x
Client2 entry=b, csn=x
Timeline ------+----------+---------+----+------>
                          |         |
               t1         |         |    t2=t1+1
                          |         |
                 tc1=entry a     tc2=entry b
                  committed       committed
Replica search query between tc1 and tc2.
I don't know whether a finer timestamp granularity would prevent this or,
even better, some kind of counter so that every modification gets a unique
csn.
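To make the counter idea concrete, here is a small sketch in C. It is not how slapd generates CSNs; the stamp layout ("<seconds>#<counter>") and the names toy_csn_state, toy_next_csn and toy_csn_cmp are invented for this example. Whether unique stamps alone would be enough for the replica search case above is exactly what I am unsure about.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical generator state; start it as { 0, 0 }. */
typedef struct {
    time_t last_sec;    /* second of the most recently issued stamp */
    unsigned counter;   /* sequence number within last_sec */
} toy_csn_state;

/* Writes "<seconds>#<counter>" into buf, both fields zero-padded to a fixed
 * width. Stamps issued within the same second differ in the counter, so
 * every call yields a stamp greater than the previous one. */
static void toy_next_csn(toy_csn_state *st, time_t now, char *buf, size_t len)
{
    assert(st != NULL && buf != NULL && len >= 18);
    if (now > st->last_sec) {
        st->last_sec = now;     /* new second: restart the sequence */
        st->counter = 0;
    } else {
        st->counter++;          /* same second, or the clock stepped back */
    }
    snprintf(buf, len, "%010lld#%06x", (long long)st->last_sec, st->counter);
}

/* Fixed-width fields make plain string comparison match issue order. */
static int toy_csn_cmp(const char *a, const char *b)
{
    return strcmp(a, b);
}
```

In the scenario above, entries a and b would get stamps ending in #000000 and #000001 instead of the same value x.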
Can you please comment on our analysis and let us know whether it is
correct or whether we have missed something important?
Any help or hints on how to avoid or fix this problem would be greatly
appreciated.
If I receive useful information directly in private email, I will post a
summary.
Regards
Stelios Grigoriadis
14 years, 9 months
Re: (ITS#5171) hdb txn_checkpoint failures
by hyc@symas.com
Aaron Richton wrote:
>> It's still rather suspicious that slave4 and slave6 both had identical log
>> status for base1 (1/188113) but different requested locations (1/8730339 vs
>> 1/8730401). If they're identically configured slaves then they ought to be in
>> lock-step. Then again, obviously they're not identical since slave6 doesn't
>> show base4 in your log.
>
> Identical is relative. They've got the same OpenLDAP and supporting
> binaries running on the same patches of Solaris 9 running identical
> turn-up scripts with identical configuration files. But this is
> production, so we've got data changes over time. For instance, the slaves
> bootstrap with a slapadd -q, and the underlying slapcat could easily be
> different from slave4 vs. slave6 (the most recent one is automatically
> used). I'd imagine this would look different at the db layer, even once
> syncrepl eventually converged the logical data?
>
>> Do you have the db_stat output from an uncorrupted slave? What about the
>> master?
>
> Sure... https://www.nbcs.rutgers.edu/~richton/its5171_dbstatl2
Judging from the LSNs in use on these other servers, it sure looks like
somebody went in and zeroed out your logs on slave4 and slave6. I don't think
the environment spontaneously corrupted itself and reset the log offsets...
One more thing to check is just using "ls -l" to see if the actual size of the
log files corresponds with the db_stat offsets. E.g. if slave6 base1's
log.0000001 is really 8MB but the LSN is only 233KB, then we have to look for
a weird in-memory corruption. If not, then somebody reset your logs.
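The size check can also be scripted. The sketch below is only an illustration in C: it compares a log file's size from stat() with an LSN offset that you supply yourself (for example, copied by hand from the db_stat output). It uses no Berkeley DB API, and toy_check_log, toy_report and the slack parameter are names made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

enum toy_log_check {
    TOY_LOG_CONSISTENT = 0,    /* file size is at or just past the offset */
    TOY_LOG_FILE_LARGER = 1,   /* file is much larger than the offset */
    TOY_LOG_LSN_PAST_EOF = 2,  /* offset points beyond the end of the file */
    TOY_LOG_STAT_FAILED = 3    /* the file could not be examined */
};

/* Compare the on-disk size of the log file at path with lsn_offset.
 * slack is how many bytes the file may extend past the offset and still
 * count as consistent; choosing it is left to the caller. */
static enum toy_log_check toy_check_log(const char *path,
                                        long long lsn_offset,
                                        long long slack)
{
    struct stat sb;

    assert(path != NULL && lsn_offset >= 0 && slack >= 0);
    if (stat(path, &sb) != 0)
        return TOY_LOG_STAT_FAILED;

    long long size = (long long)sb.st_size;
    if (lsn_offset > size)
        return TOY_LOG_LSN_PAST_EOF;
    if (size - lsn_offset > slack)
        return TOY_LOG_FILE_LARGER;
    return TOY_LOG_CONSISTENT;
}

/* Print a one-line verdict for a log file and return the check result. */
static enum toy_log_check toy_report(const char *path, long long lsn_offset,
                                     long long slack)
{
    static const char *const text[] = {
        "size matches the LSN offset",
        "file is much larger than the LSN offset",
        "LSN offset is beyond the end of the file",
        "cannot stat the file"
    };
    enum toy_log_check rc = toy_check_log(path, lsn_offset, slack);

    printf("%s: %s\n", path, text[rc]);
    return rc;
}
```

An 8MB file checked against a 233KB offset would come back as TOY_LOG_FILE_LARGER, which is the in-memory corruption case described above.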
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
14 years, 9 months