contextCSN of subordinate syncrepl DBs
by Rein Tollevik
I've been trying to figure out why syncrepl used on a backend that is
subordinate to a glue database with the syncprov overlay should save the
contextCSN in the suffix of the glue database rather than the suffix of
the backend where syncrepl is used. But all I come up with are reasons
why this should not be the case. So, unless anyone can enlighten me as
to what I'm missing, I suggest that this be changed.
The problem with the current design is that it makes it impossible to
reliably replicate more than one subordinate db from the same remote
server, as there are now race conditions where one of the subordinate
backends could save an updated contextCSN value that is picked up by the
other before it has finished its synchronization. An example of a
configuration where more than one subordinate db replicated from the
same server might be necessary is the central master described in my
previous posting in
http://www.openldap.org/lists/openldap-devel/200806/msg00041.html
My idea as to how this race condition could be verified was to add
enough entries to one of the backends (while the consumer was stopped)
to make it possible to restart the consumer after the first backend had
saved the updated contextCSN but before the second had finished its
synchronization. But I was able to produce it by simply adding or deleting
an entry in one of the backends before starting the consumer. Far too
often, the backend without any changes was able to pick up and save the
updated contextCSN from the producer before syncrepl on the second
backend fetched its initial value. I.e., it started with an updated
contextCSN and didn't receive the changes that had taken place on the
producer. If syncrepl stored the values in the suffix of their own
database then they wouldn't interfere with each other like this.
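The interleaving described above can be replayed with a small model. This is only an illustrative sketch, not slapd code: the backend names "A" and "B", the "glue" slot, the CSN values, and the function replay() are all assumptions made for the example. It assumes backend A (no changes) completes its refresh before backend B (one change on the producer) fetches its initial contextCSN.

```python
# Illustrative model only; names and values are hypothetical, not OpenLDAP APIs.
OLD = "20090101000000.000000Z#000000#000#000000"
NEW = "20090102000000.000000Z#000000#000#000000"


def replay(shared_storage):
    """Replay the interleaving where backend A finishes before B starts.

    Returns the set of backends that never receive a pending change.
    """
    # Where each backend saves its contextCSN: one glue slot, or one slot each.
    if shared_storage:
        slot = {"A": "glue", "B": "glue"}
    else:
        slot = {"A": "A", "B": "B"}
    saved = {name: OLD for name in set(slot.values())}

    # Producer state: newest change per subtree and the overall contextCSN.
    last_change = {"A": OLD, "B": NEW}
    producer_csn = NEW

    # CSN of the newest change each consumer backend has actually applied.
    applied = {"A": OLD, "B": OLD}

    for backend in ("A", "B"):
        cookie = saved[slot[backend]]  # initial value fetched at startup
        if last_change[backend] > cookie:
            applied[backend] = last_change[backend]  # producer sends the change
        saved[slot[backend]] = producer_csn  # updated contextCSN is saved

    return {b for b in applied if applied[b] < last_change[b]}


print("shared glue suffix, missed:", replay(True))
print("per-backend suffix, missed:", replay(False))
```

With the shared slot, B presents the value A just saved and the producer has nothing newer to send, so B's change is lost. With one slot per backend, B still presents its old value and receives the change.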
There is a similar problem in syncprov, as it must use the lowest
contextCSN value (with a given sid) saved by the syncrepl backends
configured within the subtree where syncprov is used. But to do that it
also needs to distinguish the contextCSN values of each syncrepl
backend, which it can't do when they all save them in the glue suffix.
This also implies that syncprov must ignore contextCSN updates from
syncrepl until all syncrepl backends have saved a value, and that
syncprov on the provider must send newCookie sync info messages when it
updates its contextCSN value when the changed entry isn't being
replicated to a consumer. I.e., as outlined in the message referred to above.
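As a rough sketch of the selection rule described above, assuming each syncrepl backend's saved contextCSN values can be read separately: the function names, the dictionary layout, and the example suffixes are hypothetical. The only thing taken from OpenLDAP is the CSN layout timestamp#count#sid#mod. Values in that fixed-width format are compared as plain strings here.

```python
# Sketch under stated assumptions; not the syncprov implementation.
def csn_sid(csn):
    """Return the sid field of a CSN laid out as timestamp#count#sid#mod."""
    return csn.split("#")[2]


def lowest_per_sid(backend_csns):
    """Pick the lowest contextCSN per sid across all syncrepl backends.

    backend_csns maps a backend suffix to the list of contextCSN values
    saved there. Returns None while any backend has saved nothing yet,
    mirroring the rule that updates are ignored until all have a value.
    """
    if any(not csns for csns in backend_csns.values()):
        return None
    lowest = {}
    for csns in backend_csns.values():
        for csn in csns:
            sid = csn_sid(csn)
            if sid not in lowest or csn < lowest[sid]:
                lowest[sid] = csn
    return lowest


newer = "20090102000000.000000Z#000000#001#000000"
older = "20090101000000.000000Z#000000#001#000000"
print(lowest_per_sid({"ou=a,dc=example,dc=com": [newer],
                      "ou=b,dc=example,dc=com": [older]}))
```

This only works if the values are kept apart per backend, which is the point made above: once they are all written to the glue suffix, the per-backend input to this function no longer exists.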
Neither of these changes should interfere with ordinary multi-master
configurations where syncrepl and syncprov are both used on the same
(glue) database.
I'll volunteer to implement and test the necessary changes if this is
the right solution. But to know whether my analysis is correct or not I
need feedback. So, comments please?
--
Rein Tollevik
Basefarm AS
14 years
dITStructureRules/nameForms in subschema subentry for informational purpose
by Michael Ströder
HI!
Discussed this very briefly with Howard at LDAPcon 2007 based on an idea
of Steve:
Support for dITStructureRules and nameForms is still in OpenLDAP's TODO.
In the meantime, slapd could accept definitions for both in slapd.conf
and simply pass them on to a schema-aware LDAP client for informational
purposes without enforcing them. This would work the same way as
rootDSE <file> in slapd.conf.
Opinions?
Ciao, Michael.
--
Michael Ströder
E-Mail: michael(a)stroeder.com
http://www.stroeder.com
14 years, 8 months
RE24 testing
by Quanah Gibson-Mount
We're getting closer to a 2.4.14 release, although there are still some bugs
targeted for fixing first. For one open issue in particular, ITS#5860, it would
be useful if people could try to duplicate the currently reported problem, where
on a large database with very small cache settings, slapd slows to a
crawl on multiple searches. See followups 41 to 43 in that ITS. So if you
have a large DB and can help by trying to duplicate this and providing
feedback on settings, etc., it would be much appreciated.
The connection code is (hopefully) stable now.
Thanks,
Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
14 years, 10 months
2.4.14?
by Michael Ströder
HI!
Now that the connection code changes have been reverted in HEAD and
therefore the lock issues have been solved:
What else is holding back general testing of RE24 in preparation for the
2.4.14 release?
Ciao, Michael.
14 years, 10 months
Re: RE24 connection code round 2
by ghenry@OpenLDAP.org
I'm getting failures on test039-glue-ldap-concurrency:
lt-slapd: bind.c:157: ldap_back_conn_delete: Assertion `!(*(&((lc))->lc_lcflags) & (((0x00000020U))))' failed.
Full testrun data:
wiki.suretec.org/testrun.tar.gz
Thanks.
--
Kind Regards,
Gavin Henry.
Managing Director.
T +44 (0) 1224 279484
M +44 (0) 7930 323266
F +44 (0) 1224 824887
E ghenry(a)suretecsystems.com
Open Source. Open Solutions(tm).
http://www.suretecsystems.com/
Suretec Systems is a limited company registered in Scotland. Registered
number: SC258005. Registered office: 13 Whiteley Well Place, Inverurie,
Aberdeenshire, AB51 4FP.
Subject to disclaimer at http://www.suretecgroup.com/disclaimer.html
14 years, 10 months
RE24 connection code round 2
by Quanah Gibson-Mount
The connection code has been reworked a bit more since the last call for
testing. Please test heavily with current RE24 CVS. Thanks!
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
14 years, 10 months
RE24 connection code reworking
by Quanah Gibson-Mount
RE24 has had some code reworking in the connections area, due to some race
conditions that were being triggered. It would be useful if people could
test current RE24 to see if they encounter issues. There are still a
number of outstanding issues that need to be resolved before 2.4.14 will be
released, but I'd appreciate a head start on making sure that at least "make
test" is passing for folks with this new code in place. So far, it does for
both Howard and me, but Michael Ströder has reported failures in his
OpenSuSE builds that we can't reproduce.
Thanks,
Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
14 years, 10 months
Re: RE24 connection code reworking
by Howard Chu
Pierangelo Masarati wrote:
> Howard Chu wrote:
>
>> What kind of system are you running on? Linux / multiprocessor?
>
> No, right now that's just my poor man's laptop (Intel(R) Pentium(R) M
> processor 2.26GHz, UP) running Linux (2.6.18, CentOS 5.2). Tomorrow I
> can check on better hardware, including SMP (quad Opteron), but I don't
> think that's a problem. Of course it's using epoll().
Can you try a newer Linux kernel?
The last useful log entries are:
do_syncrep2: rid=001 (-1) Can't contact LDAP server
do_syncrepl: rid=001 retrying (4 retries left)
connection_read(10): no connection!
The do_syncrep2 message is printed at syncrepl.c:1208.
At that point both err and rc are -1. It'll fall into line 1217 and tear down
the connection, which would remove it from epoll's control. So I don't
understand why epoll is waking up repeatedly after that point, unless perhaps
there's a kernel bug in epoll...
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
14 years, 10 months
Re: RE24 connection code reworking
by William Jojo
---- Original message ----
>Date: Sat, 24 Jan 2009 16:15:06 -0800
>From: Howard Chu <hyc(a)symas.com>
>Subject: Re: RE24 connection code reworking
>To: Pierangelo Masarati <ando(a)sys-net.it>
>Cc: Quanah Gibson-Mount <quanah(a)zimbra.com>,openldap-devel(a)openldap.org
>
>Pierangelo Masarati wrote:
>> Pierangelo Masarati wrote:
>>
>>> I ran 30 times test045 with HEAD and got no failures. Then re24 failed
>>> after 44 runs with the backtrace below (identical to the previous ones).
>
>I got thru 80 runs of test045 on HEAD (prior to my syncrepl patch) and then
>slapd hung on shutdown, with a deadlock between connections_shutdown(),
>connection_closing(), and send_ldap_ber(). So, still tinkering with this.
>
Ok, make test runs flawlessly on AIX 5.3 with GCC 4.2.3, BDB 4.6.21.3, Cyrus SASL 2.1.22.
How can I do successive runs of a specific test such as you've described here?
Wait a second, I pulled down OPENLDAP_REL_ENG_2_4. Should I be grabbing HEAD for these tests?
Please try to overlook my ignorance, just want to help :-)
Cheers,
Bill
14 years, 10 months