Rein Tollevik wrote:
On Sun, 30 Mar 2008, hyc(a)symas.com wrote:
> rein(a)tollevik.no wrote:
>> On Sat, 29 Mar 2008, ando(a)sys-net.it wrote:
>>> rein(a)basefarm.no wrote:
>>>> I was seeing random failures of the test050-syncrepl-multimaster test.
One of
>>>> the failures was that it went into a tight loop traversing a circular
runqueue
>>>> it had managed to create in slapd_rq.task_list. It seems as this was
caused by
>>>> missing mutex locks around accesses to slapd_rq, which the patch uploaded
to
>>>>
ftp://ftp.openldap.org/incoming/slapd_rq_lock.patch fixes.
>>>>
>>>> Before I applied this patch the test failed after being run a few times,
with it
>>>> it has now passed 100 times and is still counting.
>>> locks in back-bdb/config.c should be pointless, as modifications to the
>>> configuration should only occur while all threads are paused. The rest
>>> makes sort of sense, but I'd leave it to Howard.
> Ignoring the ITS#5403 changes, I don't see anything here that isn't
> config-related, therefore it's all running single-threaded.
Now that the configuration can be changed dynamically (as this test does)
I find it a bit odd that the config stuff should always be running
single-threaded. But there is obviously much I don't know about the
internals of slapd.
This was discussed at the 2003 OpenLDAP Developers' Day.
http://www.openldap.org/conf/odd-wien-2003/proceedings.html
Page 7 of my slides touched on it.
It's also mentioned again here
http://www.openldap.org/lists/openldap-devel/200505/msg00062.html
The main point is that the original config stuff assumed that it was only
executing during slapd startup, and single-threaded. Putting locks around all
of the potential configurable value accesses would have been too much work, so
the decision was made to force slapd to be single-threaded whenever writing to
the config.
I'm guessing that what's happening here is just a broken assumption -
suspending the thread pool doesn't in fact freeze everything in slapd. In
particular, the slapd listener thread may still wake up for a select() timeout
to handle the slapd runqueue. Even though no tasks submitted as a result of
this will execute (because they'll just get queued into the suspended thread
pool) the runqueue itself is still being manipulated.
Given that explanation, your patches make sense.
--
-- Howard Chu
Chief Architect, Symas Corp.
http://www.symas.com
Director, Highland Sun
http://highlandsun.com/hyc/
Chief Architect, OpenLDAP
http://www.openldap.org/project/