Rein Tollevik wrote:
On Sun, 30 Mar 2008, hyc@symas.com wrote:
rein@tollevik.no wrote:
On Sat, 29 Mar 2008, ando@sys-net.it wrote:
rein@basefarm.no wrote:
I was seeing random failures of the test050-syncrepl-multimaster test. One of the failures was that it went into a tight loop traversing a circular runqueue it had managed to create in slapd_rq.task_list. It seems as this was caused by missing mutex locks around accesses to slapd_rq, which the patch uploaded to ftp://ftp.openldap.org/incoming/slapd_rq_lock.patch fixes.
Before I applied this patch the test failed after being run a few times, with it it has now passed 100 times and is still counting.
locks in back-bdb/config.c should be pointless, as modifications to the configuration should only occur while all threads are paused. The rest makes sort of sense, but I'd leave it to Howard.
Ignoring the ITS#5403 changes, I don't see anything here that isn't config-related, therefore it's all running single-threaded.
Now that the configuration can be changed dynamically (as this test does) I find it a bit odd that the config stuff should always be running single-threaded. But there is obviously much I don't know about the internals of slapd.
This was discussed at the 2003 OpenLDAP Developers' Day. http://www.openldap.org/conf/odd-wien-2003/proceedings.html Page 7 of my slides touched on it.
It's also mentioned again here http://www.openldap.org/lists/openldap-devel/200505/msg00062.html
The main point is that the original config stuff assumed that it was only executing during slapd startup, and single-threaded. Putting locks around all of the potential configurable value accesses would have been too much work, so the decision was made to force slapd to be single-threaded whenever writing to the config.
I'm guessing that what's happening here is just a broken assumption - suspending the thread pool doesn't in fact freeze everything in slapd. In particular, the slapd listener thread may still wake up for a select() timeout to handle the slapd runqueue. Even though no tasks submitted as a result of this will execute (because they'll just get queued into the suspended thread pool) the runqueue itself is still being manipulated.
Given that explanation, your patches make sense.