On Mon, Oct 20, 2014 at 5:51 AM, Howard Chu hyc@symas.com wrote:
basically i suspect a severe bug in the linux kernel, one where these extreme circumstances (32 or 16 processes accessing the same mmap'd file for example) have never been encountered, so the bug is simply... unknown.
I've occasionally seen the system lockup as well; I'm pretty sure it's due to dirty page writeback. There are plenty of other references on the web about Linux dealing poorly with heavy writeback I/O.
ugh. that doesn't bode well for the future reliability of the system i've just been working on. i modified parabench.py yesterday (sent you and david a copy) and managed to completely hang my lovely precious macbook running debian amd64.
also i managed to get processes to deadlock when being asked to die. killing them off with -9 one by one, at some point they would often all exit properly, but until the deadlocked one was found the parent would simply sit there.
is anyone actively dealing with this [poor page writeback] in the linux kernel?
l.