Re: slapd deadlock bug ITS#7296

2 Jan 2013


On Sun, 23 Dec 2012, Howard Chu wrote:
...
Richard Silverman wrote:
...
I’ve done this, and it may well be epoll-specific; the test has now run
over twice as long as the longest it has ever required to produce the
deadlock. With this sort of bug no amount of waiting would make me sure,
but it seems likely. I’ll leave it running.
epoll(7) specifically mentions the possibility of epoll_wait hanging even
though there is outstanding unread data on a socket, when using
edge-triggered operation, and I notice in daemon.c that you switch to
edge-triggered mode in the event that the client closes the connection (at
least that’s what the comment suggests):
  /* Don't keep reporting the hangup
   */
  if ( SLAP_SOCK_IS_ACTIVE( tid, fd )) {
      SLAP_EPOLL_SOCK_SET( tid, fd, EPOLLET );
  }


Perhaps related?
Indeed. Seems like ITS#5886 has resurfaced. You can get some insight into 
this looking at about commit 96192064f3a3daea994eb8293f0413def5379958 
forward. I don't have time to dig further into it at the moment, Christmas 
dinner(s) calling...
I hope you're enjoying your holidays. I've also been on vacation, but I
will follow up on this when I can, probably next week. In the meantime, I
thought I'd report that I let my test with select() instead of epoll() run
for several more hours with no deadlock, for whatever that's worth.
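
For reference, the epoll(7) behaviour mentioned above (epoll_wait blocking while unread data is still outstanding, when a descriptor is registered edge-triggered) can be reproduced in isolation. What follows is only a minimal sketch, assuming Linux. It is not taken from daemon.c and does not use the SLAP_* macros; the helper name second_wait_events and the pipe-based setup are made up for illustration.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only (not slapd code).
 * Registers the read end of a pipe with epoll, writes two bytes,
 * consumes only one of them, then calls epoll_wait() a second time.
 * extra_flags is 0 for level-triggered or EPOLLET for edge-triggered.
 * Returns the result of the second epoll_wait(), or -1 on setup failure.
 */
int second_wait_events(uint32_t extra_flags)
{
    int fds[2];
    struct epoll_event ev = {0}, out;
    char byte;
    int ep, n = -1;

    if (pipe(fds) < 0)
        return -1;
    ep = epoll_create(1);
    if (ep < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = fds[0];

    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "ab", 2) == 2 &&
        epoll_wait(ep, &out, 1, 1000) == 1 &&  /* first readiness report */
        read(fds[0], &byte, 1) == 1) {         /* drain only one of two bytes */
        /* One byte is still unread here. Wait up to 200 ms for another event. */
        n = epoll_wait(ep, &out, 1, 200);
    }

    close(ep);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling second_wait_events(0) should report the descriptor as readable again, while second_wait_events(EPOLLET) should time out with no events even though a byte is still sitting in the pipe. That is the same shape of hang, just with a timeout instead of an indefinite wait.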
...
Also, what kernel version(s) are you testing on?
2.6.32 (Red Hat Enterprise Linux)
I see a lot of Linux-kernel email traffic about epoll bugs as well, but not 
sure which are relevant to this version. Just something to stay aware of.
Good to know, although I think the bug is independent of this. We actually
ran into this exact issue on Solaris several years ago and reported it
then:
http://www.openldap.org/its/index.cgi/Incoming?id=6920
... but the problem was misidentified and we didn't follow up further. At
the time the bug manifested very infrequently for us and we just put
monitoring in place which restarted slapd whenever it happened. By the
time we migrated to Linux we had forgotten all about this, so we didn't
migrate the watchdog, and it turned out that with the changed environment
the bug was much more severe this time around, which prompted us to
investigate more fully.
-- 
   Richard
