Hi,
Not sure if I should post this here or to the CentOS mailing list (I am hoping
they are monitoring this). I am using a stock CentOS 6.3 32-bit installation
with the following packages:
# rpm -qa | grep openldap
openldap-devel-2.4.23-26.el6_3.2.i686
openldap-2.4.23-26.el6_3.2.i686
openldap-clients-2.4.23-26.el6_3.2.i686
openldap-servers-2.4.23-26.el6_3.2.i686
I have a 4-way multi-master sync replication setup on four virtual servers
running on Citrix XenServer 6.2. I am also running Samba 3.5.10 as a PDC on one
machine and as a BDC on the other three. All servers are also running
sssd-1.8.0 for Linux authentication.
The problem is that one or more of the LDAP servers will hang, usually the one
that acts as the PDC, since it is hit the hardest and is the most critical of
the four. Usually, but not always, the hang then spreads to the other servers,
typically in the middle of the night when I am not watching. Fortunately we are
not yet in full production.
Compiling from source is not yet an option. I must use the stock RPMs from
CentOS per our design guidelines.
LDAP appears to hang, but what actually seems to be happening is that only one
listener (listen=7) becomes busy and never gets out of that state. Here is a
short excerpt from the logs that I am collecting:
Aug 14 20:34:44 auth-us slapd[10357]: daemon: read active on 69
Aug 14 20:34:44 auth-us slapd[10357]: daemon: epoll: listen=7 active_threads=0 tvp=zero
Aug 14 20:34:44 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:34:44 auth-us slapd[10357]: conn=1742 op=0 EXT oid=1.3.6.1.4.1.1466.20037
Aug 14 20:34:44 auth-us slapd[10357]: conn=1742 op=0 STARTTLS
Aug 14 20:34:44 auth-us slapd[10357]: conn=1742 op=0 RESULT oid= err=0 text=
Aug 14 20:34:44 auth-us slapd[10357]: daemon: activity on 1 descriptor
Aug 14 20:34:44 auth-us slapd[10357]: daemon: activity on:
Aug 14 20:34:44 auth-us slapd[10357]:
Aug 14 20:34:44 auth-us slapd[10357]: daemon: epoll: listen=7 active_threads=0 tvp=zero
Aug 14 20:34:44 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:34:44 auth-us slapd[10357]: daemon: activity on 1 descriptor
Aug 14 20:34:44 auth-us slapd[10357]: daemon: activity on:
Aug 14 20:34:44 auth-us slapd[10357]: 69r
Aug 14 20:34:44 auth-us slapd[10357]:
Aug 14 20:34:44 auth-us slapd[10357]: daemon: read active on 69
Aug 14 20:34:44 auth-us slapd[10357]: daemon: epoll: listen=7 active_threads=0 tvp=zero
Aug 14 20:34:44 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:34:46 auth-us slapd[10357]: daemon: epoll: listen=7 active_threads=0 tvp=zero
Aug 14 20:34:46 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:34:51 auth-us slapd[10357]: daemon: epoll: listen=7 active_threads=0 tvp=zero
Aug 14 20:34:51 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:34:54 auth-us slapd[10357]: daemon: activity on 1 descriptor
Aug 14 20:34:54 auth-us slapd[10357]: daemon: activity on:
Aug 14 20:34:54 auth-us slapd[10357]: 39r
Aug 14 20:34:54 auth-us slapd[10357]:
Aug 14 20:34:54 auth-us slapd[10357]: daemon: read active on 39
Aug 14 20:34:54 auth-us slapd[10357]: daemon: epoll: listen=7 active_threads=0 tvp=zero
Aug 14 20:34:54 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:34:54 auth-us slapd[10357]: daemon: activity on 1 descriptor
Aug 14 20:34:54 auth-us slapd[10357]: daemon: activity on:
Aug 14 20:34:54 auth-us slapd[10357]:
Aug 14 20:34:54 auth-us slapd[10357]: daemon: epoll: listen=7 busy
Aug 14 20:34:54 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:34:56 auth-us slapd[10357]: daemon: epoll: listen=7 busy
Aug 14 20:34:56 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:35:01 auth-us slapd[10357]: daemon: epoll: listen=7 busy
Aug 14 20:35:01 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:35:06 auth-us slapd[10357]: daemon: epoll: listen=7 busy
Aug 14 20:35:06 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:35:11 auth-us slapd[10357]: daemon: epoll: listen=7 busy
Aug 14 20:35:11 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:35:12 auth-us slapd[10357]: daemon: activity on 1 descriptor
Aug 14 20:35:12 auth-us slapd[10357]: daemon: activity on:
Aug 14 20:35:12 auth-us slapd[10357]: 42r
Aug 14 20:35:12 auth-us slapd[10357]:
Aug 14 20:35:12 auth-us slapd[10357]: daemon: read active on 42
Aug 14 20:35:12 auth-us slapd[10357]: daemon: epoll: listen=7 busy
Aug 14 20:35:12 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:35:14 auth-us slapd[10357]: daemon: activity on 1 descriptor
Aug 14 20:35:14 auth-us slapd[10357]: daemon: activity on:
Aug 14 20:35:14 auth-us slapd[10357]: 40r
Aug 14 20:35:14 auth-us slapd[10357]:
Aug 14 20:35:14 auth-us slapd[10357]: daemon: read active on 40
Aug 14 20:35:14 auth-us slapd[10357]: daemon: epoll: listen=7 busy
Aug 14 20:35:14 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:35:16 auth-us slapd[10357]: daemon: epoll: listen=7 busy
Aug 14 20:35:16 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:35:21 auth-us slapd[10357]: daemon: epoll: listen=7 busy
Aug 14 20:35:21 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:35:26 auth-us slapd[10357]: daemon: epoll: listen=7 busy
Aug 14 20:35:26 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Aug 14 20:35:31 auth-us slapd[10357]: daemon: epoll: listen=7 busy
Aug 14 20:35:31 auth-us slapd[10357]: daemon: epoll: listen=8 active_threads=0 tvp=zero
Every log entry prior to this looks normal: epoll: listen=7 goes from
active_threads=0 to busy when a connection comes in, sets up the connection, and
then goes back to active_threads=0. I have yet to understand what causes it to
go into the busy state and never return until I manually stop and restart the
slapd process. It does appear, however, that slapd can still process queries on
already-active connections, as seen on descriptors 40 and 42 (40r and 42r in
the log); it just can't accept any new connection requests because
epoll: listen=7 has hung.
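For anyone who wants to locate the point of the hang in a larger log without
reading it by eye, here is a rough Python 3 sketch. It only looks at the
"daemon: epoll: listen=N busy" and "listen=N active_threads=..." lines shown
above and reports listeners that stayed busy for a while. The function name
stuck_listeners, the 30-second threshold, and the placeholder year (syslog
timestamps carry no year) are assumptions made for illustration; nothing here
is defined by slapd itself.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
#   Aug 14 20:34:54 auth-us slapd[10357]: daemon: epoll: listen=7 busy
# Anything after the state (e.g. "tvp=zero") is ignored.
LISTEN_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ slapd\[\d+\]: "
    r"daemon: epoll: listen=(\d+) (busy|active_threads=\d+)"
)


def stuck_listeners(lines, threshold=timedelta(seconds=30), year=2000):
    """Return {listener_fd: (busy_since, last_seen_busy)} for every listener
    whose most recent run of 'busy' reports spans at least `threshold`.

    `year` is only a placeholder so the timestamps can be parsed.
    """
    busy_since = {}
    last_seen = {}
    for line in lines:
        m = LISTEN_RE.match(line)
        if not m:
            continue  # not a listener status line
        stamp = datetime.strptime(
            "%d %s" % (year, m.group(1)), "%Y %b %d %H:%M:%S"
        )
        fd = int(m.group(2))
        if m.group(3) == "busy":
            busy_since.setdefault(fd, stamp)  # keep the first busy report
            last_seen[fd] = stamp
        else:
            # Listener went back to active_threads=N: the busy run is over.
            busy_since.pop(fd, None)
            last_seen.pop(fd, None)
    return {
        fd: (start, last_seen[fd])
        for fd, start in busy_since.items()
        if last_seen[fd] - start >= threshold
    }
```

Fed the excerpt above (for example via open("/var/log/messages"), where the
path is also an assumption), this should flag listener 7 only, since listen=8
keeps reporting active_threads=0.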
Looking through the archives, this problem still appears to be present in a few
later releases, but I have not yet found a thread that points to a specific
solution or patch. Unless I can point to one, I won't get approval to pull the
latest source and compile it; I'll have to stick with an hourly cron job that
stops and restarts slapd.
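As a stopgap, the hourly blind restart could perhaps be narrowed to "restart
only when new connections really are not being answered". The Python 3 sketch
below is one way to do that, not a fix for the underlying hang. It opens a
brand-new TCP connection, sends an LDAPv3 anonymous bind, and waits for any
LDAP reply; a hung listener never accepts the connection, so no reply arrives
before the timeout. The host, port 389, the timeouts, the retry count, the
function names, and the "service slapd restart" command are all assumptions
that would need to be checked against the actual setup.

```python
import socket
import subprocess
import time

# BER encoding of an LDAPv3 anonymous simple bind request.
ANON_BIND = bytes(bytearray([
    0x30, 0x0c,        # LDAPMessage SEQUENCE, 12 bytes follow
    0x02, 0x01, 0x01,  # messageID = 1
    0x60, 0x07,        # BindRequest, 7 bytes follow
    0x02, 0x01, 0x03,  # version = 3
    0x04, 0x00,        # name = "" (anonymous)
    0x80, 0x00,        # simple password = ""
]))


def accepts_new_connections(host="127.0.0.1", port=389, timeout=5.0):
    """Return True if the server answers a request on a new connection.

    Any LDAP reply counts, including an error result if anonymous binds
    are disallowed; every LDAPMessage starts with the byte 0x30.
    """
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    try:
        sock.sendall(ANON_BIND)
        return sock.recv(1) == b"\x30"
    except (socket.error, socket.timeout):
        return False
    finally:
        sock.close()


def listener_is_hung(probe=accepts_new_connections, attempts=3, pause=10.0):
    """Return True only if `attempts` probes in a row all fail."""
    for i in range(attempts):
        if probe():
            return False
        if i + 1 < attempts:
            time.sleep(pause)
    return True


def restart_slapd():
    # Assumption: the stock init script is named "slapd".
    return subprocess.call(["service", "slapd", "restart"])
```

A cron entry could then run a small script that calls restart_slapd() only when
listener_is_hung() returns True. Requiring several failed probes in a row is
meant to avoid restarting on a single slow response.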
Thanks,
Bob Smith
--bs