Hi,
I just wanted to report back on this issue.
I downloaded the source RPMs from abennett's blog, found at:
http://wordpress.clarku.edu/abennett/
Although this is only version openldap-2.4.30, it seemed to me a more
expedient approach, as I have about a dozen machines to update. I was able to
successfully rebuild the SRPM on my CentOS 6.3 64-bit platform and replace the
core OpenLDAP software and DB software.
While the updated software has only been running for three days in a 4-way
multi-master sync replication configuration, it has run without the epoll: busy
problems that were hanging slapd (2.4.23) at least once a day. It also seems to
have resolved a few lingering replication issues that I was experiencing with
the CentOS core OpenLDAP software.
There are a few quirks with installing this software (it didn't like
self-signed certs, for one) but no show stoppers for me.
It would be nice if Red Hat or CentOS could get the fix mentioned below into
their core OpenLDAP.
Thanks for your help.
Bob
--bs
On August 14, 2012 at 8:15 PM Howard Chu <hyc(a)symas.com> wrote:
rwsmith(a)bislink.net wrote:
> Hi,
>
> Not sure if I should post this here or with the CentOS mailing list (I am
> hoping they are monitoring this). I am using a stock CentOS 6.3 32-bit
> installation with
>
> # rpm -qa | grep openldap
> openldap-devel-2.4.23-26.el6_3.2.i686
> openldap-2.4.23-26.el6_3.2.i686
> openldap-clients-2.4.23-26.el6_3.2.i686
> openldap-servers-2.4.23-26.el6_3.2.i686
>
> I have a 4-way multi-master sync replication set up on four virtual servers
> using Citrix XenServer 6.2. I am also running Samba 3.5.10 as a PDC on one
> machine and BDC on the other three. All servers are also running sssd-1.8.0
> for the Linux authentication.
>
> The problem is that one or more of the LDAP servers will hang, usually the
> one that acts as the PDC, since it is hit the hardest and is the most
> critical of the four. Usually but not always the "hang" will trickle to the
> other servers--usually when I am not watching during the middle of the night.
> Fortunately we are not yet in full production.
>
> Compiling from source is not yet an option. I must use the stock RPMs from
> CentOS per our design guidelines.
>
> LDAP will appear to hang, but what appears to be happening is that only the
> listener becomes busy and will not get out of this state. Here is a short
> pull of the logs that I am collecting:
Sounds like this was fixed in 2.4.25, git commit ID
0ae659ad87c64bef938f729e87573ff3ce04bd28 (master),
commit a3f40e5601c0c522f2bda418374fb415bdcbd75c (release).
There was no ITS submitted for this change, so it is not in the CHANGES file.
If you can reproduce the problem with 2.4.32 please submit an ITS for it. I
will note that all of this listener code is targeted for a major rewrite in
OpenLDAP 2.5.
> Aug 14 20:34:44 auth-us slapd[10357]: daemon: read active on 69
> Aug 14 20:34:44 auth-us slapd[10357]: daemon: epoll: listen=7
> active_threads=0
> tvp=zero
> Aug 14 20:34:44 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:34:44 auth-us slapd[10357]: conn=1742 op=0 EXT
> oid=1.3.6.1.4.1.1466.20037
> Aug 14 20:34:44 auth-us slapd[10357]: conn=1742 op=0 STARTTLS
> Aug 14 20:34:44 auth-us slapd[10357]: conn=1742 op=0 RESULT oid= err=0 text=
> Aug 14 20:34:44 auth-us slapd[10357]: daemon: activity on 1 descriptor
> Aug 14 20:34:44 auth-us slapd[10357]: daemon: activity on:
> Aug 14 20:34:44 auth-us slapd[10357]:
> Aug 14 20:34:44 auth-us slapd[10357]: daemon: epoll: listen=7
> active_threads=0
> tvp=zero
> Aug 14 20:34:44 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:34:44 auth-us slapd[10357]: daemon: activity on 1 descriptor
> Aug 14 20:34:44 auth-us slapd[10357]: daemon: activity on:
> Aug 14 20:34:44 auth-us slapd[10357]: 69r
> Aug 14 20:34:44 auth-us slapd[10357]:
> Aug 14 20:34:44 auth-us slapd[10357]: daemon: read active on 69
> Aug 14 20:34:44 auth-us slapd[10357]: daemon: epoll: listen=7
> active_threads=0
> tvp=zero
> Aug 14 20:34:44 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:34:46 auth-us slapd[10357]: daemon: epoll: listen=7
> active_threads=0
> tvp=zero
> Aug 14 20:34:46 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:34:51 auth-us slapd[10357]: daemon: epoll: listen=7
> active_threads=0
> tvp=zero
> Aug 14 20:34:51 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:34:54 auth-us slapd[10357]: daemon: activity on 1 descriptor
> Aug 14 20:34:54 auth-us slapd[10357]: daemon: activity on:
> Aug 14 20:34:54 auth-us slapd[10357]: 39r
> Aug 14 20:34:54 auth-us slapd[10357]:
> Aug 14 20:34:54 auth-us slapd[10357]: daemon: read active on 39
> Aug 14 20:34:54 auth-us slapd[10357]: daemon: epoll: listen=7
> active_threads=0
> tvp=zero
> Aug 14 20:34:54 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:34:54 auth-us slapd[10357]: daemon: activity on 1 descriptor
> Aug 14 20:34:54 auth-us slapd[10357]: daemon: activity on:
> Aug 14 20:34:54 auth-us slapd[10357]:
> Aug 14 20:34:54 auth-us slapd[10357]: daemon: epoll: listen=7 busy
> Aug 14 20:34:54 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:34:56 auth-us slapd[10357]: daemon: epoll: listen=7 busy
> Aug 14 20:34:56 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:35:01 auth-us slapd[10357]: daemon: epoll: listen=7 busy
> Aug 14 20:35:01 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:35:06 auth-us slapd[10357]: daemon: epoll: listen=7 busy
> Aug 14 20:35:06 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:35:11 auth-us slapd[10357]: daemon: epoll: listen=7 busy
> Aug 14 20:35:11 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:35:12 auth-us slapd[10357]: daemon: activity on 1 descriptor
> Aug 14 20:35:12 auth-us slapd[10357]: daemon: activity on:
> Aug 14 20:35:12 auth-us slapd[10357]: 42r
> Aug 14 20:35:12 auth-us slapd[10357]:
> Aug 14 20:35:12 auth-us slapd[10357]: daemon: read active on 42
> Aug 14 20:35:12 auth-us slapd[10357]: daemon: epoll: listen=7 busy
> Aug 14 20:35:12 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:35:14 auth-us slapd[10357]: daemon: activity on 1 descriptor
> Aug 14 20:35:14 auth-us slapd[10357]: daemon: activity on:
> Aug 14 20:35:14 auth-us slapd[10357]: 40r
> Aug 14 20:35:14 auth-us slapd[10357]:
> Aug 14 20:35:14 auth-us slapd[10357]: daemon: read active on 40
> Aug 14 20:35:14 auth-us slapd[10357]: daemon: epoll: listen=7 busy
> Aug 14 20:35:14 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:35:16 auth-us slapd[10357]: daemon: epoll: listen=7 busy
> Aug 14 20:35:16 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:35:21 auth-us slapd[10357]: daemon: epoll: listen=7 busy
> Aug 14 20:35:21 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:35:26 auth-us slapd[10357]: daemon: epoll: listen=7 busy
> Aug 14 20:35:26 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
> Aug 14 20:35:31 auth-us slapd[10357]: daemon: epoll: listen=7 busy
> Aug 14 20:35:31 auth-us slapd[10357]: daemon: epoll: listen=8
> active_threads=0
> tvp=zero
>
> Every log entry prior to this looks normal in that epoll: listen=7 goes
> from active_threads=0 to busy when a connection comes in, sets up the
> connection, and then goes back to active_threads=0. I have yet to understand
> what causes it to go into the busy state and never return until I manually
> stop and restart the slapd process. It does appear, however, that slapd can
> still process queries on active connections, as noted on descriptors 40r and
> 42r--it just can't process any new connection requests, as epoll: listen=7
> has hung.
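The stuck-listener pattern described above (a listen=N entry that turns
"busy" and never returns to active_threads=0) can be spotted mechanically in
the logs. The following is only a rough sketch, not anything shipped with
OpenLDAP: the function name find_stuck_listeners, the 30-second threshold, and
the year argument (syslog timestamps carry no year) are all assumptions made
for illustration. It also assumes English month abbreviations and the log
layout shown in this thread.

```python
import re
from datetime import datetime, timedelta

# Matches entries such as
#   Aug 14 20:34:54 auth-us slapd[10357]: daemon: epoll: listen=7 busy
# An optional leading "> " is tolerated so quoted mail text can be fed in too.
LISTEN_RE = re.compile(
    r"^(?:>\s*)?(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+slapd\[\d+\]:\s+"
    r"daemon: epoll: listen=(\d+)(\s+busy)?"
)


def find_stuck_listeners(lines, threshold=timedelta(seconds=30), year=2012):
    """Return {listener_fd: busy_duration} for listeners still busy at the
    end of the input and busy for at least `threshold` (hypothetical helper)."""
    busy_since = {}  # fd -> time of the first "busy" entry in the current run
    last_busy = {}   # fd -> time of the most recent "busy" entry
    for line in lines:
        m = LISTEN_RE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        fd = int(m.group(2))
        if m.group(3):
            busy_since.setdefault(fd, stamp)
            last_busy[fd] = stamp
        else:
            # Any non-busy entry for this listener ends the busy run.
            busy_since.pop(fd, None)
    return {
        fd: last_busy[fd] - start
        for fd, start in busy_since.items()
        if last_busy[fd] - start >= threshold
    }
```

Fed the log excerpt above, a scan like this would flag listener 7 and leave
listener 8 alone, since 8 never reports busy. A check of this kind could gate a
restart so that slapd is only bounced when a listener is actually stuck.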
>
> Looking through the archives, this problem still appears to be present in a
> few later releases, but I have not yet found any thread which points to a
> specific solution or patch that fixes it. Unless I can point to a specific
> solution and/or patch I won't get approval to do a pull from the latest
> source and compile--I'll have to stick with an hourly cron job that stops
> and restarts slapd.
>
> Thanks,
>
> Bob Smith
>
> --bs
--
-- Howard Chu
CTO, Symas Corp.
http://www.symas.com
Director, Highland Sun
http://highlandsun.com/hyc/
Chief Architect, OpenLDAP
http://www.openldap.org/project/