We seem to be getting errors every night a couple minutes after logrotate rotates our logs and sends a SIGHUP to syslog-ng (to force a reload):
Jul 11 04:02:46 csenet slapd[8823]: daemon: 1024 beyond descriptor table size 1024
Nothing is touching our slapd process (i.e., same process over several days.)
This seems only to happen on our master LDAP server. We're using slurpd for replication to our two slave servers.
This morning, something apparently corrupted our directory, and the corruption was replicated to our slaves; we restored the db from the nightly dump (made with slapcat on another replica) and LDAP seems happy again.
We can't see anything in the logs that would lend a clue as to what might be going on. Any suggestions as to where I should start looking?
We're running RHEL 4 with all updates applied, using RH's openldap packages (2.2.13).
Looking back in the logs, it seems that the syslog message above occurs for a couple minutes after syslog-ng is restarted, and then stops occurring until the next time syslog-ng is restarted, but it's apparently been happening for quite a while. Today is the first time we've had corruption (or otherwise total failure) of the LDAP directory, though.
Any suggestions or help will be greatly appreciated.
Gregory
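To check the "couple of minutes after the restart" pattern described above, the matching syslog lines can be counted per minute. This is only a rough sketch: the regular expression is inferred from the single sample line quoted in this message, and the extra input lines in the demo are made up for illustration, not real log data.

```python
import re
from collections import Counter

# Pattern inferred from the one sample line in this thread; the syslog
# layout (month, day, time, host, program[pid]) is an assumption.
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+slapd\[\d+\]:\s+"
    r"daemon: \d+ beyond descriptor table size \d+"
)


def count_by_minute(lines):
    """Return a Counter mapping 'Mon DD HH:MM' to the number of matching lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[match.group("stamp")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        # First line is the message quoted above.
        "Jul 11 04:02:46 csenet slapd[8823]: daemon: 1024 beyond descriptor table size 1024",
        # The next two lines are invented for the demo.
        "Jul 11 04:02:59 csenet slapd[8823]: daemon: 1024 beyond descriptor table size 1024",
        "Jul 11 04:05:00 csenet example[1]: unrelated message",
    ]
    for minute, hits in sorted(count_by_minute(sample).items()):
        print(minute, hits)
```

In practice the lines would come from the rotated syslog files rather than a hard-coded list, so the per-minute counts can be compared with the time at which syslog-ng was restarted.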
--On Friday, July 13, 2007 1:24 PM -0700 "Gregory K. Ruiz-Ade" gkra@cs.ucsd.edu wrote:
Any suggestions or help will be greatly appreciated.
Increase your file descriptor limit
Drop redhat's extremely old broken build.
--Quanah
-- Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
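For readers unsure what "increase your file descriptor limit" means at the process level, here is a minimal sketch using Python's standard `resource` module. It only inspects and raises the soft limit of the process that runs it (and its children); for slapd the limit would have to be raised in whatever environment starts the daemon, which is an assumption about the setup and is not shown here. The function name `raise_nofile_soft_limit` is hypothetical, and the value 4096 is the one the original poster reports using later in the thread.

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise this process's soft RLIMIT_NOFILE towards target, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot go above the hard limit.
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft, hard  # already at least as high as requested
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_soft_limit(4096))
```

The "before" line shows the limit that a daemon launched from the same shell would inherit, which is a quick way to see whether 1024 is the value in effect.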
On Jul 13, 2007, at 6:17 PM, Quanah Gibson-Mount wrote:
--On Friday, July 13, 2007 1:24 PM -0700 "Gregory K. Ruiz-Ade" gkra@cs.ucsd.edu wrote:
Any suggestions or help will be greatly appreciated.
Increase your file descriptor limit
Drop redhat's extremely old broken build.
Oh, god... I feel stupid for having asked.
Regarding the software version, we're slating an upgrade from RHEL4 to RHEL5 for the LDAP servers to at least get us up to the 2.3 branch of OpenLDAP, which will at least be a step in the right direction.
Thanks!
Gregory
--On Friday, July 13, 2007 6:48 PM -0700 "Gregory K. Ruiz-Ade" gkra@cs.ucsd.edu wrote:
Regarding the software version, we're slating an upgrade from RHEL4 to RHEL5 for the LDAP servers to at least get us up to the 2.3 branch of OpenLDAP, which will at least be a step in the right direction.
Sadly, using the builds provided by redhat, which are really geared for making the libraries available to other software, and not for running OpenLDAP as a server, is a quick path to directory suicide. I'd suggest using Symas Corp.'s free CDS build http://www.symas.com or Buchan Milne's excellent RPMs (which I unfortunately have not memorized the URL for, but you can probably find it in the list archives).
--Quanah
Gregory K. Ruiz-Ade wrote, on 13-07-2007 22:24:
Any suggestions or help will be greatly appreciated.
We run RHAS4, with a new (IBM iron Opteron) RHL5 Server machine soon to be deployed. I also run a home Fedora FC6 rig with the same setups and API software specs.
All work fine, no problems.
We run syslog-ng 1.6.8. We run Buchan Milne's OpenLDAP 2.3 version 2.3.36, using his built-in BDB 4.2.52 support on RHL5 and FC6, with my own BDB 4.2.52 libraries on RHAS4. Everything works fine on all machines.
Kudos to Buchan:
2 surmises:
1: RHL5 and FC6 both have BDB 4.3 as standard; Buchan's srpm (and, believe me, I refuse to install *ANY* software without it being available as an rpm. If it's not, I bake my own - but Buchan's srpm is far superior to anything I could bake myself) is "intelligent" enough to see that Red Hat has given me an unstable version of BDB and substitute a stable version - 5-patched 4.2.52 for his OpenLDAP alone - all the other RH stuff continues to use 4.3;
2: I'm a die-hard Red Hat/CentOS person, but although those give me unparalleled stability and mostly update without preference, OpenLDAP is a great exception. I would not touch RHAS4/CentOS 5's OpenLDAP with a barge pole. Fedora FC6/7 is a possible exception because it's largely up to date, but since Buchan's stuff gives me so much modular configurability, I'll stick with him for FC as well.
Best,
--Tonni
--On Sunday, July 15, 2007 8:33 AM +0200 Tony Earnshaw tonni@hetnet.nl wrote:
Kudos to Buchan:
2 surmises:
1: RHL5 and FC6 both have BDB 4.3 as standard; Buchan's srpm (and, believe me, I refuse to install *ANY* software without it being available as an rpm. If it's not, I bake my own - but Buchan's srpm is far superior to anything I could bake myself) is "intelligent" enough to see that Red Hat has given me an unstable version of BDB and substitute a stable version - 5-patched 4.2.52 for his OpenLDAP alone - all the other RH stuff continues to use 4.3;
Only 5 patches for 4.2.52? There's a rather important 6th one...
--Quanah
Quanah Gibson-Mount wrote, on 15-07-2007 23:45:
Only 5 patches for 4.2.52? There's a rather important 6th one...
They could be in line; actually there are only 4 if one discounts an x86_64 AMD mutex patch.
I trust Buchan to know what he's doing (most of the time ;) and I've not had problems running stuff built with his specs/srpms yet.
--Tonni
On 7/16/07, Tony Earnshaw tonni@hetnet.nl wrote:
Quanah Gibson-Mount wrote, on 15-07-2007 23:45:
Only 5 patches for 4.2.52? There's a rather important 6th one...
Which I was planning on adding over the past weekend ... but my house was burgled and my laptop stolen ... which is why I've been a bit out of things for the past week or so.
I'll be updating quite soon ...
Regards, Buchan
Buchan Milne wrote, on 19-07-2007 14:50:
[...]
Only 5 patches for 4.2.52? There's a rather important 6th one...
Which I was planning on adding over the past weekend ... but my house was burgled and my laptop stolen ... which is why I've been a bit out of things for the past week or so.
I'll be updating quite soon ...
It is updated; the new URL, as given to me by Buchan, is http://staff.telkomsa.net/packages/.
The openldap2.3-2.3.36-1.rhel5.src.rpm builds beautifully, with a couple of '%define's added to the spec, on x86_64 rhl5, and everything works perfectly after the necessary rpms are installed. The extra defines were necessary, since x86_64 rhl5 builds can't find the ones included in /usr/lib/rpm.
Which is nice, since there were a couple of very important things that the rpms built with the RH srpm spec didn't do:
- It wouldn't create or write to cn=config.
- The standard bdb on rhl5/CentOS 5 is 4.3.29. There was no way I could get it to work without bdb 4.3.29 support (RH attempts to offer 4.4.20 support, but that doesn't work), while Buchan's uses separate patched 4.2.52 support which does - far preferable.
- I had to change much in the RH spec and install much of Buchan's configuration stuff to get the RH-spec-built OL 2.3.36 working at all.
And, the RH slapd is monolithic, while Buchan's uses dynamic modules.
Best,
--Tonni
I tried to do some searching, but didn't find anything.
Could someone point me to a URL where I can find Buchan Milne's RPMs? :)
Thanks,
Gregory
On Jul 14, 2007, at 11:33 PM, Tony Earnshaw wrote:
We run syslog-ng 1.6.8. We run Buchan Milne's OpenLDAP 2.3 version 2.3.36, using his built-in BDB 4.2.52 support on RHL5 and FC6, with my own BDB 4.2.52 libraries on RHAS4. Everything works fine on all machines.
http://rpm.pbone.net/index.php3/stat/15/limit/7/dl/40/pakman/2376/com/Buchan...
On 7/15/07, Gregory K. Ruiz-Ade gkra@cs.ucsd.edu wrote:
I tried to do some searching, but didn't find anything.
Could someone point me to a URL where I can find Buchan Milne's RPMs? :)
Thanks,
Gregory
On Jul 14, 2007, at 11:33 PM, Tony Earnshaw wrote:
We run syslog-ng 1.6.8. We run Buchan Milne's OpenLDAP 2.3 version 2.3.36, using his built-in BDB 4.2.52 support on RHL5 and FC6, with my own BDB 4.2.52 libraries on RHAS4. Everything works fine on all machines.
-- Gregory K. Ruiz-Ade Sr. Systems Administrator Computer Science and Engineering University of California, San Diego Office: EBU3b 1216 Phone: (858) 822-2625 E-mail: gkra@cs.ucsd.edu
Newzenca wrote, on 16-07-2007 04:04:
http://rpm.pbone.net/index.php3/stat/15/limit/7/dl/40/pakman/2376/com/Buchan%20Milne%20%3Cbgmilne_linux-mandrake_com%3E.html
Hmmm ... perhaps a better site for Red Hat and Fedora people for OL 2.3 is:
http://anorien.warwick.ac.uk/mirrors/buchan/openldap/
Caveats:
1: At present, anyone (on RH) using the srpm has to download and install an extra set of macros in /etc/rpm. At the head of the spec file is the URL to the macros. Buchan has reported here that he's working on this to define the macros in line.
2: I've successfully built x86_32 RHAS4, RHL5 and FC6 OL 2.3.36 source with the 2.3.24 spec. I've had trouble with, and am working on, the spec for x86_64 RHL5. Things have to be working by July 20th ...
--Tonni
Tony Earnshaw wrote:
Hmmm ... perhaps a better site for Red Hat and Fedora people for OL 2.3 is:
Thanks everyone for the help.
I've installed the packages from the yum repository listed above, and migrated the directory, and everything seems to have survived the night.
Near as I can figure, the 2.2.x version from RH was stuffing its head in the sand when getting on the order of a couple thousand queries at once when various services would be restarted across all our RH servers all at the same time, thanks to cron.daily. The servers would then accept new connections, but never do anything (i.e., stall them out), which meant the clients had no obvious error to trigger a failover to another server, but they also never got any useful data.
In addition to upgrading to the 2.3 server and setting the file limit to 4096, we also staggered the timing of cron.daily on about half of our systems, just to be safe.
Gregory
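The message above does not say how the cron.daily timing was staggered. One common way to spread such load, offered here purely as an illustrative sketch and not as what was actually done, is to derive a stable per-host delay from the hostname so that each machine always starts at the same offset inside a window. The function name, the 30-minute window and the example hostnames are all hypothetical.

```python
import hashlib


def stagger_delay_seconds(hostname, window_seconds=1800):
    """Map a hostname to a stable delay in the range [0, window_seconds)."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then fold into the window.
    return int.from_bytes(digest[:8], "big") % window_seconds


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    for host in ("host-a.example.org", "host-b.example.org", "host-c.example.org"):
        print(host, stagger_delay_seconds(host))
```

A wrapper around the daily jobs could sleep for this many seconds before starting, so that clients reconnect to the LDAP servers over half an hour instead of all in the same minute.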
Funny, I was just discussing OpenLDAP 2.2 vs. 2.3. 2.2 is funky compared to 2.3, so enjoy the upgrade. What always gets me is that so many distributions (major distros) use 2.2 or older versions, so that means 2.2 gets used in a lot of places.
-- Puryear IT, LLC Identity Management, Directory Services, Systems Integration Baton Rouge, LA * 225-706-8414 * http://www.puryear-it.com
"Best Practices for Managing Linux and UNIX Servers" http://www.puryear-it.com/pubs/linux-unix-best-practices
Dustin Puryear wrote, on 17-07-2007 22:37:
Funny, I was just discussing OpenLDAP 2.2 vs. 2.3. 2.2 is funky compared to 2.3, so enjoy the upgrade. What always gets me is that so many distributions (major distros) use 2.2 or older versions, so that means 2.2 gets used in a lot of places.
I am intending to do a write-up on this; Red Hat issues OL 2.3.27 for rhl5, which means that it also supplies an srpm.
I'm installing a series of rhl5 IBM x86_64 machines with OL and, as always, I want to keep up with the latest. Buchan's spec/srpm won't work (for me, at any rate) with x86_64 libraries, and the spec is so complicated that rewriting it would be a real pain. So I tried the RH srpm with OL 2.3.36 source code and that works - but only after altering the spec quite a lot and redefining rpm macros and such. The RH spec is a lot simpler than Buchan's, and the things it does wrong are easier to correct.
When I'm sure that everything really works in production and is stable, I'll put the stuff on my ftp server for others to try out.
--Tonni
openldap-software@openldap.org