I have been fighting with this issue for a couple months now and I really need a solution.
I have 2 openldap servers recently upgraded to 2.3.38 with a brand new rebuilt bdb from an LDIF dump. The 2 servers sit behind a load balancer (read-only) and provide basic authentication for about 300 linux servers. There's not much traffic on them but those who need access need access.
The problem is that they stop returning data, although slapd is still running and otherwise seems OK. You can still bind to them using rootdn with no issues. I found an old thread describing a similar problem that suggested an upgrade, which I did: I was using 2.2.13 and have now upgraded to 2.3.38.
My level of knowledge of OpenLDAP is probably just above novice, so I don't have a good base for troubleshooting.
This is causing HUGE disruption and needs to be fixed immediately so any and all help is much appreciated.
I turned on debug logging (-s 1) this morning so should have a bit of data to share with you if need be.
Thanks, Josh
--On Thursday, October 11, 2007 11:45 AM -0700 "Josh M. Hurd" JoshH@revenuescience.com wrote:
I have been fighting with this issue for a couple months now and I really need a solution.
I have 2 openldap servers recently upgraded to 2.3.38 with a brand new rebuilt bdb from an LDIF dump. The 2 servers sit behind a load balancer (read-only) and provide basic authentication for about 300 linux servers. There's not much traffic on them but those who need access need access.
Can you share your slapd.conf, minus passwords?
Is it slapd that stops responding to queries, or the load balancer? I.e., are you testing queries via the LB, or directly to slapd, when this happens?
Also, debug logging would be -d -1. -s is syslog level to use.
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc
Zimbra :: the leader in open source messaging and collaboration
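One way to answer the load-balancer question is to send the same cheap query to each slapd directly and to the balancer, then compare which ones answer. A minimal sketch, assuming hypothetical host names (ldap1.example.net, ldap2.example.net, ldap-lb.example.net) and the suffix dc=domain,dc=net; substitute your own. It prints the commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical suffix: replace with the suffix of your own directory.
BASE="dc=domain,dc=net"

# Print the ldapsearch command for one host; pipe the output to sh to run it.
probe_cmd() {
    # -x: simple (here anonymous) bind, -LLL: terse LDIF output,
    # -s base: read only the suffix entry, 1.1: request no attributes.
    printf 'ldapsearch -x -LLL -H ldap://%s -b "%s" -s base 1.1\n' "$1" "$BASE"
}

# Hypothetical hosts: the two servers directly, then the load balancer.
for host in ldap1.example.net ldap2.example.net ldap-lb.example.net; do
    probe_cmd "$host"
done
```

If the direct queries answer while the one through the balancer hangs, the balancer is the first suspect; if a direct query hangs, the problem is in that slapd.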
slapd.conf:
#
# See slapd.conf(5) for details on configuration options.
# This file should NOT be world readable.
#
include /usr/local/openldap/etc/openldap/schema/core.schema
include /usr/local/openldap/etc/openldap/schema/cosine.schema
include /usr/local/openldap/etc/openldap/schema/inetorgperson.schema
include /usr/local/openldap/etc/openldap/schema/openldap.schema
include /usr/local/openldap/etc/openldap/schema/nis.schema

# Define global ACLs to disable default read access.

# Do not enable referrals until AFTER you have a working directory
# service AND an understanding of referrals.
#referral ldap://root.openldap.org

pidfile /usr/local/openldap/var/run/slapd.pid
argsfile /usr/local/openldap/var/run/slapd.args

# Load dynamic backend modules:
modulepath /usr/local/openldap/libexec/openldap
# moduleload back_bdb.la
moduleload back_ldap.la
moduleload back_ldbm.la
# moduleload back_passwd.la
# moduleload back_shell.la

# restrict userPassword for authentication only, allowing changes by user
access to attrs=userPassword
    by self write
    by * auth

# allow the world read access
access to *
    by * read

TLSCACertificateFile /etc/openldap/cacerts/cacert.pem
TLSCertificateFile /etc/openldap/cacerts/replica.pem
TLSCertificateKeyFile /etc/openldap/cacerts/replica.pem

#######################################################################
# BDB database definitions
#######################################################################

database bdb
suffix "dc=domain,dc=net"
rootdn "cn=admin,dc=domain,dc=net"
rootpw secret
# Mode 700 recommended.
directory /usr/local/openldap/var/openldap-data
# Indices to maintain
index objectClass,uid,uidNumber,gidNumber,memberUid eq

#######################################################################
As for logging, when I added the -s 1 it seemed to be dumping the same type of info to syslog that it dumps to console when started with -d 1. Is this different?
If you're binding as the rootdn successfully, it's probably only that given bdb backend that's faulty. Can you track down the Sleepycat library you're using (perhaps using ldd), and find out what version it is?
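The ldd suggestion can be sketched as follows. The slapd path is an assumption inferred from the modulepath in the posted slapd.conf; adjust it if your binary lives elsewhere. If slapd was linked statically against Berkeley DB, ldd will show no libdb line at all.

```shell
#!/bin/sh
# Assumed install path (inferred from the posted slapd.conf, not confirmed).
SLAPD=/usr/local/openldap/libexec/slapd

# Keep only the Berkeley DB lines from ldd output read on stdin.
bdb_libs() {
    grep 'libdb'
}

if [ -x "$SLAPD" ]; then
    # The file name (e.g. libdb-4.2.so) gives the major.minor version,
    # and the path shows which copy of the library slapd really loads.
    ldd "$SLAPD" | bdb_libs
else
    echo "slapd not found at $SLAPD; set SLAPD to the right path" >&2
fi
```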
At risk of my getting shot by Howard, are you caring for your Sleepycat log files properly? DB_CONFIGs for autoremove, or are there cron jobs for db_archive, etc.? What are your DB_CONFIGs, for that matter? (They'd have to be pretty far off to cause this behavior, but it's a quick look at least.)
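For reference, the "autoremove" setting mentioned above is a single DB_CONFIG line in the BDB 4.x releases discussed in this thread. This is a sketch of the fragment only, not a recommendation for this site; whether to remove logs automatically depends on your backup strategy.

```
# DB_CONFIG fragment: let Berkeley DB delete transaction log files
# once they are no longer needed for recovery.
set_flags DB_LOG_AUTOREMOVE
```

The cron alternative is to run db_archive with -d (remove log files that are no longer needed) and -h pointing at the database directory, which would be /usr/local/openldap/var/openldap-data if the posted slapd.conf is what is in use.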
--On Thursday, October 11, 2007 5:40 PM -0400 Aaron Richton richton@nbcs.rutgers.edu wrote:
If you're binding as the rootdn successfully, it's probably only that given bdb backend that's faulty. Can you track down the Sleepycat library you're using (perhaps using ldd), and find out what version it is?
At risk of my getting shot by Howard, are you caring for your Sleepycat log files properly? DB_CONFIGs for autoremove, or are there cron jobs for db_archive, etc.? What are your DB_CONFIGs, for that matter? (They'd have to be pretty far off to cause this behavior, but it's a quick look at least.)
I'd also ask, are you storing your OpenLDAP DB in NFS?
--Quanah
No NFS. All local
I haven't found the version of BDB yet. Anyone know an easy way to do this?
My DB_CONFIG is basically the example that comes with OpenLDAP:
# $OpenLDAP: pkg/ldap/servers/slapd/DB_CONFIG,v 1.1.2.3 2006/08/17 17:36:19 kurt Exp $
# Example DB_CONFIG file for use with slapd(8) BDB/HDB databases.
#
# See Sleepycat Berkeley DB documentation
# http://www.sleepycat.com/docs/ref/env/db_config.html
# for detail description of DB_CONFIG syntax and semantics.
#
# Hints can also be found in the OpenLDAP Software FAQ
# http://www.openldap.org/faq/index.cgi?file=2
# in particular:
# http://www.openldap.org/faq/index.cgi?file=1075

# Note: most DB_CONFIG settings will take effect only upon rebuilding
# the DB environment.

# one 0.25 GB cache
set_cachesize 0 268435456 1

# Data Directory
#set_data_dir db

# Transaction Log settings
set_lg_regionmax 262144
set_lg_bsize 2097152
#set_lg_dir logs

# Note: special DB_CONFIG flags are no longer needed for "quick"
# slapadd(8) or slapindex(8) access (see their -q option).
This was the next thing I was going to look at.
You can tell REAL FAST if it's the backend database by archiving the database files, starting slapd, and then restoring a backup via slapadd or ldapadd. If that works, then the database files were corrupted in some way.
-- Puryear Information Technology, LLC Baton Rouge, LA * 225-706-8414 http://www.puryear-it.com
Author, "Best Practices for Managing Linux and UNIX Servers" http://www.puryear-it.com/pubs/linux-unix-best-practices
Identity Management, LDAP, and Linux Integration
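The archive-and-reload test above can be sketched as a short plan. Note that slapcat is the tool that dumps a database to LDIF, slapadd loads LDIF offline (slapd stopped), and ldapadd loads it through a running slapd. The paths below are assumptions: the database directory comes from the posted slapd.conf, and /tmp/backup.ldif is a hypothetical dump file. The script only prints the steps so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# Assumed paths: DBDIR from the posted slapd.conf, LDIF is hypothetical.
DBDIR=/usr/local/openldap/var/openldap-data
LDIF=/tmp/backup.ldif

# Print the steps instead of running them; run them by hand with slapd stopped.
rebuild_plan() {
    stamp=$1
    echo "mv $DBDIR $DBDIR.$stamp"               # set the suspect files aside
    echo "mkdir -m 700 $DBDIR"                   # fresh, empty database directory
    echo "cp $DBDIR.$stamp/DB_CONFIG $DBDIR/"    # keep the same BDB tuning
    echo "slapadd -l $LDIF"                      # offline load from the LDIF dump
}

rebuild_plan "$(date +%Y%m%d)"
```

After the load, check that the new files are owned by the user slapd runs as before starting it again.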
Thanks, but this was a brand new DB created from an LDIF. Within hours this problem showed up. The DB is literally <2 days old.
I can certainly do this again just to be sure.
Josh
On 10/12/07, Josh M. Hurd JoshH@revenuescience.com wrote:
I haven't found the version of BDB yet. Anyone know an easy way to do this?
Any of the db_ utilities with -V will tell you the version:
msporleder$ ./db_stat -V
Berkeley DB 4.5.20: (September 20, 2006)
msporleder$ ./db_checkpoint -V
Berkeley DB 4.5.20: (September 20, 2006)
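If the version number is needed in a script, the banner can be reduced to just the dotted number. A small sketch; the function name bdb_version is made up for illustration.

```shell
#!/bin/sh
# Extract the dotted version number from a "-V" banner read on stdin.
bdb_version() {
    sed -n 's/.*Berkeley DB \([0-9][0-9.]*\):.*/\1/p'
}

# Example with the banner quoted in this thread; prints 4.5.20
echo 'Berkeley DB 4.5.20: (September 20, 2006)' | bdb_version
```

In practice the input would come from the tool itself, for example `db_stat -V | bdb_version`, using the db_stat that belongs to the same Berkeley DB installation slapd is linked against.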
Excellent! Thank you!
Clearly I am using an old version:
Sleepycat Software: Berkeley DB 4.2.52: (December 11, 2004)
I will upgrade that and see.
Josh
Josh M. Hurd skrev, on 13-10-2007 19:41:
Excellent! Thank you!
Clearly I am using an old version: Sleepycat Software: Berkeley DB 4.2.52: (December 11, 2004) I will upgrade that and see.
You do not mention your OS, distribution, details about your hardware configuration or anything else relevant to your OL environment save DB_CONFIG details.
I can't see much wrong with the latter for a small scale DB; if you are running a large scale DB then you probably haven't allocated enough cache memory. I don't know what the defaults are for max_locks, max_lockers or max_objects - Quanah or others could possibly help with those.
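For readers wondering where the lock settings mentioned above would go: they are DB_CONFIG lines like the ones below. The values shown are purely illustrative, not tuned for this site; as far as I know each of the three defaults to 1000 in the BDB 4.x releases, but check the Berkeley DB documentation for your version.

```
# DB_CONFIG fragment: lock table sizing (illustrative values only)
set_lk_max_locks    3000
set_lk_max_lockers  3000
set_lk_max_objects  3000
```

Running db_stat -c with -h pointing at the database directory prints the locking statistics, including the maximum number of locks, lockers and objects used so far, which shows whether the configured limits are being approached.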
I've been running, and am presently running, OL 2.2.33-2.3.37 SleepyCat (now Oracle) BDB 4.2.52 in production on 4 servers at my 1500+ user, 50MB DB location 24x7x52 for years with no problems. First on Red Hat RHEL4, since Aug last on RHEL5. On grade A IBM hardware, both x86_32 and 64. Thousands of others are running 4.2.52 without problem on other hardware and other OSs.
The proviso is that the libraries *have* to be patched with 4 (possibly 5, depending on the patch versions) discrete patches.
Standard db4 on RHEL5 is 4.3.29. i.e. later than 4.2.52 with patches. But I've found out for myself by using standard Red Hat-supplied OL 2.3.27 that it doesn't work with OL 2.3.38. Therefore I use discrete db4 4.2.52 libraries for slapd and friends supplied by Buchan Milne.
If you do upgrade your db4 version, please make it 4.6+, enough has been written about that in this ML. Everything between patched 4.2.52 and 4.6 should be avoided. Moreover, if other things are making use of db4 4.2.52 on your unknown OS/distribution, then simply replacing it with an upgrade will most probably break everything else depending on 4.2.52.
In the dim and distant past (on RHEL3) I had source OL, BDB 4.2.52, Cyrus SASL 2.1 all separate in /usr/local and life was hell for various reasons. Try to avoid this kind of thing, keep installs as uniform as possible with a package managing system. Luckily, Buchan Milne supplies plug-in Red Hat rpms, which I use (actually I rebuild his srpms, since I have 2 architectures to build on).
Bottom line: I don't believe that swapping 4.2.52 for a db4 "upgrade" is going to help your problem: 4.2.52 works well for all but the most demanding uses. I could document this with Howard Chu's extensive doco, but I'll leave it up to you to search this out in the archives.
Best,
--Tonni
--On Sunday, October 14, 2007 6:25 PM +0200 Tony Earnshaw tonni@hetnet.nl wrote:
If you do upgrade your db4 version, please make it 4.6+, enough has been written about that in this ML. Everything between patched 4.2.52 and 4.6 should be avoided. Moreover, if other things are making use of db4 4.2.52 on your unknown OS/distribution, then simply replacing it with an upgrade will most probably break everything else depending on 4.2.52.
Bad advice, OpenLDAP 2.3 doesn't support BDB 4.6. Any of 4.2, 4.4, or 4.5 (with patches) should be okay. 4.3 should be avoided (and OpenLDAP 2.3 explicitly is made to not build against it).
--Quanah
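The version guidance above can be written down as a small lookup. This sketch encodes only what is stated in this thread for OpenLDAP 2.3; it is not an official support matrix, and the function name and verdict strings are made up for illustration.

```shell
#!/bin/sh
# Classify a Berkeley DB version for OpenLDAP 2.3, per this thread only.
bdb_verdict() {
    case "$1" in
        4.2.*|4.4.*|4.5.*) echo "ok-if-patched" ;;       # "should be okay" with patches
        4.3.*)             echo "avoid" ;;               # 2.3 refuses to build against it
        4.6.*)             echo "unsupported-by-2.3" ;;  # not supported by OpenLDAP 2.3
        *)                 echo "not-discussed" ;;       # thread says nothing about it
    esac
}

bdb_verdict 4.2.52
```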
--On Saturday, October 13, 2007 10:41 AM -0700 "Josh M. Hurd" JoshH@revenuescience.com wrote:
Excellent! Thank you!
Clearly I am using an old version: Sleepycat Software: Berkeley DB 4.2.52: (December 11, 2004) I will upgrade that and see.
Why? BDB 4.2.52 (+patches) is the most proven stable version of BDB. Of course, that assumes your version has actually been patched.
--Quanah
If there are multiple copies of Sleepycat on your system (e.g. /usr/lib and /usr/local/lib), then you have to be very careful that you're running the db_ tool that matches, which is why I don't particularly like this method.
As pointed out, 4.2.52 isn't bad if properly patched. You might want to check up on the provenance of your particular binaries and see if that's the case or not.
Josh M. Hurd JoshH@revenuescience.com wrote:
The problem is they stop returning data, slapd is still running otherwise seems ok.
I experienced a similar problem in the past, because slapd had exhausted a system limit (data segment or file descriptors, I don't remember which).
I agree with the other comments. Whenever I've seen this problem (i.e., you can connect, but then things just seem to hang), it's always been a problem with the backend. I'll bet if you run slapd in the foreground with debug output you'll see it just hang whenever it tries to do a search.
-- Puryear Information Technology, LLC Baton Rouge, LA * 225-706-8414 http://www.puryear-it.com
On 10/11/07, Josh M. Hurd JoshH@revenuescience.com wrote:
I have been fighting with this issue for a couple months now and I really need a solution.
I have 2 openldap servers recently upgraded to 2.3.38 with a brand new rebuilt bdb from an LDIF dump. The 2 servers sit behind a load balancer (read-only) and provide basic authentication for about 300 linux servers. There's not much traffic on them but those who need access need access.
The problem is they stop returning data, slapd is still running otherwise seems ok. You can still bind to them using rootdn with no issues. I found an old thread describing a similar problem that suggested an upgrade which I did. I was using 2.2.13 now upgraded to 2.3.38
When you say "using rootdn with no issues" do you mean that data is returned if you use rootdn, or that BIND (the operation) seems to work (shows up in the logs) but no data comes back even with rootdn (vs other dn's allowed in your acl's)?
On Thursday 11 October 2007 20:45:21 Josh M. Hurd wrote:
I have been fighting with this issue for a couple months now and I really need a solution.
I have 2 openldap servers recently upgraded to 2.3.38 with a brand new rebuilt bdb from an LDIF dump. The 2 servers sit behind a load balancer (read-only) and provide basic authentication for about 300 linux servers.
Are these servers using nscd or not ? How many connections do they have to your LDAP servers ?
There's not much traffic on them but those who need access need access.
The problem is they stop returning data, slapd is still running otherwise seems ok.
Do you get any messages in the logs when this happens? How many connections do the servers have when this happens? I'm thinking you've run out of file descriptors (because of excessive connections, from not using nscd and/or not raising the file descriptor limit), which may be causing slapd to defer operations.
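A quick way to test the file-descriptor theory on Linux is to compare how many descriptors slapd holds with its limit. A minimal sketch, assuming a /proc filesystem; /proc/<pid>/limits only exists on newer kernels, and reading another user's /proc/<pid>/fd normally requires root. The pidfile path is the one from the posted slapd.conf.

```shell
#!/bin/sh
# Count the descriptors a process currently holds (Linux /proc layout assumed).
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Print the soft "Max open files" limit of a running process.
fd_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

PIDFILE=/usr/local/openldap/var/run/slapd.pid   # from the posted slapd.conf
if [ -r "$PIDFILE" ]; then
    pid=$(cat "$PIDFILE")
    echo "slapd pid $pid: $(fd_count "$pid") descriptors open, soft limit $(fd_limit "$pid")"
else
    echo "no pidfile at $PIDFILE; call fd_count with a pid by hand" >&2
fi
```

A count that sits close to the limit while queries are hanging would support this explanation.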
You can still bind to them using rootdn with no issues. I found an old thread describing a similar problem that suggested an upgrade which I did. I was using 2.2.13 now upgraded to 2.3.38
My level of knowledge of OpenLDAP is probably just above novice so I don't have a good base for trouble shooting.
This is causing HUGE disruption and needs to be fixed immediately so any and all help is much appreciated.
I turned on debug logging (-s 1) this morning so should have a bit of data to share with you if need be.
Right, but this only allows you to direct *what* syslog will do with the log entries generated by slapd, not what level of logging is generated by slapd (which you configure via the loglevel directive in slapd.conf).
Regards, Buchan
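As an illustration of the loglevel directive mentioned above, the following slapd.conf fragment asks slapd to log connections, operations and results. By default slapd sends its log messages to the local4 syslog facility; the log file path in the comment is a hypothetical choice.

```
# slapd.conf, global section
# 256 = "stats": log connections, operations and results
loglevel 256

# Matching syslog.conf line (hypothetical log file path):
# local4.*    /var/log/slapd.log
```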
Thanks everyone for the help!
I finally did see the infamous "Too many open files" error in the logs. I raised ulimit -n to 2048; it was already set at 1024. I am NOT using nscd on the LDAP server but will enable it to see what effect it has. Or were you asking if I use it on the clients?
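The descriptor limit is inherited per process, so it has to be raised in the same shell (or init script) that starts slapd. A sketch of that step, using the 2048 figure from this message; the slapd path in the last comment is an assumption, and a non-root user cannot go above the hard limit.

```shell
#!/bin/sh
# Target taken from the message above; pick whatever your connection count needs.
WANT=2048

# Raise the soft descriptor limit of this shell to at least $1.
raise_nofile() {
    cur=$(ulimit -n)
    if [ "$cur" = "unlimited" ] || [ "$cur" -ge "$1" ]; then
        return 0                   # already high enough, nothing to do
    fi
    ulimit -n "$1" 2>/dev/null     # fails if the hard limit is lower
}

if raise_nofile "$WANT"; then
    echo "descriptor limit now: $(ulimit -n)"
else
    echo "could not raise limit to $WANT (hard limit: $(ulimit -Hn))" >&2
fi
# Start slapd from this same shell so it inherits the limit, e.g.:
# exec /usr/local/openldap/libexec/slapd    (assumed path)
```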
I believe I am seeing a lot more connections than I originally thought. It seems that all my servers ask LDAP for user info a lot more than I had realized. nscd should help alleviate that, though I don't like using it on most of my servers for reasons unrelated to LDAP.
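For context, the question about nscd reads as being about the roughly 300 client machines, since caching there is what cuts the number of lookups reaching slapd. A sketch of the relevant /etc/nscd.conf lines on a client; the time-to-live values are illustrative, not a recommendation.

```
# /etc/nscd.conf fragment (on the LDAP clients)
enable-cache            passwd  yes
positive-time-to-live   passwd  600
negative-time-to-live   passwd  20
enable-cache            group   yes
positive-time-to-live   group   3600
negative-time-to-live   group   60
```

Longer positive time-to-live values mean fewer queries to the directory, at the cost of clients seeing account changes later.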
Anyway, I finally have some good answers and a few action items to poke around with.
Thanks again! Josh
openldap-software@openldap.org