Hello.
slapd has been dying more and more frequently: from about once a week last month to 3 to 5 times a day now. Running "/etc/init.d/slapd start" brings it back instantly, for the next few hours. Exactly the same web application that uses the LDAP database had been running as-is for about a year without this problem. The server machine has not been changed, replaced or touched.
What we have tried (in this order):
1. Ran a server monitor to make sure the server load is not high (below 0.5) when slapd dies.
2. Upgraded to 2.4.11-1+lenny2 (on Debian).
3. Ran slapcat on the most heavily used database (hdb) and loaded it back in with slapadd.
4. Did the same for the other database (bdb).
5. Tracked the log messages at log level 256 (connection and operation stats) and found no clue. For example, one time the last lines logged were:
Oct 25 16:06:59 www slapd[11969]: conn=26289 fd=29 ACCEPT from IP=**.**.**.**:56539 (IP=0.0.0.0:389)
Oct 25 16:06:59 www slapd[11969]: conn=26289 op=0 BIND dn="cn=admin,dc=*******" method=128
Oct 25 16:06:59 www slapd[11969]: conn=26289 op=0 BIND dn="cn=admin,dc=*******" mech=SIMPLE ssf=0
Oct 25 16:06:59 www slapd[11969]: conn=26289 op=0 RESULT tag=97 err=0 text=
Oct 25 16:06:59 www slapd[11969]: conn=26288 op=8 UNBIND
Oct 25 16:06:59 www slapd[11969]: conn=26288 fd=41 closed
Oct 25 16:06:59 www slapd[11969]: conn=26289 op=1 UNBIND
Oct 25 16:06:59 www slapd[11969]: conn=26289 fd=29 closed
Another time it was:
Oct 25 16:27:09 www slapd[25691]: conn=2750 fd=51 ACCEPT from IP=**.**.**.**:54846 (IP=0.0.0.0:389)
Oct 25 16:27:09 www slapd[25691]: conn=2750 op=1 SRCH base="ou=contacts,ou=china,dc=*******" scope=2 deref=0 filter="(uidNumber=7762)"
Oct 25 16:27:10 www slapd[25691]: conn=2750 op=1 SRCH attr=o mail telephonenumber contactperson c st l street postalcode postofficebox facsimiletelephonenumber labeleduri businesscategory description pnglogo changetime lastrecapdate objectclass category
6. Tracked the log messages at log level 4 (heavy trace debugging) and found no clue. For example, one time the last lines logged were:
Oct 25 22:23:48 www slapd[723]: connection_get(50)
Oct 25 22:23:48 www slapd[723]: SRCH "ou=contacts,ou=china,dc=*******" 2 0
Oct 25 22:23:48 www slapd[723]: 0 0 0
Oct 25 22:23:48 www slapd[723]: filter: (uidNumber=2)
Oct 25 22:23:48 www slapd[723]: attrs:
Oct 25 22:23:49 www slapd[723]:
Oct 25 22:23:49 www slapd[723]: connection_get(56)
Oct 25 22:23:49 www slapd[723]: SRCH "ou=contacts,ou=china,dc=*******" 2 0
Oct 25 22:23:49 www slapd[723]: 0 0 0
Oct 25 22:23:49 www slapd[723]: filter: (uidNumber=2)
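Reading the tail of the log by eye gets tedious when slapd dies several times a day. A minimal Python sketch, assuming the syslog layout of the log level 256 excerpts above, could list the connections a given slapd process had accepted but never closed by the time its log stops. The function name open_connections_at_exit and the regular expression are illustrative, not part of OpenLDAP. Lines without a conn= field, such as the log level 4 output, are ignored. An open connection at the end of the log is only a hint about what was in flight, not proof of the cause.

```python
import re

# Matches stats-level lines such as:
#   Oct 25 16:27:09 www slapd[25691]: conn=2750 fd=51 ACCEPT from IP=...
# Assumption: the standard syslog prefix "Mon DD HH:MM:SS host slapd[pid]:".
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+slapd\[(?P<pid>\d+)\]:\s+"
    r"conn=(?P<conn>\d+)\s+(?P<rest>.*)$"
)
CLOSED_RE = re.compile(r"fd=\d+ closed")


def open_connections_at_exit(lines, pid):
    """Return {conn: last message} for connections of slapd process `pid`
    that have log lines but no matching 'fd=N closed' line."""
    open_conns = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or int(m.group("pid")) != pid:
            continue  # not a conn= line, or a different slapd process
        conn = int(m.group("conn"))
        rest = m.group("rest")
        if CLOSED_RE.search(rest):
            open_conns.pop(conn, None)  # connection finished cleanly
        else:
            ts = m.group("ts")
            open_conns[conn] = f"{ts} {rest}"  # remember the latest activity
    return open_conns


# Usage with the second excerpt above (search attributes shortened here).
sample = [
    'Oct 25 16:27:09 www slapd[25691]: conn=2750 fd=51 ACCEPT from IP=**.**.**.**:54846 (IP=0.0.0.0:389)',
    'Oct 25 16:27:09 www slapd[25691]: conn=2750 op=1 SRCH base="ou=contacts,ou=china,dc=*******" scope=2 deref=0 filter="(uidNumber=7762)"',
    'Oct 25 16:27:10 www slapd[25691]: conn=2750 op=1 SRCH attr=o mail telephonenumber',
]
for conn, last in open_connections_at_exit(sample, 25691).items():
    print(conn, last)
```

Running it over the log for each dead process ID and comparing the results would show whether the same kind of operation (for example a subtree search under ou=contacts) is always the last one in flight.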
Help, hints and pointers to specific documentation (RTFM) are highly appreciated. We are also offering *paid* help: if you can log in remotely and solve this problem, please email a quote to me and my colleague on the cc. The problem has exhausted us all.
Thanks in advance!
Zhang Weiwu
--On Monday, October 25, 2010 11:01 PM +0800 Zhang Weiwu zhangweiwu@realss.com wrote:
> Hello.
> - Upgrade to 2.4.11-1+lenny2 (on Debian)
> Help, hints and pointers to specific documentation (RTFM) are highly appreciated. We are also offering *paid* help: if you can log in remotely and solve this problem, please email a quote to me and my colleague on the cc. The problem has exhausted us all.
Don't use 2.4.11, don't use the build from Debian, and use a new BDB with all patches.
http://www.openldap.org/faq/data/cache/1456.html
Also note that Debian links OpenLDAP against GnuTLS, which has its own issues.
Basically, build a current release (2.4.23), use a current BDB (BDB 4.8), make sure any and all patches for the BDB release are applied, and build against OpenSSL and not GnuTLS.
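To see what an installed slapd is actually linked against before and after rebuilding, one option is to inspect the output of ldd. The Python sketch below is only illustrative: the function names are made up, the default path /usr/sbin/slapd is an assumption based on the Debian package (a source build usually installs elsewhere), and it relies on the common shared-library naming of libgnutls.so, libssl.so and libdb-X.Y.so.

```python
import re
import subprocess


def summarize_linkage(ldd_output):
    """Report which TLS library and Berkeley DB version(s) appear in ldd output.

    Assumes the usual library file names: libgnutls.so.N, libssl.so.N
    and libdb-X.Y.so.
    """
    tls = []
    if re.search(r"\blibgnutls\.so", ldd_output):
        tls.append("GnuTLS")
    if re.search(r"\blibssl\.so", ldd_output):
        tls.append("OpenSSL")
    # Each library shows up twice per ldd line (name and path), so deduplicate.
    bdb = sorted(set(re.findall(r"\blibdb-(\d+\.\d+)\.so", ldd_output)))
    return {"tls": tls, "bdb": bdb}


def inspect_slapd(path="/usr/sbin/slapd"):
    """Run ldd on the slapd binary at `path` (assumed location) and summarize it."""
    out = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    ).stdout
    return summarize_linkage(out)
```

After a rebuild along the lines above, calling inspect_slapd() with the path of the new binary should list OpenSSL rather than GnuTLS and a 4.8 BDB. It cannot tell whether the BDB patches were applied; that still has to be checked against the build itself.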
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
openldap-technical@openldap.org