Problem unexpected failing slapd

Ruud Baart

27 Feb 2011

3:57 a.m.

Problem: We have used LDAP for a customer for many years. Last year the slapd service suddenly stopped without leaving any trace in the log files. After a restart of slapd everything worked fine again. But it was not a one-off incident: every now and then slapd just stops, always without any trace in the log files. Sometimes it happens three times a day, sometimes a week passes without a failure. I can't find a pattern or any relation to any other service on the Linux server.

Environment:
- Several servers (Debian squeeze) and several Windows servers. We use the bdb database backend.
- There is one master LDAP server, which provides syncprov, and two replica LDAP servers (syncrepl). The master server is the most intensively used (mainly by Samba as primary domain controller: a few hundred user accounts, a lot of group accounts, workstations, ACLs, etc.). One of the replicas is not very busy but handles the mail for all users (lookups by amavis, postfix, courier-imap, mail account settings, etc.). The other replica is not busy at all; it is at a remote location.
- The total LDAP directory is 3700 DNs; slapcat produces a file of 7.3 MB.
- It is only the master LDAP server that stops suddenly. I have never seen a failure of a replica.

Because I have no clear idea what the problem is, I don't know which technical details are relevant:

DB_CONFIG
===========
set_cachesize 0 10485760 1
set_lk_max_objects 10000
set_lk_max_locks 10000
set_lk_max_lockers 10000
set_lg_dir /home/ldap-dbd

The database is stored on an ext3 filesystem, kernel 2.6.32. The server has no problems: plenty of memory and a fast disk array (SAS->SATA). There have never been technical problems with this server, and it worked without problems for a long period. Nothing has changed in the environment or the LDAP setup (except, of course, the upgrade to Debian squeeze, but the problem was already there before that).

What we have tried:
- Upgrading from OpenLDAP 2.4.17 (Debian lenny + backports) to OpenLDAP 2.4.23 (Debian squeeze). I saw in the release notes that problems related to syncrepl had been solved, so we waited for version 2.4.23 to become available in Debian. This upgrade made no difference.
- Reindexing and rebuilding the directory. When I rebuild the LDAP directory from a clean LDIF file with ldapadd, on the master LDAP server or on another machine, there is not a single error or warning.

The workaround for the moment: I have written a process monitor (a Perl daemon) that watches the slapd daemon and restarts it if it suddenly stops. It is of course not a solution, but the 300 users can work. If slapd stops and is not restarted within 1 minute, a few hundred people can't work because Samba stops working.

I would welcome suggestions on what we can do to find the problem. Because there is no pattern and nothing in the log files, I don't know where to start.

-- Regards, Ruud Baart

jekvb

27 Feb

2:54 p.m.

Sorry, I overlooked this info:

"The server has no problems, plenty of memory and a fast diskarray (SAS->SATA). Never technical problems with this server. And it worked without problems for a long period."

Which tells us that your system runs on a physical (bare-metal) box. I am afraid you've got a hardware problem of some sort. I advise you to start checking all hardware components (or just replace the box).

Regards, Kuba

Howard Chu

6 p.m.

Attach to the running slapd with gdb, type

  handle all nostop
  continue

and let it run. If there's a crash you'll see what happened in gdb.

--
  -- Howard Chu
  CTO, Symas Corp. http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP http://www.openldap.org/project/

Ruud Baart

28 Feb

3:10 a.m.

I tried it but can't get it working. I have no experience with gdb, so I assume I am doing something wrong.

Normally the slapd daemon runs like this on a test machine:

  /usr/sbin/slapd -h ldap:/// ldapi:/// -g openldap -u openldap -f /etc/ldap/slapd.conf -l local5

If I try running it under gdb:

  su openldap -s /bin/bash
  gdb --args /usr/sbin/slapd -h ldap:/// ldapi:/// -g openldap -u openldap -f /etc/ldap/slapd.conf -l local5
  GNU gdb (GDB) 7.0.1-debian
  ....
  This GDB was configured as "i486-linux-gnu".
  ..
  Reading symbols from /usr/sbin/slapd...(no debugging symbols found)...done.
  (gdb) handle all nostop
  Signal          Stop  Print  Pass to program  Description
  SIGHUP          No    Yes    Yes              Hangup
  SIGQUIT         No    Yes    Yes              Quit
  ...
  EXC_EMULATION   No    Yes    Yes              Emulation instruction
  EXC_SOFTWARE    No    Yes    Yes              Software generated exception
  EXC_BREAKPOINT  No    Yes    Yes              Breakpoint
  (gdb)
  (gdb) continue
  The program is not being run.
  (gdb) run
  Starting program: /usr/sbin/slapd -h ldap:/// ldapi:/// -g openldap -u openldap -f /etc/ldap/slapd.conf -l local5
  [Thread debugging using libthread_db enabled]

  Program exited with code 01.

At this point I have no idea what to do.

-- Regards, Ruud Baart

harry.jede＠arcor.de

4:10 a.m.

Maybe it's a good idea to install the slapd package with debug info.

  aptitude search slapd
  i   slapd      - OpenLDAP server (slapd)
  p   slapd-dbg  - Debugging information for the OpenLDAP server

-- Harry Jede

Ruud Baart

4:28 a.m.

Thank you, I have installed the package:

gdb -q -x /root/gdb.init --args /usr/sbin/slapd -h ldap:/// ldapi:/// -g openldap -u openldap -f /etc/ldap/slapd.conf -l local5

Reading symbols from /usr/sbin/slapd...Reading symbols from /usr/lib/debug/usr/sbin/slapd...done.
(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 0xb5f06b70 (LWP 1387)]
[Thread 0xb5f06b70 (LWP 1387) exited]

Program exited normally.

-- Regards, Ruud Baart

Ruud Baart

4:15 a.m.

Sorry, I think I found a way to start slapd with gdb. The main mistake I made was using uid openldap. It should be run as root.

  # gdb -q -x /root/gdb.init --args /usr/sbin/slapd -h ldap:/// ldapi:/// -g openldap -u openldap -f /etc/ldap/slapd.conf -l local5

gdb.init:

  handle all nostop
  run
  quit

This way I can modify the start-stop script. With a restart there is no need for user intervention.

Now it runs and functions. I assume this is the way Howard Chu suggests. Let's wait and see what happens.

--
Kind regards,
Prompt
Ruud Baart
R.J.Baart@Prompt.NL
Kerkstraat 173, 5261 CW Vught
Tel: +31 73 6567041
www.prompt.nl - www.netwerkmonitoring.eu
For questions and support: support@prompt.nl

Howard Chu

4:26 a.m.

No. I did not say to start slapd using gdb. I said to attach gdb to the running slapd, which means slapd should already be started, using whatever method you normally use to start it.

What you've done will accomplish nothing.

Buchan Milne

4:28 a.m.

"Attach to" normally means that you take an existing process you want to debug and give the debugger the relevant option together with the PID of that process, so that it starts debugging the existing process rather than launching a new one.

For example, searching for 'attach' in the gdb man page indicates this ...
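
A minimal shell sketch of what attaching can look like in practice. It relies on gdb's `-p` option (attach to a PID) and `-ex` option (run a gdb command). The PID file path, the `attach_gdb` helper and the `GDB` override variable are assumptions made for illustration; they do not come from this thread.

```shell
#!/bin/sh
# Sketch: attach gdb to a slapd that is already running, instead of
# starting slapd from inside gdb.

# Assumption: slapd writes its PID here; check the "pidfile" setting
# in your own slapd.conf.
PIDFILE=/var/run/slapd/slapd.pid

# attach_gdb PID  (hypothetical helper)
# Attaches to PID, tells gdb not to stop on any signal, then resumes
# the process. GDB is a hypothetical override so the command line can
# be previewed with GDB=echo without attaching to anything.
attach_gdb() {
    "${GDB:-gdb}" -p "$1" -ex 'handle all nostop' -ex 'continue'
}

# Typical use, as root, with slapd already started the normal way:
#   attach_gdb "$(cat "$PIDFILE")"
```

Leave the gdb session running afterwards; any signal that slapd receives is then printed in that terminal.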

Regards, Buchan

Ruud Baart

5:15 a.m.

Thank you all for the fast answers.

I now attached gdb to slapd and get some results:

(gdb) handle all nostop
(gdb) continue
Continuing.

Program received signal SIGPIPE, Broken pipe.

Program received signal SIGPIPE, Broken pipe.
[New Thread 0xb01f6b70 (LWP 1548)]

Program received signal SIGPIPE, Broken pipe.
.. more of the same ..
Program received signal SIGPIPE, Broken pipe.

Program received signal SIGPIPE, Broken pipe.

---Type <return> to continue, or q <return> to quit---
Program received signal SIGPIPE, Broken pipe.

Program received signal SIGTERM, Terminated.
[New Thread 0xaf4f3b70 (LWP 1968)]
[New Thread 0xaf0f2b70 (LWP 1969)]
[New Thread 0xaecf1b70 (LWP 1970)]
[New Thread 0xae8f0b70 (LWP 1971)]
[Thread 0xaf4f3b70 (LWP 1968) exited]
[Thread 0xb1bfcb70 (LWP 1474) exited]
[Thread 0xb5268b70 (LWP 1462) exited]
[Thread 0xaf0f2b70 (LWP 1969) exited]
[Thread 0xb4565b70 (LWP 1463) exited]
[Thread 0xb01f6b70 (LWP 1548) exited]
[Thread 0xb5669b70 (LWP 1461) exited]
[Thread 0xb17fbb70 (LWP 1475) exited]
[Thread 0xae8f0b70 (LWP 1971) exited]
[Thread 0xb28ffb70 (LWP 1464) exited]
[Thread 0xaecf1b70 (LWP 1970) exited]

Program exited normally.

-- Regards, Ruud Baart

Howard Chu

5:30 a.m.

Ruud Baart wrote:

Program received signal SIGTERM, Terminated.

This is not a crash or any error in slapd; some external command was used to kill the slapd process.

Ruud Baart

3 Mar

2:43 a.m.

I have been watching slapd for several days now. Over the last few days slapd stopped a couple of times. During the last day it ran for more than 24 hours without a problem.

Attaching gdb to slapd:

Attaching to process 3845
Reading symbols from /usr/sbin/slapd...Reading symbols from /usr/lib/debug/usr/sbin/slapd...done.
(no debugging symbols found)...done.
Reading symbols from /usr/lib/libldap_r-2.4.so.2...Reading symbols from /usr/lib/debug/usr/lib/libldap_r-2.4.so.2.5.6...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libldap_r-2.4.so.2
Reading symbols from /usr/lib/liblber-2.4.so.2...Reading symbols from /usr/lib/debug/usr/lib/liblber-2.4.so.2.5.6...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/liblber-2.4.so.2
Reading symbols from /usr/lib/libdb-4.8.so...Reading symbols from /usr/lib/debug/usr/lib/libdb-4.8.so.debug...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libdb-4.8.so
Reading symbols from /usr/lib/libodbc.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libodbc.so.1
Reading symbols from /usr/lib/libslp.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libslp.so.1
Reading symbols from /usr/lib/libsasl2.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libsasl2.so.2
Reading symbols from /usr/lib/libgnutls.so.26...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgnutls.so.26
Reading symbols from /lib/i686/cmov/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/i686/cmov/libcrypt.so.1
Reading symbols from /lib/i686/cmov/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/i686/cmov/libresolv.so.2
Reading symbols from /usr/lib/libltdl.so.7...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libltdl.so.7
Reading symbols from /lib/libwrap.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libwrap.so.0
Reading symbols from /lib/i686/cmov/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 0xb0cffb70 (LWP 7018)]
[New Thread 0xb1bffb70 (LWP 4253)]
[New Thread 0xb3cdcb70 (LWP 3850)]
[New Thread 0xb40ddb70 (LWP 3849)]
[New Thread 0xb44deb70 (LWP 3848)]
[New Thread 0xb51e1b70 (LWP 3847)]
[New Thread 0xb55e2b70 (LWP 3846)]
Loaded symbols for /lib/i686/cmov/libpthread.so.0
Reading symbols from /lib/i686/cmov/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/i686/cmov/libc.so.6
Reading symbols from /lib/i686/cmov/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/i686/cmov/libnsl.so.1
Reading symbols from /lib/i686/cmov/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/i686/cmov/libdl.so.2
Reading symbols from /usr/lib/libtasn1.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libtasn1.so.3
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libgcrypt.so.11...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgcrypt.so.11
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libgpg-error.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgpg-error.so.0
Reading symbols from /lib/i686/cmov/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/i686/cmov/libnss_files.so.2
Reading symbols from /lib/i686/cmov/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/i686/cmov/libnss_compat.so.2
Reading symbols from /lib/i686/cmov/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/i686/cmov/libnss_nis.so.2
Reading symbols from /lib/libnss_ldap.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_ldap.so.2
Reading symbols from /usr/lib/libkrb5.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libkrb5.so.3
Reading symbols from /lib/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libcom_err.so.2
Reading symbols from /usr/lib/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgssapi_krb5.so.2
Reading symbols from /usr/lib/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libk5crypto.so.3
Reading symbols from /usr/lib/libkrb5support.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libkrb5support.so.0
Reading symbols from /lib/libkeyutils.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libkeyutils.so.1
Reading symbols from /usr/lib/sasl2/libsasldb.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/sasl2/libsasldb.so.2
Reading symbols from /usr/lib/sasl2/libcrammd5.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/sasl2/libcrammd5.so.2
Reading symbols from /usr/lib/sasl2/libanonymous.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/sasl2/libanonymous.so.2
Reading symbols from /usr/lib/sasl2/libdigestmd5.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/sasl2/libdigestmd5.so.2
Reading symbols from /usr/lib/i686/cmov/libcrypto.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/i686/cmov/libcrypto.so.0.9.8
Reading symbols from /usr/lib/sasl2/libplain.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/sasl2/libplain.so.2
Reading symbols from /usr/lib/sasl2/liblogin.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/sasl2/liblogin.so.2
Reading symbols from /usr/lib/sasl2/libntlm.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/sasl2/libntlm.so.2
Reading symbols from /usr/lib/ldap/back_bdb-2.4.so.2...Reading symbols from /usr/lib/debug/usr/lib/ldap/back_bdb-2.4.so.2.5.6...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/ldap/back_bdb-2.4.so.2
Reading symbols from /usr/lib/ldap/syncprov-2.4.so.2...Reading symbols from /usr/lib/debug/usr/lib/ldap/syncprov-2.4.so.2.5.6...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/ldap/syncprov-2.4.so.2
0xb782b424 in __kernel_vsyscall ()

Program received signal SIGPIPE, Broken pipe.

.. a few hundred more of the same message ..

Program received signal SIGABRT, Aborted.
[Thread 0xb55e2b70 (LWP 3846) exited]
[Thread 0xafbffb70 (LWP 11193) exited]
[Thread 0xb51e1b70 (LWP 3847) exited]
[Thread 0xb44deb70 (LWP 3848) exited]
[Thread 0xb40ddb70 (LWP 3849) exited]
[Thread 0xb3cdcb70 (LWP 3850) exited]
[Thread 0xb1bffb70 (LWP 4253) exited]
[Thread 0xb0cffb70 (LWP 7018) exited]

Last messages in the log file:

Mar 3 05:17:38 ux-254 slapd[3845]: connection_read(45): no connection!
Mar 3 05:17:38 ux-254 slapd[3845]: connection_read(45): no connection!
Mar 3 05:17:54 ux-254 slapd[3845]: send_search_entry: conn 52805 ber write failed.

This is the first time I have seen a SIGABRT. All the other times there was nothing but SIGPIPE, or sometimes a SIGTERM. As you can see, slapd stopped late at night, when everybody is asleep and hardly any users are working.
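
With `handle all nostop` every signal is passed straight through, so the SIGABRT is reported without a stack trace. One possible variation, which nobody in this thread suggested, is to silence the SIGPIPE noise and let gdb stop on SIGABRT so that a backtrace of all threads can be printed. A minimal shell sketch follows; the helper name `write_gdb_commands` and all file paths are hypothetical.

```shell
#!/bin/sh
# Sketch: write a gdb command file that ignores the SIGPIPE noise but
# stops on SIGABRT and dumps a backtrace of every thread.

# write_gdb_commands FILE  (hypothetical helper)
write_gdb_commands() {
    cat > "$1" <<'EOF'
handle SIGPIPE nostop noprint pass
handle SIGABRT stop print
continue
thread apply all bt
EOF
}

# Possible use, as root, against a running slapd (paths are assumptions):
#   write_gdb_commands /root/gdb.abrt
#   gdb -batch -x /root/gdb.abrt -p "$(cat /var/run/slapd/slapd.pid)" \
#       > /root/slapd-gdb.log 2>&1
```

The backtrace is only as detailed as the debug symbols that gdb manages to load.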

As I wrote earlier, I use a Perl daemon which checks the process: the PID combined with the process table. If the process is gone from the process table, or is in a defunct or stopped state, the daemon restarts slapd. This happens within 20 seconds. This way we manage to minimize the impact of the problems with slapd. Perhaps the SIGTERM is the result of the process monitor. If that is the case, it must be the result of a defunct or stopped condition in the process table. But we can't work without this tool to restart slapd as quickly as possible.
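
The kind of check described above can be sketched in shell. This is not the Perl daemon itself: the function names, the PID file path and the restart command in the comments are assumptions for illustration. The sketch reads the process state from /proc/PID/status on Linux and treats a missing, defunct (Z) or stopped (T) process as needing a restart.

```shell
#!/bin/sh
# Sketch of a slapd watchdog check (hypothetical, Linux only).

# proc_state PID
# Prints the one-letter state from /proc/PID/status (R, S, D, Z, T, ...)
# or nothing if the process no longer exists.
proc_state() {
    [ -r "/proc/$1/status" ] || return 0
    while read -r key value rest; do
        if [ "$key" = "State:" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "/proc/$1/status"
}

# state_needs_restart STATE
# Succeeds when the state means "gone", "defunct" or "stopped".
state_needs_restart() {
    case "$1" in
        ''|Z|T) return 0 ;;
        *)      return 1 ;;
    esac
}

# Possible polling loop (paths and init script are assumptions):
#   while sleep 20; do
#       pid="$(cat /var/run/slapd/slapd.pid 2>/dev/null)"
#       if state_needs_restart "$(proc_state "$pid")"; then
#           /etc/init.d/slapd restart
#       fi
#   done
```

One thing to keep in mind with a check like this: a process held by a debugger can also be reported as stopped, so a monitor of this kind may react while gdb is attached.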

-- Regards, Ruud Baart

harry.jede＠arcor.de

28 Feb

4:18 a.m.

Ruud Baart wrote:

If I try this running attached to gdb: su openldap -s /bin/bash

I am pretty sure that you should not switch to the openldap user. Run it as root.

-- Harry Jede

openldap-technical@openldap.org
