slapd crashing "randomly?"

daniel＠ncsu.edu

6 Feb 2007

8:29 a.m.

Hi folk,

I want to start this message by saying, what I'm about to describe is completely vague and I don't expect to get a solution response. ;) Basically, I'm out of ideas and am looking for some suggestions as to how to debug the issue I'm running into.

Starting about half a year ago, slapd started "just dying" out of the blue. Not a thing shows up in the logs to indicate what might have caused it. The last query that I see in the logs before a crash always seems to be nothing special. I don't even see a core dump being generated yet, but that may just be because I don't have the proper setup to get a core dump at this time. We were running the last 2.2 and upgraded to the latest release of 2.3 to make sure it wasn't an "old version" issue. Unfortunately, slapd still dies a fair amount on us. It appears to be fairly unpredictable. I've seen it crash within 1 minute of starting up slapd (then a subsequent startup 'takes' just fine). I've seen it crash when there were a number of network issues going on. I've seen it crash out of the blue when nothing appeared to be going on. I don't really have the drive space to turn on max debug logging 24/7 until the problem occurs.

We're thinking about setting up something to watch all of the network traffic going to one of the boxes until it dies. (assuming we can find something with the resources to do that)

That all said... since I have nothing solid to present, do you all have any suggestions of what would be the best way to track down what's going on? I'm literally out of ideas unless my Berkeley DB config is somehow causing the problem or something like that.

I apologize for the vagueness. =/ Any ideas/suggestions?

Daniel

matthew sporleder

6 Feb

10:35 a.m.

After the crash, is your bdb environment clean, or is it needing a db_recover? Depending on your OS, you could watch the pid all the time and trap the last signals received, last files accessed, etc, and that wouldn't take tons of resources.
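One low-cost way to "trap the last signal received" is to launch the daemon from a small supervisor and inspect how it ended. The sketch below is an illustration, not something from this thread: it assumes the daemon is started in the foreground (for slapd, the `-d` option is documented to prevent it from detaching), and the names `describe_exit` and `watch` are hypothetical.

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable cause of death."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def watch(argv: list[str]) -> str:
    """Run argv as a child process, wait for it to end, and report why."""
    proc = subprocess.Popen(argv)
    return describe_exit(proc.wait())
```

Calling `watch([...])` with the daemon's foreground command line blocks until the process ends, so the result can be written to a log or mailed out the moment a crash happens.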

You could try turning on max debugging and simply rotate a lot more often. (every n minutes or even seconds) This way you could definitely keep the -last- transactions and just not worry about the old ones.
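The "keep only the last transactions" idea can also be approximated without rotating files at all, by holding a bounded window of recent log lines in memory. This is a minimal sketch under that assumption; `keep_last` is a hypothetical helper, and how the debug output reaches it (a pipe, a log socket) is left open.

```python
from collections import deque
from typing import Iterable


def keep_last(lines: Iterable[str], limit: int) -> list[str]:
    """Return only the final `limit` lines, discarding older ones as we go."""
    window = deque(maxlen=limit)
    for line in lines:
        window.append(line)  # the oldest line falls off once the deque is full
    return list(window)
```

Because the deque never grows past `limit` entries, memory and disk use stay flat no matter how long the daemon runs before it dies; the window is written out only once the input ends.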

Daniel Henninger

11:31 a.m.

On Feb 6, 2007, at 1:35 PM, matthew sporleder wrote:

After the crash, is your bdb environment clean, or is it needing a db_recover? Depending on your OS, you could watch the pid all the time and trap the last signals received, last files accessed, etc, and that wouldn't take tons of resources.

The BDB environment is indeed unclean after the crash. Though, as of 2.3 it appears to auto-db_recover most of the time if someone else starts up slapd before doing the recover.

You could try turning on max debugging and simply rotate a lot more often. (every n minutes or even seconds) This way you could definitely keep the -last- transactions and just not worry about the old ones.

Unfortunately, we also use those logs for analysis. =/ But I might go that route depending on how things go!

Thanks!

Daniel

Quanah Gibson-Mount

12:34 p.m.

--On Tuesday, February 06, 2007 1:35 PM -0500 matthew sporleder msporleder@gmail.com wrote:

After the crash, is your bdb environment clean, or is it needing a db_recover? Depending on your OS, you could watch the pid all the time and trap the last signals received, last files accessed, etc, and that wouldn't take tons of resources.

You could try turning on max debugging and simply rotate a lot more often. (every n minutes or even seconds) This way you could definitely keep the -last- transactions and just not worry about the old ones.

Also, what database backend are you using? Why not build slapd with debugging symbols so you can get a core?

What version of 2.3 are you running at the moment? You say you had upgraded to the latest release at some point, but not what release that was. Up until around 2.3.28, there were issues in the connection code that caused random crashes on my servers. 2.3.33 would be your best bet to eliminate that as an issue if you aren't there yet.

--Quanah

-- Quanah Gibson-Mount Principal Software Developer ITS/Shared Application Services Stanford University GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html

Daniel Henninger

12 Feb

8:36 a.m.

On Feb 6, 2007, at 3:34 PM, Quanah Gibson-Mount wrote:

Also, what database backend are you using? Why not build slapd with debugging symbols so you can get a core?

BDB and I am planning on doing so ;D

What version of 2.3 are you running at the moment? You say you had upgraded to the latest release at some point, but not what release that was. Up until around 2.3.28, there were issues in the connection code that caused random crashes on my servers. 2.3.33 would be your best bet to eliminate that as an issue if you aren't there yet.

2.3.32 is what we're running right now. I've been sticking with the version that's labelled as "stable". Do y'all recommend going with the release instead of the "stable"?

I've at least been having this issue since 2.2.whatever, so it's been going on for quite some time version wise. Timewise, I still think something may have changed in my world to cause all of this, but just can't track it down.

Anyway, I'm working on setting up some things with which I can track it.

Thanks!

Daniel

Daniel Henninger

11 Apr

10:25 a.m.

Been a while, but I finally caught a core dump. Of course, I'm not entirely sure why there are so few useful symbols showing, since I compiled it with debugging symbols and didn't strip it. =/ Anyway, the information I got from it is interesting:

(gdb) bt
#0  0x000b4694 in ?? ()
#1  0x000e175c in avl_delete ()
#2  0x000b4c48 in bdb_idl_cache_put ()
#3  0x000b5930 in bdb_idl_fetch_key ()
#4  0x000b796c in bdb_key_read ()
#5  0x000b30b0 in bdb_filter_candidates ()
#6  0x000b3a28 in ?? ()
#7  0x000b3a28 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

Is it possible I just have a busted version of Berkeley DB?! What version are you all using? (I guess it's Oracle DB now...) We are using version 4.2.52, built with --enable-compat185.

Daniel

Quanah Gibson-Mount

10:30 a.m.

Hi Daniel,

What does "file slapd" say? In general, "make install" will strip the symbols from slapd even if you built it with debugging etc.

As for BDB 4.2.52, you must apply the patches from Oracle as well; without them it is known to cause corruption.

--Quanah

-- Quanah Gibson-Mount Senior Systems Software Developer ITS/Shared Application Services Stanford University GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html

Daniel Henninger

10:33 a.m.

On Apr 11, 2007, at 1:30 PM, Quanah Gibson-Mount wrote:

What does "file slapd" say? In general, "make install" will strip the symbols from slapd even if you built it with debugging etc.

Aww crap! =(

/local/ldap/libexec/slapd: ELF 32-bit MSB executable, SPARC, version 1 (SYSV), dynamically linked (uses shared libs), stripped

Didn't think about that, thanks!
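The `file` check above is easy to automate after each install, so a stripped binary is noticed before the next crash rather than after it. A small sketch, assuming the usual "stripped" / "not stripped" wording that `file` prints for ELF binaries; `is_stripped` is a hypothetical helper name.

```python
def is_stripped(file_output: str) -> bool:
    """Interpret one line of `file` output for an ELF binary."""
    text = file_output.lower()
    # "not stripped" contains the word "stripped", so test the longer
    # phrase first to avoid misreading it.
    if "not stripped" in text:
        return False
    if "stripped" in text:
        return True
    raise ValueError("no stripped/not stripped marker in file output")
```

Fed the line quoted in this message, the function returns `True`, which matches the near-useless symbols in the first backtrace.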

As for BDB 4.2.52, you must apply the patches from Oracle as well, otherwise it is known to corrupt.

Any newer ones suggested over 4.2.52? I have no qualms at all with upgrading, just don't want to dive into a known busticated version. =)

Daniel

Quanah Gibson-Mount

10:57 a.m.

--On Wednesday, April 11, 2007 1:33 PM -0400 Daniel Henninger daniel@ncsu.edu wrote:

As for BDB 4.2.52, you must apply the patches from Oracle as well, otherwise it is known to corrupt.

Any newer ones suggested over 4.2.52? I have no qualms at all with upgrading, just don't want to dive into a known busticated version. =)

I still think BDB 4.2.52 remains the optimal version, at least until 4.6 is out. All you really need to do is make sure you apply the patches that Oracle provides from their website to BDB 4.2.52 prior to compiling it. :)

--Quanah

-- Quanah Gibson-Mount Senior Systems Software Developer ITS/Shared Application Services Stanford University GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html

Daniel Henninger

12 Apr

6:17 a.m.

Howdy! At this point I have patched BDB 4.2.52 with all 5 patches from Oracle's site, and am running a slapd with actual debugging symbols. (whee!) So here's the backtrace I got this time:

#0  0xfee12d38 in fseek () from /usr/lib/libc.so.1
#1  0xfe343680 in krb5_ktfileint_internal_read_entry () from /local/kerberos/lib/libkrb5.so.3
#2  0xfe343ec8 in krb5_ktfileint_read_entry () from /local/kerberos/lib/libkrb5.so.3
#3  0xfe342660 in krb5_ktfile_get_entry () from /local/kerberos/lib/libkrb5.so.3
#4  0xfe35bc44 in krb5_rd_req_decrypt_tkt_part () from /local/kerberos/lib/libkrb5.so.3
#5  0xfe35bdcc in krb5_rd_req_decoded_opt () from /local/kerberos/lib/libkrb5.so.3
#6  0xfe35c594 in krb5_rd_req_decoded () from /local/kerberos/lib/libkrb5.so.3
#7  0xfe35bb10 in krb5_rd_req () from /local/kerberos/lib/libkrb5.so.3
#8  0xfecd81ec in krb5_gss_accept_sec_context () from /local/kerberos/lib/libgssapi_krb5.so.2
#9  0xfece12c4 in gss_accept_sec_context () from /local/kerberos/lib/libgssapi_krb5.so.2
#10 0xfed02410 in gssapi_server_mech_step () from /local/lib/sasl2/libgssapiv2.so.2
#11 0xff1d95c0 in sasl_server_step () from /local/lib/libsasl2.so.2
#12 0xff1d92b4 in sasl_server_start () from /local/lib/libsasl2.so.2
#13 0x00074998 in slap_sasl_bind (op=0x295e898, rs=0xd7401af0) at sasl.c:1393
#14 0x0004c4d4 in fe_op_bind (op=0x295e898, rs=0xd7401af0) at bind.c:276
#15 0x0004bddc in do_bind (op=0x295e898, rs=0xd7401af0) at bind.c:200
#16 0x00032afc in connection_operation (ctx=0x170948, arg_v=0x295e898) at connection.c:1132
#17 0xff33cbb4 in ldap_int_thread_pool_wrapper (xpool=0x181b08) at tpool.c:478
#18 0xfed5b124 in _thread_start () from /usr/lib/libthread.so.1
#19 0xfed5b124 in _thread_start () from /usr/lib/libthread.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Looks to be rather Kerberos related. We are stuck back at krb5 1.2.8 at the moment (patched to hell and back for security isms) due to a soon-to-be-gone V4 requirement that was busted in the newer Kerberos dists. That said, there's no reason I couldn't rebuild Kerberos on just that box. Are you all using krb5 w/SASL? What version of Kerberos are you running?
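To compare several cores without eyeballing each one, the frames can be grouped by the library or source file they come from. The sketch below is illustrative only: the regular expression is an assumption fitted to the gdb output shown in this thread, and `parse_frames` and `frames_per_origin` are hypothetical names.

```python
import re
from collections import Counter

# Matches frames such as:
#   #1  0xfe343680 in some_function () from /path/lib.so
#   #13 0x00074998 in slap_sasl_bind (op=..., rs=...) at sasl.c:1393
FRAME = re.compile(
    r"#\s*(?P<num>\d+)\s+0x[0-9a-f]+ in (?P<func>\S+) \(.*?\)"
    r"(?: from (?P<lib>\S+)| at (?P<src>\S+))?"
)


def parse_frames(backtrace: str) -> list[dict]:
    """Extract frame number, function, and origin from gdb `bt` text."""
    return [m.groupdict() for m in FRAME.finditer(backtrace)]


def frames_per_origin(backtrace: str) -> Counter:
    """Count frames per shared library or source file ("unknown" if neither)."""
    counts = Counter()
    for frame in parse_frames(backtrace):
        if frame["lib"]:
            counts[frame["lib"]] += 1
        elif frame["src"]:
            counts[frame["src"].split(":")[0]] += 1  # drop the line number
        else:
            counts["unknown"] += 1
    return counts
```

A count dominated by one library near the top of the stack is only a hint about where to look next, not proof of where the bug lives.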

Alternatively, is this a known problem in OpenLDAP? I vaguely recall seeing some thread or bug report or change log entry regarding a krb5 segfault issue.

Daniel

Quanah Gibson-Mount

8:57 a.m.

Hi Daniel,

I've always advised compiling the OpenLDAP server against Heimdal Kerberos rather than MIT Kerberos, as I've found it to be faster and more reliable. Later versions of MIT Kerberos (1.5, 1.6) take care of the reliability issues, but in my experience they are still not as fast. As long as you are using SASL/GSSAPI (not SASL/KERBEROSV4), you should be just fine having OpenLDAP itself compiled against Heimdal. Stanford uses MIT Kerberos for everything else, and we've had no compatibility issues there.

--Quanah

-- Quanah Gibson-Mount Senior Systems Software Developer ITS/Shared Application Services Stanford University GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html

Dave Horsfall

6 Feb

2:25 p.m.

On Tue, 6 Feb 2007, daniel@ncsu.edu wrote:

I don't even see a core dump being generated yet, but then that may just be because I don't have the proper setup to get a core dump at this time.

Be aware that if SLAPD is started with "-u" then no core dump will be produced unless a kernel switch has been set; on FreeBSD it's "kern.sugid_coredump=1". It also needs to be started from a writable directory (I use "/var/tmp").
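Two of these preconditions, a non-zero core size limit and a writable start directory, can be checked from a small launcher before the daemon is started. This is a sketch under assumptions: `core_dump_prereqs` is a hypothetical helper, it relies on Python's Unix-only `resource` module, and it does not cover the kernel switch for processes that change user ID.

```python
import os
import resource


def core_dump_prereqs(directory: str) -> dict:
    """Raise the soft core-size limit to the hard limit and check the directory."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit;
    # child processes started afterwards inherit the new value.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return {
        # A hard limit of 0 means no core file will be written at all.
        "core_size_allowed": hard != 0,
        # The core file is normally written to the process's working directory.
        "directory_writable": os.access(directory, os.W_OK),
    }
```

If either value comes back `False`, a crash would most likely leave no core behind, which fits the "I don't even see a core dump" symptom in the original post.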

-- Dave Horsfall DTM VK2KFU daveh@ci.com.au Ph: +61 2 9552-5509 (d) -5500 (sw) Corinthian Eng'ng P/L, Ste 54 Jones Bay Whf, 26-32 Pirrama Rd, Pyrmont 2009, AU

openldap-software@openldap.org

11 comments

5 participants

tags (0)

participants (5)

Daniel Henninger
daniel＠ncsu.edu
Dave Horsfall
matthew sporleder
Quanah Gibson-Mount