Hi everyone,
I am currently looking into an issue with an environment that consists of 3 producers and 6 consumers all on OpenLdap 2.4.48. All the masters share an identical master config and all of the consumers share an identical consumer config. One consumer server in particular is crashing on a random time interval with no obvious reason. Each ldap server has sasl pass through auth setup and there is an additional slapd process on each server being used as a ldap meta to AD.
One server in particular, is crashing at seemingly random intervals. Upon initial inspection, we noticed that around the time of a crash, saslauthd logs would have an increase in failed write messages:
Dec 11 07:23:20 {REDACTED} saslauthd[2720]: user ldap_search_st() failed: Timed out Dec 11 07:23:30 {REDACTED} saslauthd[2718]: user ldap_search_st() failed: Timed out Dec 11 07:23:40 {REDACTED} saslauthd[2718]: user ldap_search_st() failed: Timed out Dec 11 07:23:56 {REDACTED} saslauthd[2717]: user ldap_search_st() failed: Timed out Dec 11 07:23:57 {REDACTED} saslauthd[2720]: user ldap_search_st() failed: Timed out Dec 11 07:23:57 {REDACTED} saslauthd[2719]: user ldap_search_st() failed: Timed out Dec 11 07:24:05 {REDACTED} saslauthd[2718]: user ldap_search_st() failed: Timed out Dec 11 07:24:06 {REDACTED} saslauthd[2717]: user ldap_search_st() failed: Timed out Dec 11 07:24:54 {REDACTED} saslauthd[2719]: : write failure Dec 11 07:24:54 {REDACTED} saslauthd[2717]: : write failure Dec 11 07:24:54 {REDACTED} saslauthd[2719]: : write: Broken pipe Dec 11 07:24:54 {REDACTED} saslauthd[2717]: : write: Broken pipe Dec 11 07:24:54 {REDACTED} saslauthd[2718]: : write failure Dec 11 07:24:54 {REDACTED} saslauthd[2720]: : write failure Dec 11 07:24:54 {REDACTED} saslauthd[2718]: : write: Broken pipe Dec 11 07:24:54 {REDACTED} saslauthd[2720]: : write: Broken pipe Dec 11 07:25:04 {REDACTED} saslauthd[2721]: : write failure Dec 11 07:25:04 {REDACTED} saslauthd[2721]: : write: Broken pipe
Initially we thought this meant that the crashes were caused by performance issues with AD, saslauthd or the ldap meta instance. We also ran slapd in full debugging mode to see if the logs have some kind of hints to the source of the issue.
The resulting slapd logs were normal. To obtain further information we obtained the following trace of the segfault using gdb:
starting [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1".
frames-invalid [New Thread 0x7ffbf19be700 (LWP 22956)]
new-thread [New Thread 0x7ffbf11bd700 (LWP 22957)]
new-thread [New Thread 0x7ffbf09bc700 (LWP 22958)]
new-thread [New Thread 0x7ffbebfff700 (LWP 22959)]
new-thread [New Thread 0x7ffbeb7fe700 (LWP 22963)]
new-thread [New Thread 0x7ffbe9cfb700 (LWP 22965)]
new-thread [New Thread 0x7ffbd3fff700 (LWP 22966)]
new-thread
signal
Program received signal signal-name SIGSEGV signal-name-end , signal-string Segmentation fault signal-string-end . [Switching to Thread 0x7ffbf09bc700 (LWP 22958)]
thread-changed
frame-begin 0 0x4e52e3
frame-function-name ldap_chain_op frame-args ( arg-begin op=op@entry arg-name-end = arg-value * 0x7ffbe41f0870 arg-end , arg-begin rs=rs@entry arg-name-end = arg-value * 0x7ffbf09bbb20 arg-end , arg-begin op_f arg-name-end = arg-value * 0x4a845e <ldap_back_search> arg-end , arg-begin ref=ref@entry arg-name-end = arg-value * 0x0 arg-end , arg-begin depth=depth@entry arg-name-end = arg-value - 0 arg-end ) frame-source-begin at frame-source-file chain.c frame-source-file-end : frame-source-line 422 frame-source-end
source ${REDACTED DIR}/chain.c:422:11629:beg:0x4e52e3
frame-end
stopped
post-prompt
#0 ldap_chain_op (op=op@entry=0x7ffbe41f0870, rs=rs@entry=0x7ffbf09bbb20, op_f=0x4a845e <ldap_back_search>, ref=ref@entry=0x0, depth=depth@entry=0) at chain.c:422 #1 0x00000000004e61da in ldap_chain_response (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at chain.c:1090 #2 0x000000000049d323 in over_back_response (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at backover.c:237 #3 0x0000000000447c1a in slap_response_play (op=op@entry=0x7ffbe41f0870, rs=rs@entry=0x7ffbf09bbb20) at result.c:508 #4 0x00000000004480f5 in send_ldap_response (op=op@entry=0x7ffbe41f0870, rs=rs@entry=0x7ffbf09bbb20) at result.c:583 #5 0x0000000000448d1f in slap_send_ldap_result (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at result.c:861 #6 0x00000000004bd8c3 in mdb_search (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at search.c:558 #7 0x000000000049dd58 in overlay_op_walk (op=op@entry=0x7ffbe41f0870, rs=0x7ffbf09bbb20, which=op_search, oi=0xa3eeb0, on=0x0) at backover.c:677 #8 0x000000000049de86 in over_op_func (op=0x7ffbe41f0870, rs=<optimized out>, which=which@entry=op_search) at backover.c:730 #9 0x000000000049dfab in over_op_search (op=<optimized out>, rs=<optimized out>) at backover.c:757 #10 0x00000000004d1a02 in relay_back_op (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20, which=which@entry=2) at op.c:211 #11 0x00000000004d1b07 in relay_back_op_search (op=<optimized out>, rs=<optimized out>) at op.c:251 #12 0x000000000049dd58 in overlay_op_walk (op=op@entry=0x7ffbe41f0870, rs=0x7ffbf09bbb20, which=op_search, oi=0xa3af00, on=0x0) at backover.c:677 #13 0x000000000049de86 in over_op_func (op=0x7ffbe41f0870, rs=<optimized out>, which=which@entry=op_search) at backover.c:730 #14 0x000000000049dfab in over_op_search (op=<optimized out>, rs=<optimized out>) at backover.c:757 #15 0x000000000043c33f in fe_op_search (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at search.c:402 #16 0x000000000049dd58 in overlay_op_walk (op=op@entry=0x7ffbe41f0870, rs=0x7ffbf09bbb20, which=op_search, oi=0x980e00, on=0x0) at backover.c:677 #17 0x000000000049de86 in over_op_func (op=0x7ffbe41f0870, rs=<optimized out>, which=which@entry=op_search) at backover.c:730 #18 0x000000000049dfab in over_op_search (op=<optimized out>, rs=<optimized out>) at backover.c:757 #19 0x000000000043bef6 in do_search (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at search.c:247 #20 0x0000000000439bc4 in connection_operation (ctx=0x7ffbf09bbbd0, arg_v=0x7ffbe41f0870) at connection.c:1158 #21 0x0000000000563cbf in ldap_int_thread_pool_wrapper (xpool=0x8e36a0) at tpool.c:696 #22 0x00007ffff754cdd5 in start_thread (arg=0x7ffbf09bc700) at pthread_create.c:307 #23 0x00007ffff674d02d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 #0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95 #1 0x00007ffff66bd6ac in _L_lock_39 () from /lib64/libc.so.6 #2 0x00007ffff66bd5b4 in __GI__IO_fflush (fp=0x937d00) at iofflush.c:39 #3 0x000000000058bbc6 in ber_error_print (data=0x7ffbd3ffdc10 " aaf9: 04 3a 75 69 64 3d 53 49 41 53 47 52 50 2c 6f 75 .:uid=SIASGRP,ou \n") at bprint.c:87 #4 0x000000000058be41 in ber_bprint (data=0x7ffbcd83df04 "0\203\n\036\273\004\funiqueMember1\203\n\036\250\004\071uid=wtso732,ou={REDACTED}“…, len=len@entry=663561) at bprint.c:202 #5 0x000000000058c053 in ber_dump (ber=ber@entry=0x7ffbcc122e20, inout=inout@entry=1) at bprint.c:276 #6 0x000000000058c0ab in ber_log_dump (errlvl=errlvl@entry=16, loglvl=<optimized out>, ber=ber@entry=0x7ffbcc122e20, inout=inout@entry=1) at bprint.c:247 #7 0x00000000005894c1 in ber_scanf (ber=0x7ffbcc122e20, fmt=fmt@entry=0x5b09b3 "{mW}") at decode.c:723 #8 0x0000000000492ba8 in syncrepl_message_to_entry (syncUUID=0x7ffbd3ffdf40, syncstate=<optimized out>, entry=<synthetic pointer>, modlist=0x7ffbd3ffe320, msg=<optimized out>, op=0x7ffbd3ffe510, si=0xa3e9e0) at syncrepl.c:2658 #9 do_syncrep2 (op=op@entry=0x7ffbd3ffe510, si=si@entry=0xa3e9e0) at syncrepl.c:1043 #10 0x00000000004976db in do_syncrepl (ctx=<optimized out>, arg=0xa3e670) at syncrepl.c:1571 #11 0x0000000000563cbf in ldap_int_thread_pool_wrapper (xpool=0x8e36a0) at tpool.c:696 #12 0x00007ffff754cdd5 in start_thread (arg=0x7ffbd3fff700) at pthread_create.c:307 #13 0x00007ffff674d02d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 #0 0x00007ffff673e16d in write () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff66c91a3 in _IO_new_file_write (f=0x937c20, data=0x7ffff7ff8000, n=162) at fileops.c:1258 #2 0x00007ffff66ca9be in new_do_write (to_do=162, data=0x7ffff7ff8000 " 4530: 62 61 6c 2c 64 63 3d 61 65 70 2c 64 63 3d 63 6f bal,dc={REDACTED} \n5e0a2340 >>> dnPretty: <uid={REDACTED}>\n.. \n5e0a2340 daemon: epoll: listen=19"..., fp=0x937c20) at fileops.c:519 #3 _IO_new_do_write (fp=0x937c20, data=0x7ffff7ff8000 " 4530: 62 61 6c 2c 64 63 3d 61 65 70 2c 64 63 3d 63 6f bal,dc={REDACTED} \n5e0a2340 >>> dnPretty: <uid={REDACTED}>\n.. \n5e0a2340 daemon: epoll: listen=19"..., to_do=162) at fileops.c:495 #4 0x00007ffff66c87f8 in _IO_new_file_sync (fp=0x937c20) at fileops.c:869 #5 0x00007ffff66bd5f3 in __GI__IO_fflush (fp=0x937c20) at iofflush.c:40 #6 0x000000000058c21a in lutil_debug (debug=<optimized out>, level=level@entry=1, fmt=fmt@entry=0x59c7a6 ">>> dnPretty: <%s>\n") at debug.c:81 #7 0x000000000044c792 in dnPretty (syntax=syntax@entry=0x8c4810, val=val@entry=0x7ffbe9cf9d00, out=out@entry=0x7ffbe9cf9de0, ctx=ctx@entry=0x0) at dn.c:539 #8 0x000000000046c614 in nameUIDPretty (syntax=0x8c4810, val=0x7ffbd9845600, out=0x7ffbe9cf9de0, ctx=0x0) at schema_init.c:1350 #9 0x000000000045383d in ordered_value_pretty (ad=ad@entry=0xa40470, val=0x7ffbd9845600, out=out@entry=0x7ffbe9cf9de0, ctx=ctx@entry=0x0) at value.c:511 #10 0x000000000044f676 in slap_mods_check (op=op@entry=0x7ffbe9cfa510, ml=0x7ffbd992c410, text=text@entry=0x7ffbe9cf9f60, textbuf=textbuf@entry=0x7ffbe9cfa180 "H\034\032\330\373\177", textlen=textlen@entry=256, ctx=ctx@entry=0x0) at modify.c:575 #11 0x0000000000492e10 in syncrepl_message_to_entry (syncUUID=0x7ffbe9cf9f40, syncstate=<optimized out>, entry=<synthetic pointer>, modlist=0x7ffbe9cfa320, msg=<optimized out>, op=0x7ffbe9cfa510, si=0xa3fb20) at syncrepl.c:2715 #12 do_syncrep2 (op=op@entry=0x7ffbe9cfa510, si=si@entry=0xa3fb20) at syncrepl.c:1043 #13 0x00000000004976db in do_syncrepl (ctx=<optimized out>, arg=0xa40060) at syncrepl.c:1571 #14 0x0000000000563cbf in ldap_int_thread_pool_wrapper (xpool=0x8e36a0) at tpool.c:696 #15 0x00007ffff754cdd5 in start_thread (arg=0x7ffbe9cfb700) at pthread_create.c:307 #16 0x00007ffff674d02d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 #0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95 #1 0x00007ffff66bd6ac in _L_lock_39 () from /lib64/libc.so.6 #2 0x00007ffff66bd5b4 in __GI__IO_fflush (fp=0x937d00) at iofflush.c:39 #3 0x000000000058bbc6 in ber_error_print (data=0x7ffbeb7fcc10 " 4530: 62 61 6c 2c 64 63 3d 61 65 70 2c 64 63 3d 63 6f bal,dc={REDACTED} \n") at bprint.c:87 #4 0x000000000058be41 in ber_bprint (data=0x7ffbd576fee7 "0\027\004\002cn1\021\004\017CSC User Access0-\004\025structuralObjectClass1\024\004\022groupOfUniqueNames03\004\tentryUUID1&\004$13257271-b706-49bc-a2a9-a5f3e7f4a0dd0L\004\fcreatorsName1<\004:cn={REDACTED}”…, len=len@entry=61665) at bprint.c:202 #5 0x000000000058c053 in ber_dump (ber=ber@entry=0x7ffbd4319130, inout=inout@entry=1) at bprint.c:276 #6 0x000000000058c0ab in ber_log_dump (errlvl=errlvl@entry=16, loglvl=<optimized out>, ber=ber@entry=0x7ffbd4319130, inout=inout@entry=1) at bprint.c:247 #7 0x00000000005894c1 in ber_scanf (ber=0x7ffbd4319130, fmt=fmt@entry=0x5b09b3 "{mW}") at decode.c:723 #8 0x0000000000492ba8 in syncrepl_message_to_entry (syncUUID=0x7ffbeb7fcf40, syncstate=<optimized out>, entry=<synthetic pointer>, modlist=0x7ffbeb7fd320, msg=<optimized out>, op=0x7ffbeb7fd510, si=0xa3f3b0) at syncrepl.c:2658 #9 do_syncrep2 (op=op@entry=0x7ffbeb7fd510, si=si@entry=0xa3f3b0) at syncrepl.c:1043 #10 0x00000000004976db in do_syncrepl (ctx=<optimized out>, arg=0xa3f8b0) at syncrepl.c:1571 #11 0x0000000000563cbf in ldap_int_thread_pool_wrapper (xpool=0x8e36a0) at tpool.c:696 #12 0x00007ffff754cdd5 in start_thread (arg=0x7ffbeb7fe700) at pthread_create.c:307 #13 0x00007ffff674d02d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 #0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95 #1 0x00007ffff66bd6ac in _L_lock_39 () from /lib64/libc.so.6 #2 0x00007ffff66bd5b4 in __GI__IO_fflush (fp=0x937d00) at iofflush.c:39 #3 0x000000000058bbc6 in ber_error_print (data=0x7ffbebe6cf20 " b6c6: 03 0a 01 04 30 22 02 03 02 38 56 17 0d 31 38 30 ....0"...8V..180 \n") at bprint.c:87 #4 0x000000000058be41 in ber_bprint (data=0x7ffbe04282a0 "0\203UK\275\002\001\001d\203UK\265\004\062cn={REDACTED}0\203UK|0\203UKw\004 certificateRevocationList;binary1\203UKP\004\203UKK0\203UKF0\203UJ-\002\001\001\060\r\006\t*\206H\206\367\r\001\001\v\005", len=len@entry=5589954) at bprint.c:202 #5 0x000000000058bfb3 in ber_log_bprint (errlvl=errlvl@entry=16, loglvl=<optimized out>, data=<optimized out>, len=len@entry=5589954) at bprint.c:175 #6 0x000000000058b179 in ber_flush2 (sb=0x7ffbe17f55b0, ber=ber@entry=0x7ffbebe6d0f0, freeit=freeit@entry=0) at io.c:240 #7 0x0000000000447ee5 in send_ldap_ber (op=op@entry=0x7ffbe41bd940, ber=ber@entry=0x7ffbebe6d0f0) at result.c:339 #8 0x000000000044b290 in slap_send_search_entry (op=0x7ffbe41bd940, rs=<optimized out>) at result.c:1430 #9 0x00000000004bf3b4 in mdb_search (op=<optimized out>, rs=0x7ffbebffeaa0) at search.c:1085 #10 0x000000000049dd58 in overlay_op_walk (op=op@entry=0x7ffbe41bd940, rs=0x7ffbebffeaa0, which=op_search, oi=0xa3eeb0, on=0x0) at backover.c:677 #11 0x000000000049de86 in over_op_func (op=0x7ffbe41bd940, rs=<optimized out>, which=which@entry=op_search) at backover.c:730 #12 0x000000000049dfab in over_op_search (op=<optimized out>, rs=<optimized out>) at backover.c:757 #13 0x000000000043c33f in fe_op_search (op=0x7ffbe41bd940, rs=0x7ffbebffeaa0) at search.c:402 #14 0x000000000049dd58 in overlay_op_walk (op=op@entry=0x7ffbe41bd940, rs=0x7ffbebffeaa0, which=op_search, oi=0x980e00, on=0x0) at backover.c:677 #15 0x000000000049de86 in over_op_func (op=0x7ffbe41bd940, rs=<optimized out>, which=which@entry=op_search) at backover.c:730 #16 0x000000000049dfab in over_op_search (op=<optimized out>, rs=<optimized out>) at backover.c:757 #17 0x000000000043bef6 in do_search (op=0x7ffbe41bd940, rs=0x7ffbebffeaa0) at search.c:247 #18 0x0000000000439bc4 in connection_operation (ctx=ctx@entry=0x7ffbebffebd0, arg_v=arg_v@entry=0x7ffbe41bd940) at connection.c:1158 #19 0x000000000043a9cb in connection_read_thread (ctx=0x7ffbebffebd0, argv=<optimized out>) at connection.c:1294 #20 0x0000000000563cbf in ldap_int_thread_pool_wrapper (xpool=0x8e36a0) at tpool.c:696 #21 0x00007ffff754cdd5 in start_thread (arg=0x7ffbebfff700) at pthread_create.c:307 #22 0x00007ffff674d02d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 #0 ldap_chain_op (op=op@entry=0x7ffbe41f0870, rs=rs@entry=0x7ffbf09bbb20, op_f=0x4a845e <ldap_back_search>, ref=ref@entry=0x0, depth=depth@entry=0) at chain.c:422 #1 0x00000000004e61da in ldap_chain_response (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at chain.c:1090 #2 0x000000000049d323 in over_back_response (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at backover.c:237 #3 0x0000000000447c1a in slap_response_play (op=op@entry=0x7ffbe41f0870, rs=rs@entry=0x7ffbf09bbb20) at result.c:508 #4 0x00000000004480f5 in send_ldap_response (op=op@entry=0x7ffbe41f0870, rs=rs@entry=0x7ffbf09bbb20) at result.c:583 #5 0x0000000000448d1f in slap_send_ldap_result (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at result.c:861 #6 0x00000000004bd8c3 in mdb_search (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at search.c:558 #7 0x000000000049dd58 in overlay_op_walk (op=op@entry=0x7ffbe41f0870, rs=0x7ffbf09bbb20, which=op_search, oi=0xa3eeb0, on=0x0) at backover.c:677 #8 0x000000000049de86 in over_op_func (op=0x7ffbe41f0870, rs=<optimized out>, which=which@entry=op_search) at backover.c:730 #9 0x000000000049dfab in over_op_search (op=<optimized out>, rs=<optimized out>) at backover.c:757 #10 0x00000000004d1a02 in relay_back_op (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20, which=which@entry=2) at op.c:211 #11 0x00000000004d1b07 in relay_back_op_search (op=<optimized out>, rs=<optimized out>) at op.c:251 #12 0x000000000049dd58 in overlay_op_walk (op=op@entry=0x7ffbe41f0870, rs=0x7ffbf09bbb20, which=op_search, oi=0xa3af00, on=0x0) at backover.c:677 #13 0x000000000049de86 in over_op_func (op=0x7ffbe41f0870, rs=<optimized out>, which=which@entry=op_search) at backover.c:730 #14 0x000000000049dfab in over_op_search (op=<optimized out>, rs=<optimized out>) at backover.c:757 #15 0x000000000043c33f in fe_op_search (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at search.c:402 #16 0x000000000049dd58 in overlay_op_walk (op=op@entry=0x7ffbe41f0870, rs=0x7ffbf09bbb20, which=op_search, oi=0x980e00, on=0x0) at backover.c:677 #17 0x000000000049de86 in over_op_func (op=0x7ffbe41f0870, rs=<optimized out>, which=which@entry=op_search) at backover.c:730 #18 0x000000000049dfab in over_op_search (op=<optimized out>, rs=<optimized out>) at backover.c:757 #19 0x000000000043bef6 in do_search (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at search.c:247 #20 0x0000000000439bc4 in connection_operation (ctx=0x7ffbf09bbbd0, arg_v=0x7ffbe41f0870) at connection.c:1158 #21 0x0000000000563cbf in ldap_int_thread_pool_wrapper (xpool=0x8e36a0) at tpool.c:696 #22 0x00007ffff754cdd5 in start_thread (arg=0x7ffbf09bc700) at pthread_create.c:307 #23 0x00007ffff674d02d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 #0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95 #1 0x00007ffff66bd6ac in _L_lock_39 () from /lib64/libc.so.6 #2 0x00007ffff66bd5b4 in __GI__IO_fflush (fp=0x937d00) at iofflush.c:39 #3 0x000000000058bbc6 in ber_error_print (data=0x7ffbf102af20 " e54c: 33 5a 30 0c 30 0a 06 03 55 1d 15 04 03 0a 01 04 3Z0.0...U....... \n") at bprint.c:87 #4 0x000000000058be41 in ber_bprint (data=0x7ffbe42cc110 "0\203UK\275\002\001\021d\203UK\265\004\062cn={REDACTED}0\203UK|0\203UKw\004 certificateRevocationList;binary1\203UKP\004\203UKK0\203UKF0\203UJ-\002\001\001\060\r\006\t*\206H\206\367\r\001\001\v\005", len=len@entry=5589954) at bprint.c:202 #5 0x000000000058bfb3 in ber_log_bprint (errlvl=errlvl@entry=16, loglvl=<optimized out>, data=<optimized out>, len=len@entry=5589954) at bprint.c:175 #6 0x000000000058b179 in ber_flush2 (sb=0x7ffbe422daa0, ber=ber@entry=0x7ffbf102b0f0, freeit=freeit@entry=0) at io.c:240 #7 0x0000000000447ee5 in send_ldap_ber (op=op@entry=0x7ffbe42197b0, ber=ber@entry=0x7ffbf102b0f0) at result.c:339 #8 0x000000000044b290 in slap_send_search_entry (op=0x7ffbe42197b0, rs=<optimized out>) at result.c:1430 #9 0x00000000004bf3b4 in mdb_search (op=<optimized out>, rs=0x7ffbf11bcaa0) at search.c:1085 #10 0x000000000049dd58 in overlay_op_walk (op=op@entry=0x7ffbe42197b0, rs=0x7ffbf11bcaa0, which=op_search, oi=0xa3eeb0, on=0x0) at backover.c:677 #11 0x000000000049de86 in over_op_func (op=0x7ffbe42197b0, rs=<optimized out>, which=which@entry=op_search) at backover.c:730 #12 0x000000000049dfab in over_op_search (op=<optimized out>, rs=<optimized out>) at backover.c:757 #13 0x000000000043c33f in fe_op_search (op=0x7ffbe42197b0, rs=0x7ffbf11bcaa0) at search.c:402 #14 0x000000000049dd58 in overlay_op_walk (op=op@entry=0x7ffbe42197b0, rs=0x7ffbf11bcaa0, which=op_search, oi=0x980e00, on=0x0) at backover.c:677 #15 0x000000000049de86 in over_op_func (op=0x7ffbe42197b0, rs=<optimized out>, which=which@entry=op_search) at backover.c:730 #16 0x000000000049dfab in over_op_search (op=<optimized out>, rs=<optimized out>) at backover.c:757 #17 0x000000000043bef6 in do_search (op=0x7ffbe42197b0, rs=0x7ffbf11bcaa0) at search.c:247 #18 0x0000000000439bc4 in connection_operation (ctx=ctx@entry=0x7ffbf11bcbd0, arg_v=arg_v@entry=0x7ffbe42197b0) at connection.c:1158 #19 0x000000000043a9cb in connection_read_thread (ctx=0x7ffbf11bcbd0, argv=<optimized out>) at connection.c:1294 #20 0x0000000000563cbf in ldap_int_thread_pool_wrapper (xpool=0x8e36a0) at tpool.c:696 #21 0x00007ffff754cdd5 in start_thread (arg=0x7ffbf11bd700) at pthread_create.c:307 #22 0x00007ffff674d02d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 #0 0x00007ffff674d603 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81 #1 0x000000000043618b in slapd_daemon_task (ptr=<optimized out>) at daemon.c:2539 #2 0x00007ffff754cdd5 in start_thread (arg=0x7ffbf19be700) at pthread_create.c:307 #3 0x00007ffff674d02d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 #0 0x00007ffff754df47 in pthread_join (threadid=140720067045120, thread_return=thread_return@entry=0x0) at pthread_join.c:90 #1 0x00000000005642b8 in ldap_pvt_thread_join (thread=<optimized out>, thread_return=thread_return@entry=0x0) at thr_posix.c:197 #2 0x0000000000437766 in slapd_daemon () at daemon.c:2932 #3 0x0000000000420b84 in main (argc=<optimized out>, argv=0x7fffffffe408) at main.c:1017
The above trace seems to suggest the source of the segfault is coming from the ldap_chain_op function at chain.c:422. We use chain forwarding for ppolicy operations on the consumers.
This is the line:
for ( ; !BER_BVISNULL( ref ); ref++ ) {
BER_BVISNULL is not a function but a macro which is defined (lber_pvt.h:216) as follows:
#define BER_BVISNULL(bv) ((bv)->bv_val == NULL)
The above seems suggests that ref is set to null at the time of the check.
When I double checked my chain configs I noticed there was an entry that did not contain the olcDbURI attribute that I did not create. It seems that this particular entry was created by the cn=config conversion for the following slapd.conf configuration:
Overlay chain
chain-proxy-whoami yes
chain-uri {REDACTED URI 1} Chain-cache-uri FALSE Chain-max-depth 1 Chain-return-error TRUE Chain-idassert-bind bindmethod=“simple” binddn={REDACTED} credentials={REDACTED} Chain-protocol-version 3
chain-uri {REDACTED URI 2} Chain-cache-uri FALSE Chain-max-depth 1 Chain-return-error TRUE Chain-idassert-bind bindmethod=“simple” binddn={REDACTED} credentials={REDACTED} Chain-protocol-version 3
chain-uri {REDACTED URI 3} Chain-cache-uri FALSE Chain-max-depth 1 Chain-return-error TRUE Chain-idassert-bind bindmethod=“simple” binddn={REDACTED} credentials={REDACTED} Chain-protocol-version 3
After the conversion to cn=config:
dn: olcOverlay={1}chain,olcDatabase={-1}frontend,cn=config objectClass: olcOverlayConfig objectClass: olcChainConfig olcOverlay: {1}chain olcChainCacheURI: FALSE olcChainMaxReferralDepth: 1 olcChainReturnError: TRUE
dn: olcDatabase={0}ldap,olcOverlay={1}chain,olcDatabase={-1}frontend,cn=config objectClass: olcLDAPConfig objectClass: olcChainDatabase olcDatabase: {0}ldap olcDbStartTLS: ldaps starttls=no olcDbRebindAsUser: FALSE olcDbChaseReferrals: TRUE olcDbTFSupport: no olcDbProxyWhoAmI: TRUE olcDbProtocolVersion: 3 olcDbSingleConn: FALSE olcDbCancel: abandon olcDbUseTemporaryConn: FALSE olcDbConnectionPoolMax: 16 olcDbSessionTrackingRequest: FALSE olcDbNoRefs: FALSE olcDbNoUndefFilter: FALSE olcDbOnErr: continue olcDbKeepalive: 0:0:0
dn: olcDatabase={1}ldap,olcOverlay={1}chain,olcDatabase={-1}frontend,cn=config objectClass: olcLDAPConfig objectClass: olcChainDatabase olcDatabase: {1}ldap olcDbURI: “{REDACTED URI 1}“ olcDbStartTLS: ldaps starttls=no tls_cacert=“/{REDACTED}/openldap/tls/ cacerts.cer" tls_reqcert=demand tls_crlcheck=none olcDbIDAssertBind: mode=self flags=prescriptive,proxy-authz-non-critical bindm ethod=simple timeout=0 network-timeout=0 binddn={REDACTED} credentials={REDACTED} keepalive=0:0:0 olcDbRebindAsUser: FALSE olcDbChaseReferrals: TRUE olcDbTFSupport: no olcDbProxyWhoAmI: TRUE olcDbProtocolVersion: 3 olcDbSingleConn: FALSE olcDbCancel: abandon olcDbUseTemporaryConn: FALSE olcDbConnectionPoolMax: 16 olcDbSessionTrackingRequest: FALSE olcDbNoRefs: FALSE olcDbNoUndefFilter: FALSE olcDbOnErr: continue olcDbKeepalive: 0:0:0
dn: olcDatabase={2}ldap,olcOverlay={1}chain,olcDatabase={-1}frontend,cn=config objectClass: olcLDAPConfig objectClass: olcChainDatabase olcDatabase: {2}ldap olcDbURI: “{REDACTED URI 2}“ olcDbStartTLS: ldaps starttls=no tls_cacert=“/{REDACTED}/openldap/tls/ cacerts.cer" tls_reqcert=demand tls_crlcheck=none olcDbIDAssertBind: mode=self flags=prescriptive,proxy-authz-non-critical bindm ethod=simple timeout=0 network-timeout=0 binddn={REDACTED} credentials={REDACTED} keepalive=0:0:0 olcDbRebindAsUser: FALSE olcDbChaseReferrals: TRUE olcDbTFSupport: no olcDbProxyWhoAmI: TRUE olcDbProtocolVersion: 3 olcDbSingleConn: FALSE olcDbCancel: abandon olcDbUseTemporaryConn: FALSE olcDbConnectionPoolMax: 16 olcDbSessionTrackingRequest: FALSE olcDbNoRefs: FALSE olcDbNoUndefFilter: FALSE olcDbOnErr: continue olcDbKeepalive: 0:0:0
dn: olcDatabase={3}ldap,olcOverlay={1}chain,olcDatabase={-1}frontend,cn=config objectClass: olcLDAPConfig objectClass: olcChainDatabase olcDatabase: {3}ldap olcDbURI: “{REDACTED URI 3}“ olcDbStartTLS: ldaps starttls=no tls_cacert=“/{REDACTED}/openldap/tls/ cacerts.cer" tls_reqcert=demand tls_crlcheck=none olcDbIDAssertBind: mode=self flags=prescriptive,proxy-authz-non-critical bindm ethod=simple timeout=0 network-timeout=0 binddn={REDACTED} credentials={REDACTED} keepalive=0:0:0 olcDbRebindAsUser: FALSE olcDbChaseReferrals: TRUE olcDbTFSupport: no olcDbProxyWhoAmI: TRUE olcDbProtocolVersion: 3 olcDbSingleConn: FALSE olcDbCancel: abandon olcDbUseTemporaryConn: FALSE olcDbConnectionPoolMax: 16 olcDbSessionTrackingRequest: FALSE olcDbNoRefs: FALSE olcDbNoUndefFilter: FALSE olcDbOnErr: continue olcDbKeepalive: 0:0:0
I am wondering if the following is causing the segfaults:
When setting up the configs during the slapd.d conversion, the slapd.conf -> cn=config converter added an extra unnecessary entry for the chain config (there are 4 ldap proxy entries instead of 3). This particular entry that was added does not have an olcDbURI value. Since the referral string is set by the olcDbURI attribute value the missing value results in NULL being set for ref.
However, I have not yet been able to manually trigger the segfault by invoking ppolicy.
I also found a red hat ticket that seemd to detail the same issue (see https://bugzilla.redhat.com/show_bug.cgi?id=1644654).
Any ideas on this? I can also obtain a trace of all the threads if needed.
Jeremy
Hi Jeremy,
--On Thursday, January 9, 2020 1:07 PM -0800 Jeremy Diaz jeremy.diaz@rexconsulting.net wrote:
The resulting slapd logs were normal. To obtain further information we obtained the following trace of the segfault using gdb:
Please file an ITS for this issue, with a full gdb thr apply all bt full from the crash so we have the entire stack trace.
When I double checked my chain configs I noticed there was an entry that did not contain the olcDbURI attribute that I did not create. It seems that this particular entry was created by the cn=config conversion for the following slapd.conf configuration:
This is expected. With slapd.conf the same thing happens, it's just done internally so you never see it.
Thanks, Quanah
--
Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com
openldap-technical@openldap.org