Hi, after a long period of stability, slapd has started core dumping on me, roughly once every two weeks per slave. The slaves don't all go at the same time, but occasionally they go within a few hours of each other.
To me, it looks like it gets a SIGSEGV in the same place as ITS#5401/5405, so my question is:
Is there a patch for this that I can apply to 2.4.8, or is the recommended route to check out HEAD and try that?
Further info below.
Cheers, Duncan
I started with Solaris 10, OpenLDAP 2.3.38 and BDB 4.2.52 (patched), moved to 2.4.7 on a clean Solaris install, and then moved everything to 2.4.8 with BDB 4.6.21. All versions have core dumped over the last few months.
There is one master and 5 slaves: one specific to a service, one test, and 3 in a round-robin configuration. All use syncrepl in refreshAndPersist mode. Only the busy slaves (the ones in the round robin) are failing.
Using dbx on 2.4.8, I've got cores from 2 slaves, but I haven't compiled with debugging and I'm not using unstripped binaries, so these may be of little to no use. Fixing that is today's job.
What little info I have so far is:
From Slave 1
/export/opt/SUNWspro/bin/dbx /usr/local/libexec/slapd slapd-core-14-03-08
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.4' in your .dbxrc
Reading slapd
core file header read successfully
Reading ld.so.1
Reading libldap_r-2.4.so.2.0.4
Reading liblber-2.4.so.2.0.4
Reading libltdl.so.3.1.5
Reading libdb-4.6.so
Reading librt.so.1
Reading libpthread.so.1
Reading libicuuc.so.2
Reading libicudata.so.2
Reading libsasl2.so.2.0.22
Reading libdl.so.1
Reading libssl.so.0.9.8
Reading libcrypto.so.0.9.8
Reading libresolv.so.2
Reading libgen.so.1
Reading libnsl.so.1
Reading libsocket.so.1
Reading libc.so.1
Reading libgcc_s.so.1
Reading libgcc_s.so.1
Reading libaio.so.1
Reading libmd5.so.1
Reading libm.so.2
Reading libCrun.so.1
Reading libc_psr.so.1
Reading libgssapiv2.so.2.0.22
Reading libgssapi.so.4.0.0
Reading libkrb5.so.17.4.0
Reading libasn1.so.6.1.0
Reading libroken.so.16.1.0
Reading libcom_err.so.1.1.3
Reading libncurses.so.5.4
Reading libdoor.so.1
Reading libscf.so.1
Reading libuutil.so.1
Reading libmd5_psr.so.1
Reading libmp.so.2
Reading liblogin.so.2.0.22
Reading libplain.so.2.0.22
Reading syncprov-2.4.so.2.0.4
t@1 (l@1) terminated by signal KILL (Killed)
0xfe4bd61c: __lwp_wait+0x0008: bcc,a,pt %icc,__lwp_wait+0x18 ! 0xfe4bd62c
(dbx) threads
  t@1 a l@1 ?() LWP suspended in __lwp_wait()
  t@3 a l@3 ?() LWP suspended in __pollsys()
o t@4 a l@4 ldap_int_thread_pool_wrapper() signal SIGSEGV in slap_access_allowed()
  t@5 a l@5 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@6 a l@6 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@7 a l@7 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@8 a l@8 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@9 a l@9 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@10 a l@10 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@11 a l@11 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@12 a l@12 ldap_int_thread_pool_wrapper() LWP suspended in attrs_alloc()
  t@13 a l@13 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@14 a l@14 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@15 a l@15 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@16 a l@16 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
(dbx) thread t@4
Current function is ldap_int_thread_pool_wrapper
  625 task->ltt_start_routine(&ctx, task->ltt_arg);
t@4 (l@4) stopped in slap_access_allowed at 0x5a234
0x0005a234: slap_access_allowed+0x10ac: ldsb [%g1 - 1], %o5
(dbx) where
current thread: t@4
  [1] slap_access_allowed(0x3bb2998, 0x2d442c, 0xecf3e0ac, 0x15c800, 0x3, 0x2031f0), at 0x5a234
  [2] fe_access_allowed(0x3bb2998, 0x2d442c, 0x1f81b0, 0x15a2f40, 0x4, 0x0), at 0x5bc00
  [3] access_allowed_mask(0x3bb2998, 0x2d442c, 0x1f81b0, 0x15a2f40, 0x4, 0x0), at 0x57048
  [4] 0x54e0c(0x3bb2998, 0x2d442c, 0x15a2f3c, 0xa3, 0x603e5c0, 0xa3), at 0x54e0b
  [5] test_filter(0x3bb2998, 0x2d442c, 0x15a2f5c, 0x0, 0x1c0000, 0x1c0000), at 0x55400
  [6] hdb_search(0x3bb2998, 0xecfffcb8, 0x0, 0xfff3ffd8, 0xfff3fc00, 0x163000), at 0xb0d98
  [7] overlay_op_walk(0x8000, 0xecfffcb8, 0x8000, 0x15c540, 0x8000, 0xecfff838), at 0x95a7c
  [8] 0x95be4(0x3bb2998, 0xecfffcb8, 0x2, 0x5f, 0x95c28, 0x1f5d60), at 0x95be3
  [9] fe_op_search(0x3bb2998, 0xecfffcb8, 0x3bb2a94, 0xecfffa38, 0x163438, 0x163528), at 0x3ae68
  [10] do_search(0x3bb2998, 0xecfffcb8, 0xfe4e8bc0, 0x15c800, 0x123c00, 0xecfffa38), at 0x3a5e0
  [11] 0x38b20(0xecfffe08, 0x3bb2998, 0xfe4e8bc0, 0xfdec0000, 0x13564c8, 0x0), at 0x38b1f
  [12] 0x393c4(0x0, 0x2a, 0xfe4e8bc0, 0xfdec0000, 0x1dad28, 0x0), at 0x393c3
=>[13] ldap_int_thread_pool_wrapper(xpool = 0x1dad18), line 625 in "tpool.c"
Pstack from slave 1
----------------- lwp# 4 / thread# 4 --------------------
 0005a234 slap_access_allowed (3bb2998, 2d442c, ecf3e0ac, 15c800, 3, 2031f0) + 10ac
 0005bc00 fe_access_allowed (3bb2998, 2d442c, 1f81b0, 15a2f40, 4, 0) + 54
 00057048 access_allowed_mask (3bb2998, 2d442c, 1f81b0, 15a2f40, 4, 0) + 17c
 00054e0c ???????? (3bb2998, 2d442c, 15a2f3c, a3, 603e5c0, a3)
 00055400 test_filter (3bb2998, 2d442c, 15a2f5c, 0, 1c0000, 1c0000) + 178
 000b0d98 hdb_search (3bb2998, ecfffcb8, 0, fff3ffd8, fff3fc00, 163000) + 2074
 00095a7c overlay_op_walk (8000, ecfffcb8, 8000, 15c540, 8000, ecfff838) + c8
 00095be4 ???????? (3bb2998, ecfffcb8, 2, 5f, 95c28, 1f5d60)
 0003ae68 fe_op_search (3bb2998, ecfffcb8, 3bb2a94, ecfffa38, 163438, 163528) + 3a0
 0003a5e0 do_search (3bb2998, ecfffcb8, fe4e8bc0, 15c800, 123c00, ecfffa38) + 58c
 00038b20 ???????? (ecfffe08, 3bb2998, fe4e8bc0, fdec0000, 13564c8, 0)
 000393c4 ???????? (0, 2a, fe4e8bc0, fdec0000, 1dad28, 0)
 ff34d89c ldap_int_thread_pool_wrapper (1dad18, ed000000, 0, 0, 0, 0) + 1ec
 fe4bc400 _lwp_start (0, 0, 0, 0, 0, 0)
From Slave 2
----------------------------------------------------------------------
/export//opt/SUNWspro/bin/dbx /usr/local/libexec/slapd slapd-core-15-03-08
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.4' in your .dbxrc
Reading slapd
dbx: internal warning: writable memory segment 0xed980000[2359296] of size 0 in core
dbx: internal warning: writable memory segment 0xedc00000[262152192] of size 0 in core
dbx: internal warning: writable memory segment 0xfd800000[5349376] of size 0 in core
dbx: internal warning: writable memory segment 0xfdf30000[32768] of size 0 in core
dbx: internal warning: writable memory segment 0xfdf40000[483328] of size 0 in core
dbx: internal warning: writable memory segment 0xfe370000[24576] of size 0 in core
core file header read successfully
Reading ld.so.1
Reading libldap_r-2.4.so.2.0.4
Reading liblber-2.4.so.2.0.4
Reading libltdl.so.3.1.5
Reading libdb-4.6.so
Reading librt.so.1
Reading libpthread.so.1
Reading libicuuc.so.2
Reading libicudata.so.2
Reading libsasl2.so.2.0.22
Reading libdl.so.1
Reading libssl.so.0.9.8
Reading libcrypto.so.0.9.8
Reading libresolv.so.2
Reading libgen.so.1
Reading libnsl.so.1
Reading libsocket.so.1
Reading libc.so.1
Reading libgcc_s.so.1
Reading libgcc_s.so.1
Reading libaio.so.1
Reading libmd5.so.1
Reading libm.so.2
Reading libCrun.so.1
Reading libc_psr.so.1
Reading libgssapiv2.so.2.0.22
Reading libgssapi.so.4.0.0
Reading libkrb5.so.17.4.0
Reading libasn1.so.6.1.0
Reading libroken.so.16.1.0
Reading libcom_err.so.1.1.3
Reading libncurses.so.5.4
Reading libdoor.so.1
Reading libscf.so.1
Reading libuutil.so.1
Reading libmd5_psr.so.1
Reading libmp.so.2
Reading libplain.so.2.0.22
Reading liblogin.so.2.0.22
Reading syncprov-2.4.so.2.0.4
t@1 (l@1) terminated by signal KILL (Killed)
0xfe4bd61c: __lwp_wait+0x0008: bcc,a,pt %icc,__lwp_wait+0x18 ! 0xfe4bd62c
(dbx) threads
  t@1 a l@1 ?() LWP suspended in __lwp_wait()
  t@3 a l@3 ?() LWP suspended in __pollsys()
  t@4 a l@4 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@5 a l@5 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@6 a l@6 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@7 a l@7 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@8 a l@8 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@9 a l@9 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@10 a l@10 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@11 a l@11 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@12 a l@12 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@13 a l@13 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@14 a l@14 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@15 a l@15 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
  t@16 a l@16 ldap_int_thread_pool_wrapper() LWP suspended in __lock_get_internal()
  t@17 a l@17 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
o t@18 a l@18 ldap_int_thread_pool_wrapper() signal SIGSEGV in match_re_C()
  t@19 a l@19 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
(dbx) thread t@18
Current function is ldap_int_thread_pool_wrapper
  625 task->ltt_start_routine(&ctx, task->ltt_arg);
t@18 (l@18) stopped in match_re_C at 0xfe479f60
0xfe479f60: match_re_C+0x0b50: ldub [%i1], %l6
(dbx) where
current thread: t@18
  [1] match_re_C(0x1f697b, 0x0, 0x1fa278, 0xe5f3dfac, 0xe5f3df9c, 0x73), at 0xfe479f60
  [2] __regexec_C(0xfe4ea344, 0x1fa278, 0x0, 0x64, 0xe5f3e910, 0x0), at 0xfe478ecc
  [3] slap_access_allowed(0x475ea18, 0x263544, 0xe5f3e06c, 0x15c800, 0x3, 0x1d76d0), at 0x59428
  [4] fe_access_allowed(0x475ea18, 0x263544, 0x1d8290, 0x0, 0x5, 0x0), at 0x5bc00
  [5] access_allowed_mask(0x475ea18, 0x263544, 0x1d8290, 0x0, 0x5, 0x0), at 0x57048
  [6] slap_send_search_entry(0x8000, 0xe5fffcb8, 0xe5f3f448, 0x0, 0x5, 0x15c800), at 0x49174
  [7] hdb_search(0x475ea18, 0xe5fffcb8, 0x0, 0xfff3ffd8, 0xfff3fc00, 0x163000), at 0xb0fd4
  [8] overlay_op_walk(0x8000, 0xe5fffcb8, 0x8000, 0x15c540, 0x8000, 0xe5fff838), at 0x95a7c
  [9] 0x95be4(0x475ea18, 0xe5fffcb8, 0x2, 0xa7, 0x95c28, 0x1f48e8), at 0x95be3
  [10] fe_op_search(0x475ea18, 0xe5fffcb8, 0x475eb14, 0xe5fffa38, 0x163438, 0x163528), at 0x3ae68
  [11] do_search(0x475ea18, 0xe5fffcb8, 0xfe4e8bc0, 0x15c800, 0x123c00, 0xe5fffa38), at 0x3a5e0
  [12] 0x38b20(0xe5fffe08, 0x475ea18, 0xfe4e8bc0, 0xfdec3800, 0x135a360, 0x0), at 0x38b1f
  [13] 0x393c4(0x0, 0x63, 0xfe4e8bc0, 0xfdec3800, 0x1dad28, 0x0), at 0x393c3
=>[14] ldap_int_thread_pool_wrapper(xpool = 0x1dad18), line 625 in "tpool.c"
Pstack of thread 18 on slave 2
----------------- lwp# 18 / thread# 18 --------------------
 fe479f60 match_re_C (1f697b, 0, 1fa278, e5f3dfac, e5f3df9c, 73) + b50
 fe478ecc __regexec_C (fe4ea344, 1fa278, 0, 64, e5f3e910, 0) + 16c
 00059428 slap_access_allowed (475ea18, 263544, e5f3e06c, 15c800, 3, 1d76d0) + 2a0
 0005bc00 fe_access_allowed (475ea18, 263544, 1d8290, 0, 5, 0) + 54
 00057048 access_allowed_mask (475ea18, 263544, 1d8290, 0, 5, 0) + 17c
 00049174 slap_send_search_entry (8000, e5fffcb8, e5f3f448, 0, 5, 15c800) + 158
 000b0fd4 hdb_search (475ea18, e5fffcb8, 0, fff3ffd8, fff3fc00, 163000) + 22b0
 00095a7c overlay_op_walk (8000, e5fffcb8, 8000, 15c540, 8000, e5fff838) + c8
 00095be4 ???????? (475ea18, e5fffcb8, 2, a7, 95c28, 1f48e8)
 0003ae68 fe_op_search (475ea18, e5fffcb8, 475eb14, e5fffa38, 163438, 163528) + 3a0
 0003a5e0 do_search (475ea18, e5fffcb8, fe4e8bc0, 15c800, 123c00, e5fffa38) + 58c
 00038b20 ???????? (e5fffe08, 475ea18, fe4e8bc0, fdec3800, 135a360, 0)
 000393c4 ???????? (0, 63, fe4e8bc0, fdec3800, 1dad28, 0)
 ff34d89c ldap_int_thread_pool_wrapper (1dad18, e6000000, 0, 0, 0, 0) + 1ec
 fe4bc400 _lwp_start (0, 0, 0, 0, 0, 0)
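
To see at a glance what the two crashing threads have in common, the pstack frames can be compared mechanically. The following is only a minimal sketch in Python and is not part of the original report: the helper names `frame_names` and `shared_frames` are hypothetical, and the sample text is the first few frames of each pstack above. It assumes each frame starts with an 8-digit hex address followed by the symbol name.

```python
import re

# A pstack frame looks like:
#   0005a234 slap_access_allowed (3bb2998, ...) + 10ac
# The pattern also works on output that has been flattened onto one line.
FRAME_RE = re.compile(r"\b[0-9a-f]{8}\s+(\S+)\s+\(")

UNRESOLVED = "????????"  # pstack's placeholder for symbols it cannot resolve


def frame_names(pstack_text):
    """Return symbol names in stack order, innermost frame first."""
    return FRAME_RE.findall(pstack_text)


def shared_frames(trace_a, trace_b):
    """Resolved symbols of trace_a that also occur in trace_b, in trace_a order."""
    in_b = set(trace_b)
    return [name for name in trace_a if name in in_b and name != UNRESOLVED]


# First frames of the two pstacks quoted in this message.
SLAVE1 = """\
0005a234 slap_access_allowed (3bb2998, 2d442c, ecf3e0ac, 15c800, 3, 2031f0) + 10ac
0005bc00 fe_access_allowed (3bb2998, 2d442c, 1f81b0, 15a2f40, 4, 0) + 54
00057048 access_allowed_mask (3bb2998, 2d442c, 1f81b0, 15a2f40, 4, 0) + 17c
00054e0c ???????? (3bb2998, 2d442c, 15a2f3c, a3, 603e5c0, a3)
00055400 test_filter (3bb2998, 2d442c, 15a2f5c, 0, 1c0000, 1c0000) + 178
000b0d98 hdb_search (3bb2998, ecfffcb8, 0, fff3ffd8, fff3fc00, 163000) + 2074
"""

SLAVE2 = """\
fe479f60 match_re_C (1f697b, 0, 1fa278, e5f3dfac, e5f3df9c, 73) + b50
fe478ecc __regexec_C (fe4ea344, 1fa278, 0, 64, e5f3e910, 0) + 16c
00059428 slap_access_allowed (475ea18, 263544, e5f3e06c, 15c800, 3, 1d76d0) + 2a0
0005bc00 fe_access_allowed (475ea18, 263544, 1d8290, 0, 5, 0) + 54
00057048 access_allowed_mask (475ea18, 263544, 1d8290, 0, 5, 0) + 17c
00049174 slap_send_search_entry (8000, e5fffcb8, e5f3f448, 0, 5, 15c800) + 158
000b0fd4 hdb_search (475ea18, e5fffcb8, 0, fff3ffd8, fff3fc00, 163000) + 22b0
"""

if __name__ == "__main__":
    for name in shared_frames(frame_names(SLAVE1), frame_names(SLAVE2)):
        print(name)
```

On these samples the shared resolved frames are slap_access_allowed, fe_access_allowed, access_allowed_mask and hdb_search, which matches what the traces show by eye: both crashes happen during ACL evaluation inside a search on the hdb backend.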
Duncan Brannen wrote:
> Hi, I've had (after a long period of stability) slapd core dumping on me, roughly once every two weeks (per slave) they don't all go at the same time but occasionally will go within a few hours of each other.
> To me, it looks like it SIGSEGV's in the same place as ITS5401/5, so my question is,
Nothing in your trace looks like ITS#5401 or 5405 to me.
> Is there a patch for this I can apply to 2.4.8 or is the recommended route to check out head and try that?
I've been waiting for more feedback from HEAD before putting patches into RE24. Go ahead and try HEAD. But it doesn't look like any of these patches will affect your issue.
Hi Howard, you're correct that the patches haven't helped: it still core dumps in HEAD. I've installed unstripped binaries and have a dtrace script waiting to grab more info.
Would you rather have an ITS filed against 2.4.8 than against HEAD? I don't know how frequently HEAD changes. HEAD has been considerably less stable so far (it has crashed twice today), but that may make it easier to trace what's happening.
Is there a preferred log level that would help get a reproducible case out of this?
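
For context on the log level question: slapd's loglevel values are additive bit flags, so several subsystems can be enabled with one number. The sketch below (Python, not from the thread) only shows the arithmetic. The bit values are the ones documented for the slapd loglevel directive; the helper name `loglevel_mask` is hypothetical, and picking ACL, stats and sync is an assumption based on the traces ending in ACL evaluation on syncrepl consumers, not advice given in this thread.

```python
# Bit values as documented for the slapd "loglevel" directive (subset).
LOGLEVEL_BITS = {
    "trace": 1,
    "filter": 32,
    "acl": 128,
    "stats": 256,
    "sync": 16384,
}


def loglevel_mask(*names):
    """OR together the loglevel bits for the named subsystems."""
    mask = 0
    for name in names:
        mask |= LOGLEVEL_BITS[name]
    return mask


if __name__ == "__main__":
    # Assumed selection: ACL processing, connection/operation stats, syncrepl.
    print(loglevel_mask("acl", "stats", "sync"))  # prints 16768
```

The resulting number is what would go after `loglevel` in slapd.conf or after `-d` on the slapd command line.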
Thanks, Duncan
dbx from HEAD slapd if this helps.
(dbx) where
current thread: t@8
  [1] _lwp_kill(0x0, 0x6, 0x0, 0x0, 0x0, 0x0), at 0xfe2c599c
  [2] raise(0x6, 0x0, 0xfe2f2cb8, 0xfe2a89a8, 0xffffffff, 0x6), at 0xfe2649c8
  [3] abort(0xeab3ed30, 0x1, 0x0, 0xad354, 0xfe2f1318, 0x0), at 0xfe2410b8
  [4] _assert(0xff32a340, 0xff32a358, 0x2ef, 0x114, 0xad070, 0xff33aa60), at 0xfe2412f4
=>[5] ber_printf(ber = 0xeab3f1f8, fmt = 0x128a18 "{s", ...), line 751 in "encode.c"
  [6] send_ldap_control(ber = 0xeab3f1f8, c = 0xeab3f070), line 205 in "result.c"
  [7] send_ldap_controls(o = 0x3612c70, ber = 0xeab3f1f8, c = 0xeab3f534), line 260 in "result.c"
  [8] send_ldap_response(op = 0x3612c70, rs = 0xeabffcb8), line 465 in "result.c"
  [9] slap_send_ldap_result(op = 0x3612c70, rs = 0xeabffcb8), line 628 in "result.c"
  [10] send_paged_response(op = 0x3612c70, rs = 0xeabffcb8, lastid = 0xeab3f618, tentries = 28230), line 1224 in "search.c"
  [11] hdb_search(op = 0x3612c70, rs = 0xeabffcb8), line 856 in "search.c"
  [12] overlay_op_walk(op = 0x3612c70, rs = 0xeabffcb8, which = 32768, oi = 0x160e28, on = 0x8000), line 653 in "backover.c"
  [13] over_op_func(op = 0x3612c70, rs = 0xeabffcb8, which = op_search), line 705 in "backover.c"
  [14] fe_op_search(op = 0x3612c70, rs = 0xeabffcb8), line 368 in "search.c"
  [15] do_search(op = 0x3612c70, rs = 0xeabffcb8), line 217 in "search.c"
  [16] connection_operation(ctx = 0xeabffe08, arg_v = 0x3612c70), line 1084 in "connection.c"
  [17] connection_read_thread(ctx = 0xeabffe08, argv = 0x48), line 1211 in "connection.c"
  [18] ldap_int_thread_pool_wrapper(xpool = 0x1e0dd8), line 663 in "tpool.c"
openldap-software@openldap.org