Hi, after a long period of stability, slapd has started core dumping on me, roughly once every two weeks per slave. The slaves don't all go at the same time, but occasionally they will go within a few hours of each other.
To me, it looks like it hits a SIGSEGV in the same place as ITS5401/5, so my question is:
Is there a patch for this that I can apply to 2.4.8, or is the recommended route to check out HEAD and try that?
Further info below.
Cheers, Duncan
I started with Solaris 10, OpenLDAP 2.3.38 and BDB 4.2.52 (patched), moved to 2.4.7 on a clean Solaris install, and then moved everything to 2.4.8 with BDB 4.6.21. All versions have core dumped over the last few months.
There is one master and 5 slaves: one specific to a service, one test, and 3 in a round-robin configuration. All use syncrepl in refreshAndPersist mode. Only the busy slaves (the ones in the round robin) are failing.
Using dbx on 2.4.8, I've got cores from 2 slaves, but I haven't compiled with debugging (although the binaries aren't stripped), so these may be of little to no use. Rebuilding with debugging is today's job.
What little info I have so far is:
From Slave 1
/export/opt/SUNWspro/bin/dbx /usr/local/libexec/slapd slapd-core-14-03-08
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.4' in your .dbxrc
Reading slapd
core file header read successfully
Reading ld.so.1
Reading libldap_r-2.4.so.2.0.4
Reading liblber-2.4.so.2.0.4
Reading libltdl.so.3.1.5
Reading libdb-4.6.so
Reading librt.so.1
Reading libpthread.so.1
Reading libicuuc.so.2
Reading libicudata.so.2
Reading libsasl2.so.2.0.22
Reading libdl.so.1
Reading libssl.so.0.9.8
Reading libcrypto.so.0.9.8
Reading libresolv.so.2
Reading libgen.so.1
Reading libnsl.so.1
Reading libsocket.so.1
Reading libc.so.1
Reading libgcc_s.so.1
Reading libgcc_s.so.1
Reading libaio.so.1
Reading libmd5.so.1
Reading libm.so.2
Reading libCrun.so.1
Reading libc_psr.so.1
Reading libgssapiv2.so.2.0.22
Reading libgssapi.so.4.0.0
Reading libkrb5.so.17.4.0
Reading libasn1.so.6.1.0
Reading libroken.so.16.1.0
Reading libcom_err.so.1.1.3
Reading libncurses.so.5.4
Reading libdoor.so.1
Reading libscf.so.1
Reading libuutil.so.1
Reading libmd5_psr.so.1
Reading libmp.so.2
Reading liblogin.so.2.0.22
Reading libplain.so.2.0.22
Reading syncprov-2.4.so.2.0.4
t@1 (l@1) terminated by signal KILL (Killed)
0xfe4bd61c: __lwp_wait+0x0008: bcc,a,pt %icc,__lwp_wait+0x18 ! 0xfe4bd62c
(dbx) threads
      t@1  a  l@1   ?()   LWP suspended in  __lwp_wait()
      t@3  a  l@3   ?()   LWP suspended in  __pollsys()
 o    t@4  a  l@4   ldap_int_thread_pool_wrapper()   signal SIGSEGV in  slap_access_allowed()
      t@5  a  l@5   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
      t@6  a  l@6   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
      t@7  a  l@7   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
      t@8  a  l@8   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
      t@9  a  l@9   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
     t@10  a l@10   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
     t@11  a l@11   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
     t@12  a l@12   ldap_int_thread_pool_wrapper()   LWP suspended in  attrs_alloc()
     t@13  a l@13   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
     t@14  a l@14   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
     t@15  a l@15   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
     t@16  a l@16   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
(dbx) thread t@4
Current function is ldap_int_thread_pool_wrapper
  625           task->ltt_start_routine(&ctx, task->ltt_arg);
t@4 (l@4) stopped in slap_access_allowed at 0x5a234
0x0005a234: slap_access_allowed+0x10ac: ldsb [%g1 - 1], %o5
(dbx) where
current thread: t@4
  [1] slap_access_allowed(0x3bb2998, 0x2d442c, 0xecf3e0ac, 0x15c800, 0x3, 0x2031f0), at 0x5a234
  [2] fe_access_allowed(0x3bb2998, 0x2d442c, 0x1f81b0, 0x15a2f40, 0x4, 0x0), at 0x5bc00
  [3] access_allowed_mask(0x3bb2998, 0x2d442c, 0x1f81b0, 0x15a2f40, 0x4, 0x0), at 0x57048
  [4] 0x54e0c(0x3bb2998, 0x2d442c, 0x15a2f3c, 0xa3, 0x603e5c0, 0xa3), at 0x54e0b
  [5] test_filter(0x3bb2998, 0x2d442c, 0x15a2f5c, 0x0, 0x1c0000, 0x1c0000), at 0x55400
  [6] hdb_search(0x3bb2998, 0xecfffcb8, 0x0, 0xfff3ffd8, 0xfff3fc00, 0x163000), at 0xb0d98
  [7] overlay_op_walk(0x8000, 0xecfffcb8, 0x8000, 0x15c540, 0x8000, 0xecfff838), at 0x95a7c
  [8] 0x95be4(0x3bb2998, 0xecfffcb8, 0x2, 0x5f, 0x95c28, 0x1f5d60), at 0x95be3
  [9] fe_op_search(0x3bb2998, 0xecfffcb8, 0x3bb2a94, 0xecfffa38, 0x163438, 0x163528), at 0x3ae68
  [10] do_search(0x3bb2998, 0xecfffcb8, 0xfe4e8bc0, 0x15c800, 0x123c00, 0xecfffa38), at 0x3a5e0
  [11] 0x38b20(0xecfffe08, 0x3bb2998, 0xfe4e8bc0, 0xfdec0000, 0x13564c8, 0x0), at 0x38b1f
  [12] 0x393c4(0x0, 0x2a, 0xfe4e8bc0, 0xfdec0000, 0x1dad28, 0x0), at 0x393c3
=>[13] ldap_int_thread_pool_wrapper(xpool = 0x1dad18), line 625 in "tpool.c"
Pstack from slave 1
----------------- lwp# 4 / thread# 4 --------------------
 0005a234 slap_access_allowed (3bb2998, 2d442c, ecf3e0ac, 15c800, 3, 2031f0) + 10ac
 0005bc00 fe_access_allowed (3bb2998, 2d442c, 1f81b0, 15a2f40, 4, 0) + 54
 00057048 access_allowed_mask (3bb2998, 2d442c, 1f81b0, 15a2f40, 4, 0) + 17c
 00054e0c ???????? (3bb2998, 2d442c, 15a2f3c, a3, 603e5c0, a3)
 00055400 test_filter (3bb2998, 2d442c, 15a2f5c, 0, 1c0000, 1c0000) + 178
 000b0d98 hdb_search (3bb2998, ecfffcb8, 0, fff3ffd8, fff3fc00, 163000) + 2074
 00095a7c overlay_op_walk (8000, ecfffcb8, 8000, 15c540, 8000, ecfff838) + c8
 00095be4 ???????? (3bb2998, ecfffcb8, 2, 5f, 95c28, 1f5d60)
 0003ae68 fe_op_search (3bb2998, ecfffcb8, 3bb2a94, ecfffa38, 163438, 163528) + 3a0
 0003a5e0 do_search (3bb2998, ecfffcb8, fe4e8bc0, 15c800, 123c00, ecfffa38) + 58c
 00038b20 ???????? (ecfffe08, 3bb2998, fe4e8bc0, fdec0000, 13564c8, 0)
 000393c4 ???????? (0, 2a, fe4e8bc0, fdec0000, 1dad28, 0)
 ff34d89c ldap_int_thread_pool_wrapper (1dad18, ed000000, 0, 0, 0, 0) + 1ec
 fe4bc400 _lwp_start (0, 0, 0, 0, 0, 0)
From Slave 2
----------------------------------------------------------------------
/export//opt/SUNWspro/bin/dbx /usr/local/libexec/slapd slapd-core-15-03-08
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.4' in your .dbxrc
Reading slapd
dbx: internal warning: writable memory segment 0xed980000[2359296] of size 0 in core
dbx: internal warning: writable memory segment 0xedc00000[262152192] of size 0 in core
dbx: internal warning: writable memory segment 0xfd800000[5349376] of size 0 in core
dbx: internal warning: writable memory segment 0xfdf30000[32768] of size 0 in core
dbx: internal warning: writable memory segment 0xfdf40000[483328] of size 0 in core
dbx: internal warning: writable memory segment 0xfe370000[24576] of size 0 in core
core file header read successfully
Reading ld.so.1
Reading libldap_r-2.4.so.2.0.4
Reading liblber-2.4.so.2.0.4
Reading libltdl.so.3.1.5
Reading libdb-4.6.so
Reading librt.so.1
Reading libpthread.so.1
Reading libicuuc.so.2
Reading libicudata.so.2
Reading libsasl2.so.2.0.22
Reading libdl.so.1
Reading libssl.so.0.9.8
Reading libcrypto.so.0.9.8
Reading libresolv.so.2
Reading libgen.so.1
Reading libnsl.so.1
Reading libsocket.so.1
Reading libc.so.1
Reading libgcc_s.so.1
Reading libgcc_s.so.1
Reading libaio.so.1
Reading libmd5.so.1
Reading libm.so.2
Reading libCrun.so.1
Reading libc_psr.so.1
Reading libgssapiv2.so.2.0.22
Reading libgssapi.so.4.0.0
Reading libkrb5.so.17.4.0
Reading libasn1.so.6.1.0
Reading libroken.so.16.1.0
Reading libcom_err.so.1.1.3
Reading libncurses.so.5.4
Reading libdoor.so.1
Reading libscf.so.1
Reading libuutil.so.1
Reading libmd5_psr.so.1
Reading libmp.so.2
Reading libplain.so.2.0.22
Reading liblogin.so.2.0.22
Reading syncprov-2.4.so.2.0.4
t@1 (l@1) terminated by signal KILL (Killed)
0xfe4bd61c: __lwp_wait+0x0008: bcc,a,pt %icc,__lwp_wait+0x18 ! 0xfe4bd62c
(dbx) threads
      t@1  a  l@1   ?()   LWP suspended in  __lwp_wait()
      t@3  a  l@3   ?()   LWP suspended in  __pollsys()
      t@4  a  l@4   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
      t@5  a  l@5   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
      t@6  a  l@6   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
      t@7  a  l@7   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
      t@8  a  l@8   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
      t@9  a  l@9   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
     t@10  a l@10   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
     t@11  a l@11   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
     t@12  a l@12   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
     t@13  a l@13   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
     t@14  a l@14   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
     t@15  a l@15   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
     t@16  a l@16   ldap_int_thread_pool_wrapper()   LWP suspended in  __lock_get_internal()
     t@17  a l@17   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
 o   t@18  a l@18   ldap_int_thread_pool_wrapper()   signal SIGSEGV in  match_re_C()
     t@19  a l@19   ldap_int_thread_pool_wrapper()   sleep on 0x1dad38  in  __lwp_park()
(dbx) thread t@18
Current function is ldap_int_thread_pool_wrapper
  625           task->ltt_start_routine(&ctx, task->ltt_arg);
t@18 (l@18) stopped in match_re_C at 0xfe479f60
0xfe479f60: match_re_C+0x0b50: ldub [%i1], %l6
(dbx) where
current thread: t@18
  [1] match_re_C(0x1f697b, 0x0, 0x1fa278, 0xe5f3dfac, 0xe5f3df9c, 0x73), at 0xfe479f60
  [2] __regexec_C(0xfe4ea344, 0x1fa278, 0x0, 0x64, 0xe5f3e910, 0x0), at 0xfe478ecc
  [3] slap_access_allowed(0x475ea18, 0x263544, 0xe5f3e06c, 0x15c800, 0x3, 0x1d76d0), at 0x59428
  [4] fe_access_allowed(0x475ea18, 0x263544, 0x1d8290, 0x0, 0x5, 0x0), at 0x5bc00
  [5] access_allowed_mask(0x475ea18, 0x263544, 0x1d8290, 0x0, 0x5, 0x0), at 0x57048
  [6] slap_send_search_entry(0x8000, 0xe5fffcb8, 0xe5f3f448, 0x0, 0x5, 0x15c800), at 0x49174
  [7] hdb_search(0x475ea18, 0xe5fffcb8, 0x0, 0xfff3ffd8, 0xfff3fc00, 0x163000), at 0xb0fd4
  [8] overlay_op_walk(0x8000, 0xe5fffcb8, 0x8000, 0x15c540, 0x8000, 0xe5fff838), at 0x95a7c
  [9] 0x95be4(0x475ea18, 0xe5fffcb8, 0x2, 0xa7, 0x95c28, 0x1f48e8), at 0x95be3
  [10] fe_op_search(0x475ea18, 0xe5fffcb8, 0x475eb14, 0xe5fffa38, 0x163438, 0x163528), at 0x3ae68
  [11] do_search(0x475ea18, 0xe5fffcb8, 0xfe4e8bc0, 0x15c800, 0x123c00, 0xe5fffa38), at 0x3a5e0
  [12] 0x38b20(0xe5fffe08, 0x475ea18, 0xfe4e8bc0, 0xfdec3800, 0x135a360, 0x0), at 0x38b1f
  [13] 0x393c4(0x0, 0x63, 0xfe4e8bc0, 0xfdec3800, 0x1dad28, 0x0), at 0x393c3
=>[14] ldap_int_thread_pool_wrapper(xpool = 0x1dad18), line 625 in "tpool.c"
Pstack of thread 18 on slave 2
----------------- lwp# 18 / thread# 18 --------------------
 fe479f60 match_re_C (1f697b, 0, 1fa278, e5f3dfac, e5f3df9c, 73) + b50
 fe478ecc __regexec_C (fe4ea344, 1fa278, 0, 64, e5f3e910, 0) + 16c
 00059428 slap_access_allowed (475ea18, 263544, e5f3e06c, 15c800, 3, 1d76d0) + 2a0
 0005bc00 fe_access_allowed (475ea18, 263544, 1d8290, 0, 5, 0) + 54
 00057048 access_allowed_mask (475ea18, 263544, 1d8290, 0, 5, 0) + 17c
 00049174 slap_send_search_entry (8000, e5fffcb8, e5f3f448, 0, 5, 15c800) + 158
 000b0fd4 hdb_search (475ea18, e5fffcb8, 0, fff3ffd8, fff3fc00, 163000) + 22b0
 00095a7c overlay_op_walk (8000, e5fffcb8, 8000, 15c540, 8000, e5fff838) + c8
 00095be4 ???????? (475ea18, e5fffcb8, 2, a7, 95c28, 1f48e8)
 0003ae68 fe_op_search (475ea18, e5fffcb8, 475eb14, e5fffa38, 163438, 163528) + 3a0
 0003a5e0 do_search (475ea18, e5fffcb8, fe4e8bc0, 15c800, 123c00, e5fffa38) + 58c
 00038b20 ???????? (e5fffe08, 475ea18, fe4e8bc0, fdec3800, 135a360, 0)
 000393c4 ???????? (0, 63, fe4e8bc0, fdec3800, 1dad28, 0)
 ff34d89c ldap_int_thread_pool_wrapper (1dad18, e6000000, 0, 0, 0, 0) + 1ec
 fe4bc400 _lwp_start (0, 0, 0, 0, 0, 0)
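
For anyone comparing further cores, the faulting thread can be picked out of a dbx threads listing without reading every row. This is only a minimal sketch in Python, assuming the row layout seen in the listings above; the names ROW and faulting_threads are made up for illustration, and the sample rows are copied from the Slave 1 listing.

```python
import re

# Matches dbx "threads" rows for threads stopped on a signal, e.g.
#   o t@4 a l@4 ldap_int_thread_pool_wrapper() signal SIGSEGV in slap_access_allowed()
# Assumption: the row layout is the one shown in the listings in this mail.
ROW = re.compile(
    r"(t@\d+)\s+a\s+l@\d+\s+\S+\s+signal\s+(SIG\w+)\s+in\s+(\w+)\(\)"
)


def faulting_threads(listing):
    """Return (thread, signal, function) for every thread stopped on a signal."""
    return ROW.findall(listing)


# Three rows copied from the Slave 1 listing.
sample = """\
t@3 a l@3 ?() LWP suspended in __pollsys()
o t@4 a l@4 ldap_int_thread_pool_wrapper() signal SIGSEGV in slap_access_allowed()
t@5 a l@5 ldap_int_thread_pool_wrapper() sleep on 0x1dad38 in __lwp_park()
"""

for thread, signal, function in faulting_threads(sample):
    print(thread, signal, function)
```

Because the pattern only relies on whitespace between fields, it should also cope with listings whose line breaks were lost in a mail client.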