Hello,
I've seen a couple of instances where slapd becomes unresponsive, apparently because the threads are waiting on a backend meta DB. We're running slapd 2.4.23 on Solaris 10 (update 11/06). We have 128 threads configured and when I attach with truss, I see 130 allocated, most of which look like this:
/53: lwp_park(0x00000000, 0) (sleeping...)
When I run pstack against the PID, the stack for almost all of the threads looks like one of the two threads below, both of which are somewhere within meta_back_search:
----------------- lwp# 20 / thread# 20 --------------------
 fee40408 lwp_park (0, 0, 0)
 00140e64 ldap_build_search_req (110f0c0, 599eb90, 2, 79db010, 79db0a0, 0) + 2c
 001412bc ldap_pvt_search (110f0c0, 599eb90, 2, 79db010, 79db0a0, 0) + d4
 000cd494 ???????? (3f45d70, f4fffd58, f4fff478, f4fff294, 0, 2b822c0)
 000cd7fc meta_back_search (3f45d70, f4fffd58, 2, 2, 24d400, 0) + 1e0
 000a0928 ???????? (3f45d70, f4fffd58, f4fff8e0, 28ff20, 2ac1b8, 14)
 000a12dc ???????? (f4fff8e0, f4fffd58, 2, 2, 24d400, 28ff20)
 000a3950 overlay_op_walk (8000, f4fffd58, 8000, 28fe18, 28ff20, 818) + 4c
 000a3ae4 ???????? (3f45d70, f4fffd58, 2, 1e6000, a3ba0, 28fe18)
 00041e08 fe_op_search (3f45d70, f4fffd58, 3f45e70, f4fffad8, 1ee5f8, 1ee6f0) + 3f8
 00041528 do_search (3f45d70, f4fffd58, fee6cbc0, 1e6000, 16ec00, f4fffad8) + 590
 0003f8e4 ???????? (f4fffe08, 3f45d70, fee6cbc0, fe2d4400, 2683c8, 0)
 0013ca30 ???????? (2683b8, f5000000, 0, 0, 13c8d4, 1)
 fee40368 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 21 / thread# 21 --------------------
 fee40408 lwp_park (0, 0, 0)
 0013e5c8 ldap_result (10ea3b0, 43e, 2, f47ff488, f47ff290, 0) + 3c
 000cddf0 meta_back_search (53fe5b8, f47ffd58, 2, 2, 0, 1) + 7d4
 000a0928 ???????? (53fe5b8, f47ffd58, f47ff8e0, 28ff20, 2ac1b8, 14)
 000a12dc ???????? (f47ff8e0, f47ffd58, 2, 2, 24d400, 28ff20)
 000a3950 overlay_op_walk (8000, f47ffd58, 8000, 28fe18, 28ff20, 818) + 4c
 000a3ae4 ???????? (53fe5b8, f47ffd58, 2, 1e6000, a3ba0, 28fe18)
 00041e08 fe_op_search (53fe5b8, f47ffd58, 53fe6b8, f47ffad8, 1ee5f8, 1ee6f0) + 3f8
 00041528 do_search (53fe5b8, f47ffd58, fee6cbc0, 1e6000, 16ec00, f47ffad8) + 590
 0003f8e4 ???????? (f47ffe08, 53fe5b8, fee6cbc0, fe2d4800, 2683c8, 0)
 0013ca30 ???????? (2683b8, f4800000, 0, 0, 13c8d4, 1)
 fee40368 _lwp_start (0, 0, 0, 0, 0, 0)
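With around 130 threads, it may help to tally the stacks rather than read them one by one. The following is only a rough sketch, not a tested tool: it assumes the pstack output has been saved as text, and the function name summarize and both regular expressions are made up for illustration. It groups threads by the first two named frames of each stack, which for the stacks above would separate the threads parked under ldap_build_search_req from those parked under ldap_result.

```python
import re
from collections import Counter

# Matches a pstack thread header such as "----- lwp# 20 / thread# 20 -----".
HEADER = re.compile(r"-+\s*lwp#\s*(\d+)\s*/\s*thread#\s*\d+\s*-+")
# Matches one frame: a hex address, a symbol name (or ????????), then "(".
FRAME = re.compile(r"\b[0-9a-f]{8,16}\s+(\S+)\s*\(")


def summarize(pstack_text):
    """Count threads by the first two frames of each stack (hypothetical helper)."""
    # With one capture group, split() yields [preamble, id, body, id, body, ...].
    parts = HEADER.split(pstack_text)
    counts = Counter()
    for body in parts[2::2]:
        frames = FRAME.findall(body)
        counts[" <- ".join(frames[:2])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (file name is an assumption): python summarize.py pstack.out
    with open(sys.argv[1]) as handle:
        for stack, n in summarize(handle.read()).most_common():
            print(n, stack)
```

If nearly all threads fall into those two groups, that would at least be consistent with every worker being blocked inside libldap calls made from meta_back_search.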
I took a core dump while the process was running but I'm not really sure how to proceed from here. Is there any way to get more information on what was happening with these threads at the time? In either case, is this a situation that should be handled with a general timeout directive? We currently only have a network-timeout and a bind timeout specified:
network-timeout 3
timeout bind=3
Thanks,
Lincoln
--On Monday, December 05, 2011 6:09 PM +0100 Lincoln Souzek lsouzek@gmail.com wrote:
> Hello,
> I've seen a couple of instances where slapd becomes unresponsive, apparently because the threads are waiting on a backend meta DB. We're running slapd 2.4.23 on Solaris 10 (update 11/06). We have 128 threads configured and when I attach with truss, I see 130 allocated, most of which look like this:
I would note there have been some re-entry fixes for back-meta since 2.4.23 was released. I would suggest you re-try with the latest OpenLDAP version as a first step.
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
openldap-technical@openldap.org