Richard Silverman wrote:
On Thu, 20 Dec 2012, Howard Chu wrote:
Please post the entire stack trace for all threads. Since we have c_writers = 8, c_writing = 1, c_writewaiter = 1
then one of the threads ought to be blocked on the write2_cv (result.c:373). That implies that they're still waiting for epoll() to say that the socket is writable.
Sure; here is a dump of the stacks of all 18 executing threads:
[Switching to thread 2 (Thread 0x7f21de7fc700 (LWP 8482))]#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 162 62: movl (%rsp), %edi #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x000000000056926e in ldap_int_thread_cond_wait (cond=0x7f2210697720, mutex=0x7f22106976d0) at thr_posix.c:277 #2 0x000000000056b1c2 in ldap_pvt_thread_cond_wait (cond=0x7f2210697720, mutex=0x7f22106976d0) at thr_debug.c:954 #3 0x000000000043e487 in send_ldap_ber (op=0x7f21e0140260, ber=0x7f21de67a460) at result.c:309 #4 0x000000000044160b in slap_send_search_entry (op=0x7f21e0140260, rs=0x7f21de7fb920) at result.c:1435 #5 0x00000000004c5c27 in bdb_search (op=0x7f21e0140260, rs=0x7f21de7fb920) at search.c:1014 #6 0x000000000042dd6a in fe_op_search (op=0x7f21e0140260, rs=0x7f21de7fb920) at search.c:402 #7 0x000000000042d5ae in do_search (op=0x7f21e0140260, rs=0x7f21de7fb920) at search.c:247 #8 0x000000000042a776 in connection_operation (ctx=0x7f21de7fba00, arg_v=0x7f21e0140260) at connection.c:1218 #9 0x0000000000567cc7 in ldap_int_thread_pool_wrapper (xpool=0x17f0e90) at tpool.c:688 #10 0x000000000056a767 in ldap_debug_thread_wrapper (arg=0x7f21ec136ad0) at thr_debug.c:770 #11 0x00000039e3407851 in start_thread (arg=0x7f21de7fc700) at pthread_create.c:301 #12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
OK, most of the threads look like thread 2.
Thread 13 and thread 16 are waiting for their respective sockets to be writable.
[Switching to thread 13 (Thread 0x7f21f27fc700 (LWP 8470))]#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 162 62: movl (%rsp), %edi #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x000000000056926e in ldap_int_thread_cond_wait (cond=0x7f221069b9a8, mutex=0x7f221069b958) at thr_posix.c:277 #2 0x000000000056b1c2 in ldap_pvt_thread_cond_wait (cond=0x7f221069b9a8, mutex=0x7f221069b958) at thr_debug.c:954 #3 0x000000000043e716 in send_ldap_ber (op=0x7f21cc191420, ber=0x7f21f267a460) at result.c:379 #4 0x000000000044160b in slap_send_search_entry (op=0x7f21cc191420, rs=0x7f21f27fb920) at result.c:1435 #5 0x00000000004c5c27 in bdb_search (op=0x7f21cc191420, rs=0x7f21f27fb920) at search.c:1014 #6 0x000000000042dd6a in fe_op_search (op=0x7f21cc191420, rs=0x7f21f27fb920) at search.c:402 #7 0x000000000042d5ae in do_search (op=0x7f21cc191420, rs=0x7f21f27fb920) at search.c:247 #8 0x000000000042a776 in connection_operation (ctx=0x7f21f27fba00, arg_v=0x7f21cc191420) at connection.c:1218 #9 0x0000000000567cc7 in ldap_int_thread_pool_wrapper (xpool=0x17f0e90) at tpool.c:688 #10 0x000000000056a767 in ldap_debug_thread_wrapper (arg=0x7f21f4000f20) at thr_debug.c:770 #11 0x00000039e3407851 in start_thread (arg=0x7f21f27fc700) at pthread_create.c:301 #12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
[Switching to thread 16 (Thread 0x7f21f3fff700 (LWP 8467))]#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 162 62: movl (%rsp), %edi #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x000000000056926e in ldap_int_thread_cond_wait (cond=0x7f22106977c0, mutex=0x7f2210697770) at thr_posix.c:277 #2 0x000000000056b1c2 in ldap_pvt_thread_cond_wait (cond=0x7f22106977c0, mutex=0x7f2210697770) at thr_debug.c:954 #3 0x000000000043e716 in send_ldap_ber (op=0x7f21c8111140, ber=0x7f21f3e7d460) at result.c:379 #4 0x000000000044160b in slap_send_search_entry (op=0x7f21c8111140, rs=0x7f21f3ffe920) at result.c:1435 #5 0x00000000004c5c27 in bdb_search (op=0x7f21c8111140, rs=0x7f21f3ffe920) at search.c:1014 #6 0x000000000042dd6a in fe_op_search (op=0x7f21c8111140, rs=0x7f21f3ffe920) at search.c:402 #7 0x000000000042d5ae in do_search (op=0x7f21c8111140, rs=0x7f21f3ffe920) at search.c:247 #8 0x000000000042a776 in connection_operation (ctx=0x7f21f3ffea00, arg_v=0x7f21c8111140) at connection.c:1218 #9 0x0000000000567cc7 in ldap_int_thread_pool_wrapper (xpool=0x17f0e90) at tpool.c:688 #10 0x000000000056a767 in ldap_debug_thread_wrapper (arg=0x7f21f4000a60) at thr_debug.c:770 #11 0x00000039e3407851 in start_thread (arg=0x7f21f3fff700) at pthread_create.c:301 #12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
So the question is why are these threads still waiting on their sockets.
Looking at your logs, it's clear that the error occurs and connection_closing() is called, which will call connection_abandon() to set the o_abandon flag on all of the connection's ops.
Can you please provide the output for "print *op" and "print *op->o_hdr" in thread 13 and thread 16, frame 3?