Richard Silverman wrote:
On Thu, 20 Dec 2012, Howard Chu wrote:
> Please post the entire stack trace for all threads. Since we have
> c_writers = 8, c_writing = 1, c_writewaiter = 1
>
> then one of the threads ought to be blocked on the write2_cv (result.c:373).
> That implies that they're still waiting for epoll() to say that the socket is
> writable.
Sure; here is a dump of the stacks of all 18 executing threads:
[Switching to thread 2 (Thread 0x7f21de7fc700 (LWP 8482))]#0
pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162 62: movl (%rsp), %edi
#0 pthread_cond_wait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x000000000056926e in ldap_int_thread_cond_wait (cond=0x7f2210697720,
mutex=0x7f22106976d0) at thr_posix.c:277
#2 0x000000000056b1c2 in ldap_pvt_thread_cond_wait (cond=0x7f2210697720,
mutex=0x7f22106976d0) at thr_debug.c:954
#3 0x000000000043e487 in send_ldap_ber (op=0x7f21e0140260, ber=0x7f21de67a460) at
result.c:309
#4 0x000000000044160b in slap_send_search_entry (op=0x7f21e0140260, rs=0x7f21de7fb920)
at result.c:1435
#5 0x00000000004c5c27 in bdb_search (op=0x7f21e0140260, rs=0x7f21de7fb920) at
search.c:1014
#6 0x000000000042dd6a in fe_op_search (op=0x7f21e0140260, rs=0x7f21de7fb920) at
search.c:402
#7 0x000000000042d5ae in do_search (op=0x7f21e0140260, rs=0x7f21de7fb920) at
search.c:247
#8 0x000000000042a776 in connection_operation (ctx=0x7f21de7fba00, arg_v=0x7f21e0140260)
at connection.c:1218
#9 0x0000000000567cc7 in ldap_int_thread_pool_wrapper (xpool=0x17f0e90) at tpool.c:688
#10 0x000000000056a767 in ldap_debug_thread_wrapper (arg=0x7f21ec136ad0) at
thr_debug.c:770
#11 0x00000039e3407851 in start_thread (arg=0x7f21de7fc700) at pthread_create.c:301
#12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
OK, most of the threads look like thread 2.
Thread 13 and thread 16 are waiting for their respective sockets to be writable.
[Switching to thread 13 (Thread 0x7f21f27fc700 (LWP 8470))]#0
pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162 62: movl (%rsp), %edi
#0 pthread_cond_wait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x000000000056926e in ldap_int_thread_cond_wait (cond=0x7f221069b9a8,
mutex=0x7f221069b958) at thr_posix.c:277
#2 0x000000000056b1c2 in ldap_pvt_thread_cond_wait (cond=0x7f221069b9a8,
mutex=0x7f221069b958) at thr_debug.c:954
#3 0x000000000043e716 in send_ldap_ber (op=0x7f21cc191420, ber=0x7f21f267a460) at
result.c:379
#4 0x000000000044160b in slap_send_search_entry (op=0x7f21cc191420, rs=0x7f21f27fb920)
at result.c:1435
#5 0x00000000004c5c27 in bdb_search (op=0x7f21cc191420, rs=0x7f21f27fb920) at
search.c:1014
#6 0x000000000042dd6a in fe_op_search (op=0x7f21cc191420, rs=0x7f21f27fb920) at
search.c:402
#7 0x000000000042d5ae in do_search (op=0x7f21cc191420, rs=0x7f21f27fb920) at
search.c:247
#8 0x000000000042a776 in connection_operation (ctx=0x7f21f27fba00, arg_v=0x7f21cc191420)
at connection.c:1218
#9 0x0000000000567cc7 in ldap_int_thread_pool_wrapper (xpool=0x17f0e90) at tpool.c:688
#10 0x000000000056a767 in ldap_debug_thread_wrapper (arg=0x7f21f4000f20) at
thr_debug.c:770
#11 0x00000039e3407851 in start_thread (arg=0x7f21f27fc700) at pthread_create.c:301
#12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
[Switching to thread 16 (Thread 0x7f21f3fff700 (LWP 8467))]#0
pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162 62: movl (%rsp), %edi
#0 pthread_cond_wait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x000000000056926e in ldap_int_thread_cond_wait (cond=0x7f22106977c0,
mutex=0x7f2210697770) at thr_posix.c:277
#2 0x000000000056b1c2 in ldap_pvt_thread_cond_wait (cond=0x7f22106977c0,
mutex=0x7f2210697770) at thr_debug.c:954
#3 0x000000000043e716 in send_ldap_ber (op=0x7f21c8111140, ber=0x7f21f3e7d460) at
result.c:379
#4 0x000000000044160b in slap_send_search_entry (op=0x7f21c8111140, rs=0x7f21f3ffe920)
at result.c:1435
#5 0x00000000004c5c27 in bdb_search (op=0x7f21c8111140, rs=0x7f21f3ffe920) at
search.c:1014
#6 0x000000000042dd6a in fe_op_search (op=0x7f21c8111140, rs=0x7f21f3ffe920) at
search.c:402
#7 0x000000000042d5ae in do_search (op=0x7f21c8111140, rs=0x7f21f3ffe920) at
search.c:247
#8 0x000000000042a776 in connection_operation (ctx=0x7f21f3ffea00, arg_v=0x7f21c8111140)
at connection.c:1218
#9 0x0000000000567cc7 in ldap_int_thread_pool_wrapper (xpool=0x17f0e90) at tpool.c:688
#10 0x000000000056a767 in ldap_debug_thread_wrapper (arg=0x7f21f4000a60) at
thr_debug.c:770
#11 0x00000039e3407851 in start_thread (arg=0x7f21f3fff700) at pthread_create.c:301
#12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
So the question is why are these threads still waiting on their sockets.
Looking at your logs, it's clear that the error occurs and
connection_closing() is called, which will call connection_abandon() to set
the o_abandon flag on all of the connection's ops.
Can you please provide the output for "print *op" and "print
*op->o_hdr"
in thread 13 and thread 16, frame 3?
--
-- Howard Chu
CTO, Symas Corp.
http://www.symas.com
Director, Highland Sun
http://highlandsun.com/hyc/
Chief Architect, OpenLDAP
http://www.openldap.org/project/