Quanah,
Were you able to recreate this issue?
Soichi
On Wed, Nov 16, 2011 at 3:56 PM, Soichi Hayashi <hayashis@indiana.edu> wrote:
Quanah,
We have compiled OpenLDAP 2.4.26 with BDB 5.2.36. OpenLDAP locked up 4 hours into our testing, in a manner similar to what I reported earlier, so I believe this issue still occurs on the latest version.
However, when I used gdb, I didn't see the mutex-locked threads that I saw with OpenLDAP 2.4.22.
The following is from the locked 2.4.26 slapd server.
(gdb) info thread
  14 Thread 0x418dd940 (LWP 13814)  0x00000037aa4d48a8 in epoll_wait () from /lib64/libc.so.6
  13 Thread 0x420de940 (LWP 13815)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12 Thread 0x428df940 (LWP 13816)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11 Thread 0x430e0940 (LWP 13843)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10 Thread 0x438e1940 (LWP 13855)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9 Thread 0x440e2940 (LWP 13856)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8 Thread 0x448e3940 (LWP 13857)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x450e4940 (LWP 13858)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x458e5940 (LWP 13859)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x460e6940 (LWP 13860)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x468e7940 (LWP 2007)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x470e8940 (LWP 2008)  0x00000037aa4cd722 in select () from /lib64/libc.so.6
  2 Thread 0x478e9940 (LWP 2009)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x2ac6ccfdc930 (LWP 13805)  0x00000037aac07b35 in pthread_join () from /lib64/libpthread.so.0
(gdb) thread 3
[Switching to thread 3 (Thread 0x470e8940 (LWP 2008))]#0  0x00000037aa4cd722 in select () from /lib64/libc.so.6
(gdb) bt
#0  0x00000037aa4cd722 in select () from /lib64/libc.so.6
#1  0x000000000054ece5 in ?? ()
#2  0x000000000054aa15 in ?? ()
#3  0x0000000000557637 in ?? ()
#4  0x0000000000557c11 in ?? ()
#5  0x00000000004b2d93 in ?? ()
#6  0x00000000004e9d7c in ?? ()
#7  0x00000037aac0673d in start_thread () from /lib64/libpthread.so.0
#8  0x00000037aa4d44bd in clone () from /lib64/libc.so.6
It looks like thread 3 is waiting in select(), which never fires when I access the server with the ldapsearch command.
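To make the thread list easier to compare between runs, here is a rough sketch of a Python helper that tallies threads by the function they are blocked in. It is not part of OpenLDAP or gdb; the names THREAD_LINE and summarize_threads are made up for this illustration, and the regular expression assumes the "info thread" line layout shown in this message.

```python
import re
from collections import Counter

# Assumed line layout (taken from the gdb output in this message):
#   "  13 Thread 0x420de940 (LWP 13815)  0x... in pthread_cond_wait@@GLIBC_2.3.2 () from ..."
# An optional leading "*" marks gdb's currently selected thread.
THREAD_LINE = re.compile(
    r"^\s*\*?\s*(\d+)\s+Thread\s+0x[0-9a-f]+\s+\(LWP\s+\d+\)\s+"
    r"0x[0-9a-f]+\s+in\s+([^\s(]+)"
)


def summarize_threads(gdb_output):
    """Count threads per blocking function in gdb 'info thread' text."""
    counts = Counter()
    for line in gdb_output.splitlines():
        match = THREAD_LINE.match(line)
        if match:
            # Drop the symbol-version suffix, e.g. "@@GLIBC_2.3.2".
            func = match.group(2).split("@@")[0]
            counts[func] += 1
    return counts


# Hypothetical usage, with the gdb output saved to a file first:
#   with open("info_thread.txt") as fh:
#       for func, n in summarize_threads(fh.read()).most_common():
#           print(n, func)
```

For the thread list in this message, that tally would be 11 threads in pthread_cond_wait and one each in epoll_wait, select and pthread_join.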
I ran strace on ldapsearch (on a client machine), and the following is what I see at the end of the log.
$ strace ldapsearch -h 129.79.14.152 -p 2180 -l 3 -x -b mds-vo-name=WT2,o=grid "(&(objectClass=GlueLocation)(GlueLocationName=TIMESTAMP))"
....
write(1, "\n", 1
)                       = 1
write(3, "0l\2\1\2cg\4\26mds-vo-name=WT2,o=grid\n"..., 110) = 110
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, -1
I'm not sure whether this strace is useful, but after this point ldapsearch never returned.
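The last strace line shows poll() called with a timeout of -1, so the client blocks indefinitely once the server stops answering. As a rough sketch for catching the lockup during a long test run, a probe command can be wrapped with a client-side deadline. This is only an illustration using Python's standard subprocess module; run_with_deadline is a made-up helper name and the 10-second deadline is an arbitrary choice.

```python
import subprocess


def run_with_deadline(cmd, seconds):
    """Run cmd and return (finished, returncode).

    finished is False when the command was still running at the deadline;
    subprocess.run kills the child in that case and returncode is None.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        return (False, None)
    return (True, proc.returncode)


# Hypothetical usage with the same search as in this message:
#   probe = ["ldapsearch", "-h", "129.79.14.152", "-p", "2180", "-l", "3",
#            "-x", "-b", "mds-vo-name=WT2,o=grid",
#            "(&(objectClass=GlueLocation)(GlueLocationName=TIMESTAMP))"]
#   finished, rc = run_with_deadline(probe, 10)
#   if not finished:
#       print("ldapsearch did not return; slapd may be locked up")
```

Running such a probe periodically would record roughly when the server stopped responding, which may help line the hang up with the server logs.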
Thanks,
Soichi
On Wed, Nov 9, 2011 at 1:13 PM, Quanah Gibson-Mount <quanah@zimbra.com> wrote:
--On Wednesday, November 09, 2011 2:01 PM +0000 hayashis@indiana.edu wrote:
Full_Name: Soichi Hayashi
Version: 2.4.22
OpenLDAP 2.4.22 is quite old, and had various known issues. Please use a current release (2.4.26). This report will not be investigated unless you can reproduce it with a current release of OpenLDAP. You also fail to note what BDB release you are using, and whether or not it has all the relevant patches applied to it. If you have a broken policy of only using vendor provided packages, then you will need to send a bug report to RedHat, as it is their job to maintain their vendor packages.
Thanks!
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration