There is a discussion of Solaris thread and socket read/write locking
issues, with some workarounds, at:
http://omniorb.sourceforge.net/omni40/omnithread.html
See "6 Threaded I/O shutdown for Unix".
Cheers
Brett
On 08/01/2011, at 9:55 AM, Howard Chu <hyc(a)symas.com> wrote:
> Doug Leavitt wrote:
>>
>> On 01/ 7/11 08:01 AM, Rein Tollevik wrote:
>>> On 06.01.11 22.48, Quanah Gibson-Mount wrote:
>>>> --On Thursday, January 06, 2011 7:40 PM +0100 Rein Tollevik
>>>> <rein(a)OpenLDAP.org> wrote:
>>>>
>>>>> On 04.01.11 23.34, Quanah Gibson-Mount wrote:
>>>>>> Please test RE24 heavily.
>>>>>
>>>>> test039 deadlocks for me on 64-bit Solaris 10, both x86 and sparc :-(
>>>>> It hangs in the monitor, triggered by the new swamp -SS option added
>>>>> to slapd-tester. It works if run with -S or -SSS. It is the third
>>>>> server that hangs, and it does so quite consistently with the same
>>>>> stack trace every time. A gdb trace is at:
>>>>>
>>>>> ftp://ftp.openldap.org/incoming/rein-test039-gdb-trace.txt
>>>>
>>>> Does this happen on both HEAD and RE24, or RE24 only?
>>>
>>> Both, as well as when running the HEAD test suite with the 2.4.23
>>> release. Looks as if the swamp additions have tripped into an
>>> existing problem, not anything new. Leave it out of RE24 until it
>>> has been resolved?
>>>
>>> Btw, any other Solaris test runs out there? I'd like to know if it is
>>> a real Solaris problem or just me..
>
> I'm seeing a similar failure on 32 bit Sparc Solaris 10. But it actually
> locks up in test036 for me; I never get as far as test039. The gdb trace
> looks much the same as what you posted.
>
> Looks like for some reason threads that are blocked waiting for their
> sockets to become writable are never getting woken up. A regular SIGINT
> shuts down slapd cleanly, so it doesn't appear to be a problem with the
> condvars being used to manage the threads. That kinda points to select()
> simply not returning the writable status.
>
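
For anyone who wants to check the select() behaviour in isolation, here is a
minimal sketch of waiting for a socket to become writable. It is not taken
from slapd; the helper names wait_writable, set_nonblocking and
fill_send_buffer are hypothetical and exist only for this illustration. The
idea is to fill a socket's send buffer until write() reports EAGAIN, confirm
that select() then reports "not writable", and see whether it reports
writable again once the peer has read the data.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not an OpenLDAP function): wait up to timeout_ms
 * for fd to become writable.
 * Returns 1 if writable, 0 on timeout, -1 on error. */
int wait_writable(int fd, int timeout_ms)
{
    for (;;) {
        fd_set wfds;
        struct timeval tv;
        int rc;

        /* select() may modify both the set and the timeout,
         * so rebuild them on every attempt. */
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        rc = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (rc < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry with a fresh timeout */
        if (rc < 0)
            return -1;
        return (rc > 0 && FD_ISSET(fd, &wfds)) ? 1 : 0;
    }
}

/* Hypothetical helper: put fd into non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: write to a non-blocking fd until the kernel
 * refuses more data (EAGAIN/EWOULDBLOCK).
 * Returns the number of bytes queued, or -1 on any other error. */
long fill_send_buffer(int fd)
{
    char chunk[4096];
    long total = 0;

    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;       /* send buffer is full */
        return -1;
    }
}
```

To use it, create a connected pair with socketpair(AF_UNIX, SOCK_STREAM, 0,
sv), call set_nonblocking() and fill_send_buffer() on one end, then call
wait_writable() on that end before and after reading from the other end. If
the last call still times out after the peer has drained the data, that
would be consistent with select() not reporting the writable status.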
> I haven't used this Solaris machine much, but in fact (looking at the
> remnants of other files in my source tree on this box) this appears to
> have been a problem since at least last August. (I.e., it looks like I
> was investigating this same problem back then but dropped it and never
> got back to it.)
>
>>> Rein
>
>> I'm currently testing Solaris11 (Nevada) and not seeing any issues in
>> either 32 or 64 bit builds using both RE24 and HEAD. I have not had any
>> failures on x86 yet. Testing is still underway for sparc and other
>> internal system testing on both platforms.
>
> --
>    -- Howard Chu
>    CTO, Symas Corp.           http://www.symas.com
>    Director, Highland Sun     http://highlandsun.com/hyc/
>    Chief Architect, OpenLDAP  http://www.openldap.org/project/