RE24 testing call #1 (OL 2.4.24)

Quanah Gibson-Mount

4 Jan 2011

11:34 p.m.

Please test RE24 heavily.

Thanks!

--Quanah

Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration

Doug Leavitt

5 Jan 2011

12:34 a.m.

It looks like parts, but not all, of the thread safe commit made it into the OPENLDAP_REL_ENG_2_4 tree. (ITS #6625)

Specifically the ldap_dup and ldap_destroy APIs are missing as well as some of the locking changes needed to support multi-thread safe function calls.

Was that intentional or something that still needs integration?

I'm asking because these were fixes critical to us being able to integrate OpenLDAP into the upcoming Solaris 11 release.

Thanks in advance for your consideration, Doug Leavitt

On 01/ 4/11 04:34 PM, Quanah Gibson-Mount wrote:

...

Please test RE24 heavily.

Thanks!

--Quanah

--

Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc.

Zimbra :: the leader in open source messaging and collaboration

Jens Vagelpohl

9:32 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

On 1/4/11 23:34 , Quanah Gibson-Mount wrote:

...

Please test RE24 heavily.

No errors on OS X 10.6.5 with -march=x86-64 and BDB 4.7.52 with patches.

jens

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iEYEARECAAYFAk0kLLYACgkQRAx5nvEhZLIsxACeMaAV7f/j627u+KILaWTVX+Vr iNMAnjW1uEOpa6RDhXwDMWIlgkZXxRct =1rpB -----END PGP SIGNATURE-----

Dieter Kluenter

9:06 p.m.

Am Tue, 04 Jan 2011 14:34:46 -0800 schrieb Quanah Gibson-Mount quanah@zimbra.com:

...

Please test RE24 heavily.

OpenSUSE-11.3-x86_64, exception libd-4.8 50 test loops all OK

-Dieter

-- Dieter Klünter | Systemberatung http://dkluenter.de GPG Key ID:DA147B05 53°37'09,95"N 10°08'02,42"E

Gavin Henry

6 Jan 2011

10:20 a.m.

All fine here (386 though)

----- Original Message -----

...

Please test RE24 heavily.

Thanks!

--Quanah

--

Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc.

Zimbra :: the leader in open source messaging and collaboration

-- Kind Regards, Gavin Henry. OpenLDAP Engineering Team. E ghenry@OpenLDAP.org Community developed LDAP software. http://www.openldap.org/project/

Jonathan CLARKE

12:58 p.m.

On 04/01/2011 23:34, Quanah Gibson-Mount wrote:

...

Please test RE24 heavily.

Thanks!

All tests passed on my i386 Ubuntu 10.04.

Jonathan

-- ========================================== Jonathan CLARKE ------------------------------------------ Normation 44 rue Cauchy, 94110 Arcueil, France ------------------------------------------ Telephone: +33 (0)1 83 62 41 24 ------------------------------------------ Web: http://www.normation.com/ ==========================================

Hallvard B Furuseth

2:38 p.m.

I'll second Doug's wish for the concurrency patch. Unless we're planning to have 2.4.25 out pretty soon anyway...

Also this is in Test: ITS#6736 Listener info destroyed too early on shutdown

Some issues it'd be nice to have in, if they're simple for those who know the code in question:

ITS#6760 rwm broken entry handling Fixed, I think, but needs review

ITS#6532 Support for common orderingMatching rules in extensible match filters TODO: Handle CSN and UUID ordering match too? (Almost certainly OK, but I wanted someone who knows to say so. It's a two-line update, setting the SLAP_MR_EXT flag for these rules.)

Also, possibly the rest of "ITS#6739 broken do_syncrep2()" is simple. Almost fixed.

-- Hallvard

Quanah Gibson-Mount

6:54 p.m.

--On Thursday, January 06, 2011 2:38 PM +0100 Hallvard B Furuseth h.b.furuseth@usit.uio.no wrote:

...

I'll second Doug's wish for the concurrency patch. Unless we're planning to have 2.4.25 out pretty soon anyway...

This was just a checkpoint to make sure what was done so far doesn't break anything. I.e., I went through and pulled in all the obvious bits since last July.

...

Also this is in Test: ITS#6736 Listener info destroyed too early on shutdown

This will be in the next set.

...

Some issues it'd be nice to have in, if they're simple for those who know the code in question:

ITS#6760 rwm broken entry handling Fixed, I think, but needs review

This will be in the next set.

...

ITS#6532 Support for common orderingMatching rules in extensible match filters TODO: Handle CSN and UUID ordering match too? (Almost certainly OK, but I wanted someone who knows to say so. It's a two-line update, setting the SLAP_MR_EXT flag for these rules.)

Also, possibly the rest of "ITS#6739 broken do_syncrep2()" is simple. Almost fixed.

If you get these committed, I can add them.

--Quanah

Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration

Hallvard B Furuseth

7 Jan 2011

1:30 p.m.

Quanah Gibson-Mount writes:

...

This was just a checkpoint to make sure what was done so far doesn't break anything.

Ah, OK.

...

...
Some issues it'd be nice to have in, if they're simple for those who know the code in question:

ITS#6760 rwm broken entry handling Fixed, I think, but needs review

This will be in the next set.

Someone's OKed it? I copied/moved what some of the code was doing, but I don't know why it was doing it. (Noted in the ITS.)

...

...
ITS#6532 Support for common orderingMatching rules in extensible match filters TODO: Handle CSN and UUID ordering match too? (Almost certainly OK, but I wanted someone who knows to say so. It's a two-line update, setting the SLAP_MR_EXT flag for these rules.)

Also, possibly the rest of "ITS#6739 broken do_syncrep2()" is simple. Almost fixed.

If you get these committed, I can add them.

I've committed my part, but someone who knows syncrepl must do the rest.

-- Hallvard

Hallvard B Furuseth

10 Jan 2011

2:52 p.m.

I wrote:

...

...
...
Also, possibly the rest of "ITS#6739 broken do_syncrep2()" is simple. Almost fixed.

If you get these committed, I can add them.

I've committed my part, but someone who knows syncrepl must do the rest.

OTOH... I've marked it Test, do import it. Just got reminded why I patched it, to fix this:

    CPPFLAGS=-DLDAP_THREAD_DEBUG ./configure
    ./run -b ldif test019

thr_debug fails because a thread unlocks another thread's mutex.

-- Hallvard

Rein Tollevik

6 Jan 2011

7:27 p.m.

On 04.01.11 23.34, Quanah Gibson-Mount wrote:

...

Please test RE24 heavily.

Rein Tollevik

7:40 p.m.

On 04.01.11 23.34, Quanah Gibson-Mount wrote:

...

Please test RE24 heavily.

test039 deadlocks for me on 64bit solaris10, both x86 and sparc :-( It hangs in the monitor, triggered by the new swamp -SS option added to slapd-tester. It works if run with -S or -SSS. It is the third server that hangs, and it does so quite consistently with the same stack trace every time. A gdb trace is at:

ftp://ftp.openldap.org/incoming/rein-test039-gdb-trace.txt

No problem on 64bit x86 redhat4.

Rein

PS: sorry for the empty message that my slippery fingers managed to send :-(

Quanah Gibson-Mount

10:48 p.m.

--On Thursday, January 06, 2011 7:40 PM +0100 Rein Tollevik rein@OpenLDAP.org wrote:

...

On 04.01.11 23.34, Quanah Gibson-Mount wrote:

...
Please test RE24 heavily.

test039 deadlocks for me on 64bit solaris10, both x86 and sparc :-( It hangs in the monitor, triggered by the new swamp -SS option added to slapd-tester. It works if run with -S or -SSS. It is the third server that hangs, and it does so quite consistently with the same stack trace every time. A gdb trace is at:

ftp://ftp.openldap.org/incoming/rein-test039-gdb-trace.txt

No problem on 64bit x86 redhat4.

Does this happen on both HEAD and RE24, or RE24 only?

--Quanah

Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration

Rein Tollevik

7 Jan 2011

3:01 p.m.

On 06.01.11 22.48, Quanah Gibson-Mount wrote:

...

--On Thursday, January 06, 2011 7:40 PM +0100 Rein Tollevik rein@OpenLDAP.org wrote:

...
On 04.01.11 23.34, Quanah Gibson-Mount wrote:

...
Please test RE24 heavily.

test039 deadlocks for me on 64bit solaris10, both x86 and sparc :-( It hangs in the monitor, triggered by the new swamp -SS option added to slapd-tester. It works if run with -S or -SSS. It is the third server that hangs, and it does so quite consistently with the same stack trace every time. A gdb trace is at:

ftp://ftp.openldap.org/incoming/rein-test039-gdb-trace.txt

Does this happen on both HEAD and RE24, or RE24 only?

Both, as well as when running the HEAD test suite with the 2.4.23 release. Looks as if the swamp additions have tripped into an existing problem, not anything new. Leave it out of RE24 until it has been resolved?

Btw, any other Solaris test runs out there? I'd like to know if it is a real Solaris problem or just me..

Rein

Quanah Gibson-Mount

7:40 p.m.

--On Friday, January 07, 2011 3:01 PM +0100 Rein Tollevik rein@OpenLDAP.org wrote:

...

Both, as well as when running the HEAD test suite with the 2.4.23 release. Looks as if the swamp additions have tripped into an existing problem, not anything new. Leave it out of RE24 until it has been resolved?

I would suggest we fix the existing issue then.

...

Btw, any other Solaris test runs out there? I'd like to know if it is a real Solaris problem or just me..

I have no access to Solaris.

--Quanah

Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration

Doug Leavitt

9:26 p.m.

On 01/ 7/11 08:01 AM, Rein Tollevik wrote:

...

On 06.01.11 22.48, Quanah Gibson-Mount wrote:

...
--On Thursday, January 06, 2011 7:40 PM +0100 Rein Tollevik rein@OpenLDAP.org wrote:

...
On 04.01.11 23.34, Quanah Gibson-Mount wrote:

...
Please test RE24 heavily.

test039 deadlocks for me on 64bit solaris10, both x86 and sparc :-( It hangs in the monitor, triggered by the new swamp -SS option added to slapd-tester. It works if run with -S or -SSS. It is the third server that hangs, and it does so quite consistently with the same stack trace every time. A gdb trace is at:

ftp://ftp.openldap.org/incoming/rein-test039-gdb-trace.txt

Does this happen on both HEAD and RE24, or RE24 only?

Both, as well as when running the HEAD test suite with the 2.4.23 release. Looks as if the swamp additions have tripped into an existing problem, not anything new. Leave it out of RE24 until it has been resolved?

Btw, any other Solaris test runs out there? I'd like to know if it is a real Solaris problem or just me..

Rein

I'm currently testing Solaris11 (Nevada) and not seeing any issues in either 32 or 64 bit builds using both RE24 and HEAD. I have not had any failures on x86 yet. Testing is still underway for sparc and other internal system testing on both platforms.

Doug.

Howard Chu

8 Jan 2011

12:55 a.m.

Doug Leavitt wrote:

...

On 01/ 7/11 08:01 AM, Rein Tollevik wrote:

...
On 06.01.11 22.48, Quanah Gibson-Mount wrote:

...
--On Thursday, January 06, 2011 7:40 PM +0100 Rein Tollevik rein@OpenLDAP.org wrote:

...
On 04.01.11 23.34, Quanah Gibson-Mount wrote:

...
Please test RE24 heavily.

test039 deadlocks for me on 64bit solaris10, both x86 and sparc :-( It hangs in the monitor, triggered by the new swamp -SS option added to slapd-tester. It works if run with -S or -SSS. It is the third server that hangs, and it does so quite consistently with the same stack trace every time. A gdb trace is at:

ftp://ftp.openldap.org/incoming/rein-test039-gdb-trace.txt

Does this happen on both HEAD and RE24, or RE24 only?

Both, as well as when running the HEAD test suite with the 2.4.23 release. Looks as if the swamp additions have tripped into an existing problem, not anything new. Leave it out of RE24 until it has been resolved?

Btw, any other Solaris test runs out there? I'd like to know if it is a real Solaris problem or just me..

I'm seeing a similar failure on 32 bit Sparc Solaris 10. But it actually locks up in test036 for me, I never get as far as test039. The gdb trace looks much the same as what you posted.

Looks like for some reason threads that are blocked waiting for their sockets to become writable are never getting woken up. A regular SIGINT shuts down slapd cleanly so it doesn't appear to be a problem with the condvars being used to manage the threads. That kinda points to select() simply not returning the writable status.
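To make the suspected failure mode concrete, here is a minimal sketch of the behaviour a writer thread relies on: once the peer drains a full socket, select() should report the socket as writable again. This is not slapd code; it uses a local socketpair, and the helper names (`is_writable`, `fill_send_buffer`, `drain`, `demo`) are made up for the illustration.

```python
import select
import socket


def is_writable(sock, timeout=0.0):
    """Return True if select() reports `sock` as writable within `timeout` seconds."""
    _, writable, _ = select.select([], [sock], [], timeout)
    return sock in writable


def fill_send_buffer(sock, chunk=b"x" * 65536):
    """Write to a non-blocking socket until the kernel refuses more data."""
    sent = 0
    while True:
        try:
            sent += sock.send(chunk)
        except BlockingIOError:
            return sent


def drain(sock, bufsize=65536):
    """Read from a non-blocking socket until nothing is left to read."""
    received = 0
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            return received
        if not data:
            return received
        received += len(data)


def demo():
    """Fill a socket, then drain the peer, recording writability at each step."""
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    reader.setblocking(False)
    try:
        before = is_writable(writer)              # empty buffer
        sent = fill_send_buffer(writer)           # a writer would now block
        while_full = is_writable(writer)          # select() should say "not writable"
        received = drain(reader)                  # peer consumes the data
        after = is_writable(writer, timeout=1.0)  # select() should wake the writer
        return before, while_full, after, sent, received
    finally:
        writer.close()
        reader.close()
```

If the last check never reported the socket as writable, a thread waiting on it would stay blocked indefinitely, which matches the symptom described above.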

I haven't used this Solaris machine much, but in fact (looking at the remnants of other files in my source tree on this box) this appears to have been a problem since at least last August. (I.e., it looks like I was investigating this same problem back then but dropped it and never got back to it.)

...

...
Rein

...

I'm currently testing Solaris11 (Nevada) and not seeing any issues in either 32 or 64 bit builds using both RE24 and HEAD. I have not had any failures on x86 yet. Testing is still underway for sparc and other internal system testing on both platforms.

-- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

masarati＠aero.polimi.it

11 Jan 2011

12:03 a.m.

...

...
...
Both, as well as when running the HEAD test suite with the 2.4.23 release. Looks as if the swamp additions have tripped into an existing problem, not anything new. Leave it out of RE24 until it has been resolved?

Btw, any other Solaris test runs out there? I'd like to know if it is a real Solaris problem or just me..

I'm seeing a similar failure on 32 bit Sparc Solaris 10. But it actually locks up in test036 for me, I never get as far as test039. The gdb trace looks much the same as what you posted.

Looks like for some reason threads that are blocked waiting for their sockets to become writable are never getting woken up. A regular SIGINT shuts down slapd cleanly so it doesn't appear to be a problem with the condvars being used to manage the threads. That kinda points to select() simply not returning the writable status.

I haven't used this Solaris machine much, but in fact (looking at the remnants of other files in my source tree on this box) this appears to have been a problem since at least last August. (I.e., it looks like I was investigating this same problem back then but dropped it and never got back to it.)

Not sure whether it is related, but I'm currently running test036 with -DLDAP_THREAD_DEBUG (for unrelated purposes) and I see some mutex-related failures, of the type

conn=1031 op=1 SRCH base="cn=Monitor" scope=2 deref=0 filter="(objectClass=*)"
../../../ldap-2.4-src/libraries/libldap_r/thr_debug.c:1029: ldap_pvt_thread_mutex_unlock error: !THREAD_MUTEX_OWNER( mutex )
../../../ldap-2.4-src/libraries/libldap_r/thr_debug.c:1033: ldap_pvt_thread_mutex_unlock error: rc is 1

I see a lot of them; they always appear within operations affecting back-monitor, this seems to be consistent with Rein's backtrace.

uname -a
Linux fl1 2.6.34.7-0.5-desktop #1 SMP PREEMPT 2010-10-25 08:40:12 +0200 x86_64 x86_64 x86_64 GNU/Linux
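For readers unfamiliar with this class of error, the kind of check that produces a "not the owner" failure can be sketched as a mutex that remembers which thread locked it and refuses an unlock from any other thread. This is only an illustration of the idea, not how thr_debug.c is implemented; `OwnerCheckedMutex` and `unlock_from_other_thread` are hypothetical names.

```python
import threading


class OwnerCheckedMutex:
    """Mutex that refuses to be unlocked by a thread that does not hold it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None  # ident of the thread currently holding the mutex

    def lock(self):
        self._lock.acquire()
        self._owner = threading.get_ident()

    def unlock(self):
        # Ownership check: only the locking thread may unlock.
        if self._owner != threading.get_ident():
            raise RuntimeError("mutex unlock error: caller is not the owner")
        self._owner = None
        self._lock.release()


def unlock_from_other_thread(mutex):
    """Try to unlock `mutex` from a new thread; return the error text, or None."""
    errors = []

    def worker():
        try:
            mutex.unlock()
        except RuntimeError as exc:
            errors.append(str(exc))

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return errors[0] if errors else None
```

A plain `threading.Lock` would silently allow the cross-thread release, which is why a debug wrapper of this kind is useful for catching such bugs.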

Hallvard B Furuseth

12:25 p.m.

masarati@aero.polimi.it writes:

...

Not sure whether it is related, but I'm currently running test036 with -DLDAP_THREAD_DEBUG (for unrelated purposes) and I see some mutex-related failures, of the type

conn=1031 op=1 SRCH base="cn=Monitor" scope=2 deref=0 filter="(objectClass=*)"
../../../ldap-2.4-src/libraries/libldap_r/thr_debug.c:1029: ldap_pvt_thread_mutex_unlock error: !THREAD_MUTEX_OWNER( mutex )

Got a backtrace? And maybe valgrind output with env LDAP_THREAD_DEBUG=alloc,nosync ./run test036?

-- Hallvard

Pierangelo Masarati

12:34 p.m.

Hallvard B Furuseth wrote:

...

masarati@aero.polimi.it writes:

...
Not sure whether it is related, but I'm currently running test036 with -DLDAP_THREAD_DEBUG (for unrelated purposes) and I see some mutex-related failures, of the type

conn=1031 op=1 SRCH base="cn=Monitor" scope=2 deref=0 filter="(objectClass=*)"
../../../ldap-2.4-src/libraries/libldap_r/thr_debug.c:1029: ldap_pvt_thread_mutex_unlock error: !THREAD_MUTEX_OWNER( mutex )

Got a backtrace?

No (I was using LDAP_THREAD_DEBUG=noabort).

...

And maybe valgrind output with env LDAP_THREAD_DEBUG=alloc,nosync ./run test036?

Yes, but lost and unable to reproduce so far, sorry. At first I thought it was easily reproducible, so I didn't try that hard; eventually I was unable to reproduce using regular slapd. It occurred running slapd under valgrind and a certain combination of -S/-SS in slapd-tester. Later I'll try to reproduce the right combination of parameters.

Howard Chu

12 Jan 2011

5:18 a.m.

masarati@aero.polimi.it wrote:

...

...
...
...
Both, as well as when running the HEAD test suite with the 2.4.23 release. Looks as if the swamp additions have tripped into an existing problem, not anything new. Leave it out of RE24 until it has been resolved?

Btw, any other Solaris test runs out there? I'd like to know if it is a real Solaris problem or just me..

I'm seeing a similar failure on 32 bit Sparc Solaris 10. But it actually locks up in test036 for me, I never get as far as test039. The gdb trace looks much the same as what you posted.

Looks like for some reason threads that are blocked waiting for their sockets to become writable are never getting woken up. A regular SIGINT shuts down slapd cleanly so it doesn't appear to be a problem with the condvars being used to manage the threads. That kinda points to select() simply not returning the writable status.

I haven't used this Solaris machine much, but in fact (looking at the remnants of other files in my source tree on this box) this appears to have been a problem since at least last August. (I.e., it looks like I was investigating this same problem back then but dropped it and never got back to it.)

Not sure whether it is related, but I'm currently running test036 with -DLDAP_THREAD_DEBUG (for unrelated purposes) and I see some mutex-related failures, of the type

conn=1031 op=1 SRCH base="cn=Monitor" scope=2 deref=0 filter="(objectClass=*)"
../../../ldap-2.4-src/libraries/libldap_r/thr_debug.c:1029: ldap_pvt_thread_mutex_unlock error: !THREAD_MUTEX_OWNER( mutex )
../../../ldap-2.4-src/libraries/libldap_r/thr_debug.c:1033: ldap_pvt_thread_mutex_unlock error: rc is 1

I see a lot of them; they always appear within operations affecting back-monitor, this seems to be consistent with Rein's backtrace.

uname -a
Linux fl1 2.6.34.7-0.5-desktop #1 SMP PREEMPT 2010-10-25 08:40:12 +0200 x86_64 x86_64 x86_64 GNU/Linux

Running with valgrind/helgrind, I get a hang on Linux too. Unfortunately I can't get a backtrace from the valgrind'd slapd. It shows a fair number of data races in back-meta.

There are also some lock ordering issues, but we already know about most of them and the code avoids deadlock using trylock() when needed. But there are a couple that don't, and thus are deadlock hazards. (request and abandon in libldap seems to be the prime offender.)
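The trylock pattern mentioned above can be sketched roughly as follows: take the first lock, try the second without blocking, and if that fails release the first and retry, so that no thread ever waits while holding a lock. This is a generic illustration with made-up names (`acquire_both`, `run_opposite_order`), not the libldap or slapd code.

```python
import random
import threading
import time


def acquire_both(first, second, retry_delay=0.001, max_attempts=1000):
    """Take `first`, then try `second` without blocking; back off on failure."""
    for _ in range(max_attempts):
        first.acquire()
        if second.acquire(blocking=False):
            return True  # caller now holds both locks
        # Give way instead of waiting for `second` while holding `first`.
        first.release()
        time.sleep(random.uniform(0, retry_delay))
    return False


def run_opposite_order(rounds=200):
    """Two threads repeatedly take the same pair of locks in opposite order."""
    a, b = threading.Lock(), threading.Lock()
    completed = []

    def worker(x, y, name):
        for _ in range(rounds):
            if not acquire_both(x, y):
                return  # gave up; this worker is not recorded as completed
            y.release()
            x.release()
        completed.append(name)

    threads = [
        threading.Thread(target=worker, args=(a, b, "ab"), daemon=True),
        threading.Thread(target=worker, args=(b, a, "ba"), daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=30)
    return sorted(completed)
```

With two plain blocking acquires in opposite order, the same two workers could each end up holding one lock and waiting forever for the other; the non-blocking second acquire is what removes that hazard.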

I've uploaded my testrun directory to http://highlandsun.com/hyc/20110111-testr.tgz

for reference. (Looks like ftp.openldap.org is full again.)

-- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

Howard Chu

13 Jan 2011

1:51 a.m.

Howard Chu wrote:

...

masarati@aero.polimi.it wrote:

...
...
...
...
Both, as well as when running the HEAD test suite with the 2.4.23 release. Looks as if the swamp additions have tripped into an existing problem, not anything new. Leave it out of RE24 until it has been resolved?

Btw, any other Solaris test runs out there? I'd like to know if it is a real Solaris problem or just me..

I'm seeing a similar failure on 32 bit Sparc Solaris 10. But it actually locks up in test036 for me, I never get as far as test039. The gdb trace looks much the same as what you posted.

Looks like for some reason threads that are blocked waiting for their sockets to become writable are never getting woken up. A regular SIGINT shuts down slapd cleanly so it doesn't appear to be a problem with the condvars being used to manage the threads. That kinda points to select() simply not returning the writable status.

Since there are reports of success on Solaris 9 and Solaris 11 I'm content to pass this off as a bug in Solaris 10. In the meantime, all tests are now passing for me in HEAD on Solaris 10 with ITS#6783 and #6787 patches.

-- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

Rein Tollevik

7:32 p.m.

On 13.01.11 01.51, Howard Chu wrote:

...

Howard Chu wrote:

...
masarati@aero.polimi.it wrote:

...
...
...
...
Both, as well as when running the HEAD test suite with the 2.4.23 release. Looks as if the swamp additions have tripped into an existing problem, not anything new. Leave it out of RE24 until it has been resolved?

Btw, any other Solaris test runs out there? I'd like to know if it is a real Solaris problem or just me..

I'm seeing a similar failure on 32 bit Sparc Solaris 10. But it actually locks up in test036 for me, I never get as far as test039. The gdb trace looks much the same as what you posted.

Looks like for some reason threads that are blocked waiting for their sockets to become writable are never getting woken up. A regular SIGINT shuts down slapd cleanly so it doesn't appear to be a problem with the condvars being used to manage the threads. That kinda points to select() simply not returning the writable status.

Since there are reports of success on Solaris 9 and Solaris 11 I'm content to pass this off as a bug in Solaris 10. In the meantime, all tests are now passing for me in HEAD on Solaris 10 with ITS#6783 and #6787 patches.

With the current head, all tests pass for me as well, on 64bit redhat4/x86, solaris10/x86 and solaris10/sparc :-) Well done!

Rein

openldap-devel@openldap.org

participants (11)

Dieter Kluenter
Doug Leavitt
Gavin Henry
Hallvard B Furuseth
Howard Chu
Jens Vagelpohl
Jonathan CLARKE
masarati＠aero.polimi.it
Pierangelo Masarati
Quanah Gibson-Mount
Rein Tollevik