Please test RE24 and report any issues. All known regressions are now believed fixed. Thanks!
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
Quanah Gibson-Mount wrote:
Please test RE24 and report any issues. All known regressions are now believed fixed. Thanks!
All tests passed on OpenSUSE 10.3 x86-64.
test056 failed on Ubuntu 9.04 x86-64 on the connection output. Running again succeeded. Comparing the slapd.1.log files from the bad run to the good run, something very odd shows up:
In the good run, the conn=1 op=0 RESULT is logged earlier - at the time it should be:
@@ -417,12 +417,14 @@ send_ldap_result: err=0 matched="" text="" send_ldap_response: msgid=1 tag=97 err=0 ber_flush2: 14 bytes to sd 13 +conn=1 op=0 RESULT tag=97 err=0 text= +do_bind: v3 anonymous bind connection_get(13) connection_get(13): got connid=1 connection_read(13): checking for input on id=1 ber_get_next ber_get_next: tag 0x30 len 422 contents: -op tag 99, time 1246147222 +op tag 99, time 1246147432 ber_get_next conn=1 op=1 do_search ber_scanf fmt ({miiiib) ber:
In the failed run, the op=0 RESULT is logged much later, after op=2 was already handled:
@@ -470,15 +472,268 @@ connection_read(13): checking for input on id=1 ber_get_next ber_get_next: tag 0x30 len 5 contents: -op tag 66, time 1246147222 +op tag 66, time 1246147432 ber_get_next ber_get_next on fd 13 failed errno=0 (Success) conn=1 op=2 do_unbind conn=1 op=2 UNBIND -conn=1 op=0 RESULT tag=97 err=0 text= -do_bind: v3 anonymous bind connection_close: conn=1 sd=13 conn=1 fd=13 closed +slap_listener_activate(7): +>>> slap_listener(ldap://localhost:9011/) +conn=2 fd=13 ACCEPT from IP=127.0.0.1:38898 (IP=127.0.0.1:9011) +connection_get(13) +connection_get(13): got connid=2 +connection_read(13): checking for input on id=2 +ber_get_next +ber_get_next: tag 0x30 len 12 contents: +op tag 96, time 1246147432
So, the monitor output disagrees: --- testrun.X/ldapsearch.flt 2009-06-27 17:00:22.000000000 -0700 +++ testdata/monitor1.out 2009-04-27 15:50:11.000000000 -0700 @@ -2,9 +2,9 @@ structuralObjectClass: monitorConnection monitorConnectionProtocol: 3 monitorConnectionOpsReceived: 2 -monitorConnectionOpsExecuting: 2 +monitorConnectionOpsExecuting: 1 monitorConnectionOpsPending: 0 -monitorConnectionOpsCompleted: 0 +monitorConnectionOpsCompleted: 1 monitorConnectionGet: 2 monitorConnectionRead: 2 monitorConnectionWrite: 0
because op=0 is still outstanding when the op=1 search runs. In both cases, the actual result was sent to the client at the same time (see the ber_flush2 message), but after it succeeded something prevented it from making further progress. Which is pretty odd, because after that it just does a cond_signal and a mutex_unlock. After ber_free_buf() it just gets a mutex on its counters so it can increment a couple items, but that should be a freebie now since each operation has its per-thread counter structure. I guess it's possible that the malloc library is locking us along the way, or even the stdio library.
Not sure if this is identical to ITS#5966, it doesn't seem like it. It also doesn't seem like something we should hold the release for; the monitor output is correct even if it's not what we expected to see.
Alternatively, we might consider moving the reset
if ( op->o_conn->c_conn_state == SLAP_C_BINDING ) op->o_conn->c_conn_state = SLAP_C_ACTIVE;
from connection_bind_cb() to connection_bind_cleanup_cb(), to prevent ops from getting dequeued until we're really free and clear.
Quanah Gibson-Mount quanah@zimbra.com writes:
Please test RE24 and report any issues. All known regressions are now believed fixed. Thanks!
OK on openSUSE-11.1-x86_64, after 20 test iterations no errors except for test058 which reports variying numbers of Race errors.
-Dieter
On 27/06/2009 20:50, Quanah Gibson-Mount wrote:
Please test RE24 and report any issues. All known regressions are now believed fixed. Thanks!
All tests passed for me on Ubuntu 32 bit, and Debian 64 bit (except 058 which returns false success).
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Jun 27, 2009, at 19:50 , Quanah Gibson-Mount wrote:
Please test RE24 and report any issues. All known regressions are now believed fixed. Thanks!
On OS X 10.5.7/Intel I am still getting consistent failures on test056:
Starting test056-monitor ...
running defines.sh Starting slapd on TCP/IP port ... Using ldapsearch to check that slapd is running... Using ldapsearch to read connection monitor entries... Filtering ldapsearch results... Comparing filter output... comparison failed - connection monitor output is not correct
./scripts/test056-monitor failed (exit 1)
make[2]: *** [bdb-yes] Error 1 make[1]: *** [test] Error 2 make: *** [test] Error 2
However, in the testrun directory the files ldapsearch.flt and ldapsearch.out do not show any differences according to diff. This is what both contain:
dn: cn=Connection 1,cn=Connections,cn=Monitor structuralObjectClass: monitorConnection monitorConnectionProtocol: 3 monitorConnectionOpsReceived: 2 monitorConnectionOpsExecuting: 1 monitorConnectionOpsPending: 0 monitorConnectionOpsCompleted: 1 monitorConnectionGet: 2 monitorConnectionRead: 2 monitorConnectionWrite: 0 monitorConnectionMask: rx monitorConnectionListener: ldap://localhost:9011/ monitorConnectionLocalAddress: IP=[::1]:9011 entryDN: cn=Connection 1,cn=Connections,cn=Monitor
dn: cn=Connections,cn=Monitor structuralObjectClass: monitorContainer entryDN: cn=Connections,cn=Monitor
dn: cn=Current,cn=Connections,cn=Monitor structuralObjectClass: monitorCounterObject entryDN: cn=Current,cn=Connections,cn=Monitor
dn: cn=Max File Descriptors,cn=Connections,cn=Monitor structuralObjectClass: monitorCounterObject entryDN: cn=Max File Descriptors,cn=Connections,cn=Monitor
dn: cn=Total,cn=Connections,cn=Monitor structuralObjectClass: monitorCounterObject entryDN: cn=Total,cn=Connections,cn=Monitor
jens
Jens Vagelpohl writes:
Starting test056-monitor ...
running defines.sh Starting slapd on TCP/IP port ... Using ldapsearch to check that slapd is running... Using ldapsearch to read connection monitor entries... Filtering ldapsearch results... Comparing filter output... comparison failed - connection monitor output is not correct
./scripts/test056-monitor failed (exit 1)
make[2]: *** [bdb-yes] Error 1 make[1]: *** [test] Error 2 make: *** [test] Error 2
slapd is OK, the bug is in test056-monitor. It needs to be updated to handle IPv6:
However, in the testrun directory the files ldapsearch.flt and ldapsearch.out do not show any differences according to diff.
ldapsearch.flt is a cleaned-up version of ldapsearch.out. Looking at the code, the diff is with testdata/monitor1.out:
--- testdata/monitor1.out +++ testrun/ldapsearch.flt @@ -12,3 +12,3 @@ dn: cn=Connection 1,cn=Connections,cn=Mo monitorConnectionListener: ldap://localhost:9011/ -monitorConnectionLocalAddress: IP=127.0.0.1:9011 +monitorConnectionLocalAddress: IP=[::1]:9011 entryDN: cn=Connection 1,cn=Connections,cn=Monitor
Hallvard B Furuseth wrote:
ldapsearch.flt is a cleaned-up version of ldapsearch.out. Looking at the code, the diff is with testdata/monitor1.out:
--- testdata/monitor1.out +++ testrun/ldapsearch.flt @@ -12,3 +12,3 @@ dn: cn=Connection 1,cn=Connections,cn=Mo monitorConnectionListener: ldap://localhost:9011/ -monitorConnectionLocalAddress: IP=127.0.0.1:9011 +monitorConnectionLocalAddress: IP=[::1]:9011 entryDN: cn=Connection 1,cn=Connections,cn=Monitor
Ah, good catch. IPv6 also contributed to some of the problems Aaron Richton was reporting in test033 too (but that was due to the change in timing that it caused).
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Jun 29, 2009, at 18:10 , Hallvard B Furuseth wrote:
However, in the testrun directory the files ldapsearch.flt and ldapsearch.out do not show any differences according to diff.
ldapsearch.flt is a cleaned-up version of ldapsearch.out. Looking at the code, the diff is with testdata/monitor1.out:
--- testdata/monitor1.out +++ testrun/ldapsearch.flt @@ -12,3 +12,3 @@ dn: cn=Connection 1,cn=Connections,cn=Mo monitorConnectionListener: ldap://localhost:9011/ -monitorConnectionLocalAddress: IP=127.0.0.1:9011 +monitorConnectionLocalAddress: IP=[::1]:9011 entryDN: cn=Connection 1,cn=Connections,cn=Monitor
Yes, you're absolutely right. I have only had IPv6 enabled on this laptop for the last couple months. It was not enabled when Howard called for tests the last time before this 2.4.17 run.
jens
Jens Vagelpohl wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Jun 27, 2009, at 19:50 , Quanah Gibson-Mount wrote:
Please test RE24 and report any issues. All known regressions are now believed fixed. Thanks!
On OS X 10.5.7/Intel I am still getting consistent failures on test056:
Looks exactly like what I ran into here http://www.openldap.org/lists/openldap-devel/200906/msg00048.html
and subsequently fixed http://www.openldap.org/lists/openldap-commit/200906/msg00171.html
Starting test056-monitor ...
running defines.sh Starting slapd on TCP/IP port ... Using ldapsearch to check that slapd is running... Using ldapsearch to read connection monitor entries... Filtering ldapsearch results... Comparing filter output... comparison failed - connection monitor output is not correct
./scripts/test056-monitor failed (exit 1)
make[2]: *** [bdb-yes] Error 1 make[1]: *** [test] Error 2 make: *** [test] Error 2
However, in the testrun directory the files ldapsearch.flt and ldapsearch.out do not show any differences according to diff. This is what both contain:
dn: cn=Connection 1,cn=Connections,cn=Monitor structuralObjectClass: monitorConnection monitorConnectionProtocol: 3 monitorConnectionOpsReceived: 2 monitorConnectionOpsExecuting: 1 monitorConnectionOpsPending: 0 monitorConnectionOpsCompleted: 1 monitorConnectionGet: 2 monitorConnectionRead: 2 monitorConnectionWrite: 0 monitorConnectionMask: rx monitorConnectionListener: ldap://localhost:9011/ monitorConnectionLocalAddress: IP=[::1]:9011 entryDN: cn=Connection 1,cn=Connections,cn=Monitor
Quanah Gibson-Mount wrote:
Please test RE24 and report any issues. All known regressions are now believed fixed. Thanks!
Built on AIX 5.3 (BDB 4.6 and 4.7, SASL 2.1.22, gcc 4.2.3) for both 32-bit and 64-bit binary support.
All tests pass after 3 runs of each. The only oddity is the following in test058:
<snip> Starting central master slapd on TCP/IP port 9011... Using ldapsearch to check that central master slapd is running... Using ldapsearch to check that site2 master received base... Using ldapsearch to check that site2 search received base... Waiting 1 seconds for syncrepl to receive changes... Checking contextCSN after site2 servers repopulated... Adding syncrepl of second site1 master backend on central master... Using ldapsearch to check that central master received second site1 backend... Waiting 1 seconds for syncrepl to receive changes... Waiting 2 seconds for syncrepl to receive changes... Waiting 3 seconds for syncrepl to receive changes... Waiting 4 seconds for syncrepl to receive changes... Waiting 5 seconds for syncrepl to receive changes... ERROR: Second site1 backend not replicated to central master Restarting central master slapd on TCP/IP port 9011... Using ldapsearch to check that central master slapd is running... Waiting 1 seconds for slapd to start... Using ldapsearch to check that central master received second site1 backend... Using ldapsearch to check that central search received second site1 backend... Waiting 1 seconds for syncrepl to receive changes... Waiting 2 seconds for syncrepl to receive changes... Waiting 3 seconds for syncrepl to receive changes... Waiting 4 seconds for syncrepl to receive changes... Waiting 5 seconds for syncrepl to receive changes... ERROR: Second site1 backend not replicated to central search Restarting central search slapd on TCP/IP port 9014... Using ldapsearch to check that central search slapd is running... Waiting 1 seconds for slapd to start... Using ldapsearch to check that central search received second site1 backend... Running 1 of 10 syncrepl race tests... Stopping central master... </snip>
And the test continues, passes and reports those two errors. Still running a few for passes for the tests just to make sure.
Cheers, Bill
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc
Zimbra :: the leader in open source messaging and collaboration
Quanah Gibson-Mount wrote:
Please test RE24 and report any issues. All known regressions are now believed fixed. Thanks!
test057-memberof-refint failed with hdb the 10th 'make test'. testrun/ and testrun.out in http://folk.uio.no/hbf/OpenLDAP/testrun-test057.tgz
Unexpected ldapsearch result, diff -U5 ldapsearch.flt ldif.flt shows dn: cn=Roger Rabbit,ou=Toons,dc=example,dc=com objectClass: inetOrgPerson cn: Roger Rabbit sn: Rabbit -memberOf: cn=Cartoonia,ou=Groups,dc=example,dc=com +memberOf: cn=Cartoonia,ou=Studios,dc=example,dc=com
I wrote:
test057-memberof-refint failed with hdb the 10th 'make test'. testrun/ and testrun.out in http://folk.uio.no/hbf/OpenLDAP/testrun-test057.tgz
Unexpected ldapsearch result, diff -U5 ldapsearch.flt ldif.flt shows dn: cn=Roger Rabbit,ou=Toons,dc=example,dc=com objectClass: inetOrgPerson cn: Roger Rabbit sn: Rabbit -memberOf: cn=Cartoonia,ou=Groups,dc=example,dc=com +memberOf: cn=Cartoonia,ou=Studios,dc=example,dc=com
Two more failures after 21 and 38 runs, both with diff -U5 ldapsearch.flt ldif.flt showing dn: cn=Cartoonia,ou=Groups,dc=example,dc=com objectClass: groupOfNames cn: Cartoonia -member: cn=Roger Rabbit,ou=People,dc=example,dc=com -member: cn=Baby Herman,ou=People,dc=example,dc=com +member: cn=Roger Rabbit,ou=Toons,dc=example,dc=com +member: cn=Baby Herman,ou=Toons,dc=example,dc=com
I've put them all in the same tar file. Valgrind is silent. (Valgrind does complain with ./run -b ldif test057. Removed the test for hdb only, and olcDbCacheSize...olcDbMode in the cn=config modify.)
Previous error fixed, but now test048-syncrepl-multiproxy failed with "test failed - master and slave databases differ". Testrun at: http://folk.uio.no/hbf/OpenLDAP/test048-testrun.tgz valgrind -q did not complain.
Hallvard B Furuseth wrote:
Previous error fixed, but now test048-syncrepl-multiproxy failed with "test failed - master and slave databases differ". Testrun at: http://folk.uio.no/hbf/OpenLDAP/test048-testrun.tgz valgrind -q did not complain.
Again, this is just a timing issue in the test; the diff in the two server's output shows a missing entry, but examining slapd.2.log shows that the entry was replicated, just that it didn't get there before the ldapsearch began, so it was not included in the set of results.