Quanah Gibson-Mount wrote:
Please test RE24 and report any issues. All known regressions are now believed fixed. Thanks!
All tests passed on OpenSUSE 10.3 x86-64.
test056 failed on Ubuntu 9.04 x86-64 on the connection output. Running again succeeded. Comparing the slapd.1.log files from the bad run to the good run, something very odd shows up:
In the good run, the conn=1 op=0 RESULT is logged earlier - at the time it should be:
@@ -417,12 +417,14 @@ send_ldap_result: err=0 matched="" text="" send_ldap_response: msgid=1 tag=97 err=0 ber_flush2: 14 bytes to sd 13 +conn=1 op=0 RESULT tag=97 err=0 text= +do_bind: v3 anonymous bind connection_get(13) connection_get(13): got connid=1 connection_read(13): checking for input on id=1 ber_get_next ber_get_next: tag 0x30 len 422 contents: -op tag 99, time 1246147222 +op tag 99, time 1246147432 ber_get_next conn=1 op=1 do_search ber_scanf fmt ({miiiib) ber:
In the failed run, the op=0 RESULT is logged much later, after op=2 was already handled:
@@ -470,15 +472,268 @@ connection_read(13): checking for input on id=1 ber_get_next ber_get_next: tag 0x30 len 5 contents: -op tag 66, time 1246147222 +op tag 66, time 1246147432 ber_get_next ber_get_next on fd 13 failed errno=0 (Success) conn=1 op=2 do_unbind conn=1 op=2 UNBIND -conn=1 op=0 RESULT tag=97 err=0 text= -do_bind: v3 anonymous bind connection_close: conn=1 sd=13 conn=1 fd=13 closed +slap_listener_activate(7): +>>> slap_listener(ldap://localhost:9011/) +conn=2 fd=13 ACCEPT from IP=127.0.0.1:38898 (IP=127.0.0.1:9011) +connection_get(13) +connection_get(13): got connid=2 +connection_read(13): checking for input on id=2 +ber_get_next +ber_get_next: tag 0x30 len 12 contents: +op tag 96, time 1246147432
So, the monitor output disagrees: --- testrun.X/ldapsearch.flt 2009-06-27 17:00:22.000000000 -0700 +++ testdata/monitor1.out 2009-04-27 15:50:11.000000000 -0700 @@ -2,9 +2,9 @@ structuralObjectClass: monitorConnection monitorConnectionProtocol: 3 monitorConnectionOpsReceived: 2 -monitorConnectionOpsExecuting: 2 +monitorConnectionOpsExecuting: 1 monitorConnectionOpsPending: 0 -monitorConnectionOpsCompleted: 0 +monitorConnectionOpsCompleted: 1 monitorConnectionGet: 2 monitorConnectionRead: 2 monitorConnectionWrite: 0
because op=0 is still outstanding when the op=1 search runs. In both cases, the actual result was sent to the client at the same time (see the ber_flush2 message), but after it succeeded something prevented it from making further progress. Which is pretty odd, because after that it just does a cond_signal and a mutex_unlock. After ber_free_buf() it just gets a mutex on its counters so it can increment a couple items, but that should be a freebie now since each operation has its per-thread counter structure. I guess it's possible that the malloc library is locking us along the way, or even the stdio library.
Not sure if this is identical to ITS#5966, it doesn't seem like it. It also doesn't seem like something we should hold the release for; the monitor output is correct even if it's not what we expected to see.
Alternatively, we might consider moving the reset
if ( op->o_conn->c_conn_state == SLAP_C_BINDING ) op->o_conn->c_conn_state = SLAP_C_ACTIVE;
from connection_bind_cb() to connection_bind_cleanup_cb(), to prevent ops from getting dequeued until we're really free and clear.