Re: RE24 testing call (2.4.17) round 2

27 Jun 2009

Quanah Gibson-Mount wrote:
...
Please test RE24 and report any issues.  All known regressions are now
believed fixed.  Thanks!
All tests passed on OpenSUSE 10.3 x86-64.
test056 failed on Ubuntu 9.04 x86-64 on the connection output. Running again 
succeeded. Comparing the slapd.1.log files from the bad run to the good run, 
something very odd shows up:
In the good run, the conn=1 op=0 RESULT is logged earlier - at the time it 
should be:
@@ -417,12 +417,14 @@
  send_ldap_result: err=0 matched="" text=""
  send_ldap_response: msgid=1 tag=97 err=0
  ber_flush2: 14 bytes to sd 13
+conn=1 op=0 RESULT tag=97 err=0 text=
+do_bind: v3 anonymous bind
  connection_get(13)
  connection_get(13): got connid=1
  connection_read(13): checking for input on id=1
  ber_get_next
  ber_get_next: tag 0x30 len 422 contents:
-op tag 99, time 1246147222
+op tag 99, time 1246147432
  ber_get_next
  conn=1 op=1 do_search
  ber_scanf fmt ({miiiib) ber:
In the failed run, the op=0 RESULT is logged much later, after op=2 was 
already handled:
@@ -470,15 +472,268 @@
  connection_read(13): checking for input on id=1
  ber_get_next
  ber_get_next: tag 0x30 len 5 contents:
-op tag 66, time 1246147222
+op tag 66, time 1246147432
  ber_get_next
  ber_get_next on fd 13 failed errno=0 (Success)
  conn=1 op=2 do_unbind
  conn=1 op=2 UNBIND
-conn=1 op=0 RESULT tag=97 err=0 text=
-do_bind: v3 anonymous bind
  connection_close: conn=1 sd=13
  conn=1 fd=13 closed
+slap_listener_activate(7):
+>>> slap_listener(ldap://localhost:9011/)
+conn=2 fd=13 ACCEPT from IP=127.0.0.1:38898 (IP=127.0.0.1:9011)
+connection_get(13)
+connection_get(13): got connid=2
+connection_read(13): checking for input on id=2
+ber_get_next
+ber_get_next: tag 0x30 len 12 contents:
+op tag 96, time 1246147432
So, the monitor output disagrees:

--- testrun.X/ldapsearch.flt	2009-06-27 17:00:22.000000000 -0700
+++ testdata/monitor1.out	2009-04-27 15:50:11.000000000 -0700
@@ -2,9 +2,9 @@
  structuralObjectClass: monitorConnection
  monitorConnectionProtocol: 3
  monitorConnectionOpsReceived: 2
-monitorConnectionOpsExecuting: 2
+monitorConnectionOpsExecuting: 1
  monitorConnectionOpsPending: 0
-monitorConnectionOpsCompleted: 0
+monitorConnectionOpsCompleted: 1
  monitorConnectionGet: 2
  monitorConnectionRead: 2
  monitorConnectionWrite: 0
because op=0 is still outstanding when the op=1 search runs. In both cases, 
the actual result was sent to the client at the same time (see the ber_flush2 
message), but after it succeeded something prevented it from making further 
progress. Which is pretty odd, because after that it just does a cond_signal 
and a mutex_unlock. After ber_free_buf() it just gets a mutex on its counters 
so it can increment a couple items, but that should be a freebie now since 
each operation has its per-thread counter structure. I guess it's possible 
that the malloc library is locking us along the way, or even the stdio library.
Not sure if this is identical to ITS#5966, it doesn't seem like it. It also 
doesn't seem like something we should hold the release for; the monitor output 
is correct even if it's not what we expected to see.
Alternatively, we might consider moving the reset
if ( op->o_conn->c_conn_state == SLAP_C_BINDING )
         op->o_conn->c_conn_state = SLAP_C_ACTIVE;
from connection_bind_cb() to connection_bind_cleanup_cb(), to prevent ops from 
getting dequeued until we're really free and clear.
-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

Re: RE24 testing call (2.4.17) round 2