Re: RE24 connection code reworking

26 Jan 2009


Pierangelo Masarati wrote:
...
Pierangelo Masarati wrote:
...
No more failures of this kind; however, now I intermittently get
replication failures:

The problem persists (only once in a while). It might still be
connection-related, since the logs of server #3, the proxy that pushes
replication to the consumer, are stuffed with tons of
"connection_read(...): no connection!"

What kind of system are you running on? Linux / multiprocessor?

One of the problems with epoll() on Linux is that it always wakes up for
HANGUP events: they are not selectable in the input options, and they're
delivered regardless of whether you choose to wait for them or not. This also
means we can't shut the notifications off once we've acknowledged or acted on
them, so you'll get lots of repeated wakeups for the same hangup event. The new
connection_hangup() function processes these inline for normal connections,
but client connections still go through the connection_read thread handling,
so that their normal cleanup handlers can be invoked. If your server is too
busy, it will take a while for the submitted thread to execute, and in the
meantime you'll get a lot of these spurious messages.
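
The repeated-wakeup behaviour described above can be reproduced outside slapd. The following is only a minimal sketch, assuming a Linux system and using a plain pipe in place of an LDAP connection; count_hangup_wakeups() is a hypothetical name for illustration, not anything in the OpenLDAP source. It registers the read end with an empty event mask, closes the write end, and counts how many consecutive epoll_wait() calls still report the hangup.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns how many of `rounds` epoll_wait() calls
 * report EPOLLHUP for a pipe whose write end was closed, or -1 if the
 * setup fails. */
int count_hangup_wakeups(int rounds)
{
    int fds[2];
    int hits = 0;

    assert(rounds >= 0);
    if (pipe(fds) != 0)
        return -1;

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Request no events at all; hangups are reported anyway. */
    struct epoll_event ev = { .events = 0, .data = { .fd = fds[0] } };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) != 0) {
        close(ep);
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[1]);              /* the peer goes away */

    for (int i = 0; i < rounds; i++) {
        struct epoll_event out;
        /* Timeout 0: poll without blocking. */
        int n = epoll_wait(ep, &out, 1, 0);
        if (n == 1 && (out.events & EPOLLHUP))
            hits++;             /* same hangup, reported again */
    }

    close(ep);
    close(fds[0]);
    return hits;
}
```

In the default level-triggered mode, calling count_hangup_wakeups(3) should report the hangup on all three rounds, since nothing in the loop removes the descriptor from the epoll set.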

I've been experimenting with epoll's edge-triggered and oneshot modes, which
would prevent multiple wakeups from occurring for the same event.
Unfortunately, when I set either mode, it seems that the events can't be
*re-enabled* when we want them, and so slapd hangs. I'm still looking at this.
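
For reference, the epoll documentation describes EPOLLONESHOT as disabling a descriptor after one event until it is re-armed with EPOLL_CTL_MOD. The sketch below shows that cycle in isolation, again assuming Linux and using a pipe rather than slapd's connection code; oneshot_rearm_demo() is a hypothetical name. It says nothing about why re-enabling did not work inside slapd's daemon loop.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: records the return value of three epoll_wait()
 * calls in results[]: after the hangup, after the oneshot has fired,
 * and after re-arming. Returns 0 on success, -1 on setup failure. */
int oneshot_rearm_demo(int results[3])
{
    int fds[2];
    struct epoll_event out;

    assert(results != NULL);
    if (pipe(fds) != 0)
        return -1;

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct epoll_event ev = {
        .events = EPOLLIN | EPOLLONESHOT,
        .data = { .fd = fds[0] }
    };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) != 0) {
        close(ep);
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[1]);                            /* trigger a hangup */

    results[0] = epoll_wait(ep, &out, 1, 0);  /* hangup delivered once */
    results[1] = epoll_wait(ep, &out, 1, 0);  /* descriptor now disabled */

    /* Re-arm with the same mask; the pending hangup is reported again. */
    int rc = epoll_ctl(ep, EPOLL_CTL_MOD, fds[0], &ev);
    results[2] = epoll_wait(ep, &out, 1, 0);

    close(ep);
    close(fds[0]);
    return rc == 0 ? 0 : -1;
}
```

The point of the design is that the re-arm is an explicit step: if the code path that is supposed to issue EPOLL_CTL_MOD never runs, or runs before the handler is ready, the descriptor stays silent, which is one way a oneshot scheme can appear to hang.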

But that's beside the point - you shouldn't be seeing any replication failures
at all, regardless of connection close handling. What else are you seeing now?

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
