ando@sys-net.it wrote:
audrius.valunas@teo.lt wrote:
There is synchronous replication between master and slave. When network connectivity problems occur, the master closes the TCP connection but the slave doesn't notice: it still has the TCP connection open, but in reality it is no longer receiving updates. I think this could be solved by adding some ack from the slave, because sending on such a socket would fail and force the slave to retry the connection.
Well, this should already be taken into consideration by SO_KEEPALIVE, which is always set when available on all connections. I concur that it usually requires quite a long time before a connection is actually checked (usually more than 2 hours), so some better policy could be put in place.
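For illustration only, here is roughly what enabling SO_KEEPALIVE on a socket looks like, written in Python rather than taken from the server code. The helper name enable_keepalive and its parameters are made up for this sketch. Whether the timing can be overridden per connection depends on the platform: Linux exposes TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT, and the sketch simply skips any of them that the local socket module does not define.

```python
import socket


def enable_keepalive(sock, idle=None, interval=None, count=None):
    """Turn on SO_KEEPALIVE; apply timing overrides only where they exist.

    Hypothetical helper for illustration. Returns a dict of the
    per-socket overrides that were actually applied.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # These option names are platform-specific (present on Linux).
    # Anything the platform lacks, or the caller left as None, is skipped.
    overrides = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    applied = {}
    for name, value in overrides:
        opt = getattr(socket, name, None)
        if opt is not None and value is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
    return applied
```

Called with no overrides, this leaves the timing at whatever the system default is, which is the "more than 2 hours" case described above.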
On most systems the TCP keepalive timing is a system-wide parameter, so it wouldn't be practical to try to manipulate that here. I suppose if we were to implement our own retry timer we'd need to use a benign op that triggers a reply. Searching the rootDSE for attr 1.1 would suffice. The question is, do we really want to be generating a lot of keepalive traffic like this? The default of 2 hours that most systems use is pretty sane, really.
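To make the retry-timer idea concrete, a minimal sketch in Python follows. It is not how the consumer is implemented; ProbeTimer, its interval and the probe callable are all hypothetical names. The probe stands in for the benign op (for instance the rootDSE search requesting only 1.1) and is assumed to return True when a reply arrives. The interval is the knob that decides how much keepalive traffic gets generated.

```python
import time


class ProbeTimer:
    """Probe a connection once it has been silent for `interval` seconds.

    Hypothetical sketch: `probe` is any callable that performs a cheap
    request on the connection and returns True if the peer replied.
    """

    def __init__(self, interval, probe, clock=time.monotonic):
        self.interval = interval
        self.probe = probe
        self.clock = clock
        self.last_seen = clock()

    def note_activity(self):
        """Call whenever anything is received from the peer."""
        self.last_seen = self.clock()

    def check(self):
        """Return "ok" (recently active), "alive" (probe answered) or "dead"."""
        if self.clock() - self.last_seen < self.interval:
            return "ok"  # no probe needed, so no extra traffic
        try:
            alive = bool(self.probe())
        except OSError:
            # A send on a half-dead socket is expected to fail like this.
            alive = False
        if alive:
            self.last_seen = self.clock()
            return "alive"
        return "dead"  # caller should close the socket and reconnect
```

A caller would run check() from its existing event loop and reconnect on "dead". Because real updates reset the timer through note_activity(), a busy connection never sends probes; only an idle one does.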