On 5/9/07, Quanah Gibson-Mount quanah@zimbra.com wrote: [snip]
Well, things seem fine from the accesslog DB side. On my servers @ Stanford, the largest latency I've seen is about 7 seconds. I wonder if something is slow on your network.
--Quanah
I set the loglevel to -1 on the consumer and ran further test:
The provider was idle for about 1 hour 45 min. At 20:29:50 I ran a batch of 100 sequential writes on the provider. Nothing happened on the consumer until 20:45:03. Here are the logs from the consumer:
May 9 20:40:04 info-ldap-006 slapd[32288]: daemon: epoll: listen=7 active_threads=0 tvp=zero May 9 20:40:04 info-ldap-006 slapd[32288]: daemon: epoll: listen=8 active_threads=0 tvp=zero May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on 1 descriptor May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on: May 9 20:45:03 info-ldap-006 slapd[32288]: 13r May 9 20:45:03 info-ldap-006 slapd[32288]: May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: read active on 13 May 9 20:45:03 info-ldap-006 slapd[32288]: connection_get(13) May 9 20:45:03 info-ldap-006 slapd[32288]: connection_get(13): got connid=0 May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=7 active_threads=0 tvp=zero May 9 20:45:03 info-ldap-006 slapd[32288]: =>do_syncrepl rid 006 May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=8 active_threads=0 tvp=zero May 9 20:45:03 info-ldap-006 slapd[32288]: =>do_syncrep2 rid 006 May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on 1 descriptor May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on: May 9 20:45:03 info-ldap-006 slapd[32288]: do_syncrep2: rid 006 Can't contact LDAP server May 9 20:45:03 info-ldap-006 slapd[32288]: May 9 20:45:03 info-ldap-006 slapd[32288]: connection_get(13) May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: hangup on 13 May 9 20:45:03 info-ldap-006 slapd[32288]: connection_get(13): got connid=0 May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=7 active_threads=0 tvp=zero May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: removing 13 May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=8 active_threads=0 tvp=zero May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on 1 descriptor May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on: May 9 20:45:03 info-ldap-006 slapd[32288]: May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=7 active_threads=0 tvp=zero May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=8 active_threads=0 tvp=zero May 9 20:45:03 info-ldap-006 slapd[32288]: do_syncrepl: rid 006 retrying May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on 1 descriptor May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on: May 9 20:45:03 info-ldap-006 slapd[32288]: May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=7 active_threads=0 tvp=zero May 9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=8 active_threads=0 tvp=zero May 9 20:45:08 info-ldap-006 slapd[32288]: daemon: epoll: listen=7 active_threads=0 tvp=zero May 9 20:45:08 info-ldap-006 slapd[32288]: daemon: epoll: listen=8 active_threads=0 tvp=zero May 9 20:45:08 info-ldap-006 slapd[32288]: =>do_syncrepl rid 006 May 9 20:45:08 info-ldap-006 slapd[32288]: =>do_syncrep2 rid 006 May 9 20:45:08 info-ldap-006 slapd[32288]: connection_get(13) May 9 20:45:08 info-ldap-006 slapd[32288]: connection_get(13): got connid=0 May 9 20:45:08 info-ldap-006 slapd[32288]: daemon: added 13r listener=(nil) May 9 20:45:08 info-ldap-006 slapd[32288]: daemon: activity on 1 descriptor May 9 20:45:08 info-ldap-006 slapd[32288]: daemon: activity on: May 9 20:45:08 info-ldap-006 slapd[32288]: May 9 20:45:08 info-ldap-006 slapd[32288]: daemon: epoll: listen=7 active_threads=0 tvp=zero May 9 20:45:08 info-ldap-006 slapd[32288]: daemon: epoll: listen=8 active_threads=0 tvp=zero May 9 20:45:08 info-ldap-006 slapd[32288]: daemon: activity on 1 descriptor May 9 20:45:08 info-ldap-006 slapd[32288]: daemon: activity on: May 9 20:45:08 info-ldap-006 slapd[32288]: 13r May 9 20:45:08 info-ldap-006 slapd[32288]: May 9 20:45:08 info-ldap-006 slapd[32288]: daemon: read active on 13 May 9 20:45:08 info-ldap-006 slapd[32288]: connection_get(13) May 9 20:45:08 info-ldap-006 slapd[32288]: connection_get(13): got connid=0 May 9 20:45:08 info-ldap-006 slapd[32288]: =>do_syncrepl rid 006 May 9 20:45:08 info-ldap-006 slapd[32288]: daemon: epoll: listen=7 active_threads=0 tvp=zero May 9 20:45:08 info-ldap-006 slapd[32288]: daemon: epoll: listen=8 active_threads=0 tvp=zero May 9 20:45:08 info-ldap-006 slapd[32288]: =>do_syncrep2 rid 006 May 9 20:45:08 info-ldap-006 slapd[32288]: >>> dnPrettyNormal: <uid=chill,ou=People,dc=example,dc=com>
It looks like the consumer didn't detect that its previous connection to the provider had closed. It seems like the FW between the consumer and the provider is killing the connection without sending a TCP Reset to both sides. I definitely need to talk to our network guys about that.
Now my question is what triggered the 'do_syncrepl ' at 20:45:03? Why didn't the consumer perform the 'do_syncrepl' earlier and realize that the connection to the provider was lost?
Thanks, Sam