New subject: syncrepl slaves all quit after master restart - not a single retry

28 Jul 2010


Hello guys,
I have a problem with delta-syncrepl replication (all set up according to the
'official' guide:
http://www.openldap.org/doc/admin24/replication.html#Delta-syncrepl).
I have a master instance with its logs 'shipped' to a client, and it all works
fine as long as the connection is good.
Getting ready to move into production, I'm trying to emulate connectivity
problems, and this is where I ran into trouble.
Specifically, even though I have the mirror instance set up as:
syncrepl rid=101
        provider=ldap://192.168.22.62:389
        type=refreshAndPersist
        bindmethod=simple
        binddn="cn=replicator,xxxxx"
        credentials="xxxxxx"
        searchbase="xxxxxxx"
        filter="(objectClass=*)"
        logbase="cn=accesslog"
        logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
        scope=sub
        attrs="*,+"
        schemachecking=off
        retry="1 +"
        syncdata=accesslog
once the server is disconnected (I simply restart slapd on the master), the
client doesn't even try to reconnect. The log below shows a modify operation
at 18:34:18 that went fine; 11 seconds later I restarted the master's LDAP
service (which became available again immediately):
Jul 28 18:34:18 newton slapd[20353]: => entry_encode(0x00000032): mail=xxxxxxxxxxxxxxxxxxxxxxxxxx.
Jul 28 18:34:18 newton slapd[20353]: bdb_modify: updated id=00000032 dn="yyyyyyyyyyyyyyyyyyyyyyyy"
Jul 28 18:34:18 newton slapd[20353]: send_ldap_result: conn=-1 op=0 p=0
Jul 28 18:34:18 newton slapd[20353]: send_ldap_result: err=0 matched="" text=""
Jul 28 18:34:18 newton slapd[20353]: syncrepl_entry: rid 101 be_modify (0)
Jul 28 18:34:18 newton slapd[20353]: bdb_modify: xxxxxxxxxxxxxxxxxx.
Jul 28 18:34:18 newton slapd[20353]: bdb_dn2entry("oxxxxxxxxxxxxxxx")
Jul 28 18:34:18 newton slapd[20353]: bdb_modify_internal: 0x00000001: o=xxxxxxxxxxxxxxxxx.
Jul 28 18:34:18 newton slapd[20353]: <= acl_access_allowed: granted to database root
Jul 28 18:34:18 newton slapd[20353]: bdb_modify_internal: replace contextCSN
Jul 28 18:34:18 newton slapd[20353]: => entry_encode(0x00000001): o=xxxxxxxxxxxxxxxxxxxxx.
Jul 28 18:34:18 newton slapd[20353]: bdb_modify: updated id=00000001 dn="xxxxxxxxxxxxxxxx"
Jul 28 18:34:18 newton slapd[20353]: send_ldap_result: conn=-1 op=0 p=0
Jul 28 18:34:18 newton slapd[20353]: send_ldap_result: err=0 matched="" text=""
Jul 28 18:34:18 newton slapd[20353]: daemon: activity on 1 descriptor
Jul 28 18:34:18 newton slapd[20353]: daemon: activity on:
Jul 28 18:34:18 newton slapd[20353]:
Jul 28 18:34:18 newton slapd[20353]: daemon: epoll: listen=7 active_threads=0 tvp=NULL
Jul 28 18:34:29 newton slapd[20353]: daemon: activity on 1 descriptor
Jul 28 18:34:29 newton slapd[20353]: daemon: activity on:
Jul 28 18:34:29 newton slapd[20353]:  14r
Jul 28 18:34:29 newton slapd[20353]:
Jul 28 18:34:29 newton slapd[20353]: daemon: read active on 14
Jul 28 18:34:29 newton slapd[20353]: daemon: epoll: listen=7 active_threads=0 tvp=NULL
Jul 28 18:34:29 newton slapd[20353]: connection_get(14)
Jul 28 18:34:29 newton slapd[20353]: connection_get(14): got connid=0
Jul 28 18:34:29 newton slapd[20353]: =>do_syncrepl rid 101
Jul 28 18:34:29 newton slapd[20353]: =>do_syncrep2 rid 101
Jul 28 18:34:29 newton slapd[20353]: do_syncrep2: rid 101 Can't contact LDAP server
Jul 28 18:34:29 newton slapd[20353]: connection_get(14)
Jul 28 18:34:29 newton slapd[20353]: connection_get(14): got connid=0
Jul 28 18:34:29 newton slapd[20353]: daemon: removing 14
Jul 28 18:34:29 newton slapd[20353]: daemon: activity on 1 descriptor
Jul 28 18:34:29 newton slapd[20353]: daemon: activity on:
Jul 28 18:34:29 newton slapd[20353]:
Jul 28 18:34:29 newton slapd[20353]: daemon: epoll: listen=7 active_threads=0 tvp=NULL
Jul 28 18:34:29 newton slapd[20353]: do_syncrepl: rid 101 quitting
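To check a consumer log like the one above mechanically, one can look for new "=>do_syncrepl rid 101" entries after the first "Can't contact LDAP server" error. Below is a minimal Python sketch, assuming the syslog line format shown above. summarize_rid is a hypothetical helper (not part of OpenLDAP), and counting a fresh "=>do_syncrepl" line as a reconnect attempt is an assumption drawn from this log alone.

```python
import re

# Syslog prefix plus message body, e.g.
# "Jul 28 18:34:29 newton slapd[20353]: =>do_syncrepl rid 101"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ slapd\[\d+\]: (.*)$")


def summarize_rid(lines, rid):
    """For one syncrepl consumer (rid), report when it first logged
    "Can't contact LDAP server", how many "=>do_syncrepl" runs followed
    (assumed to be reconnect attempts), and when it logged "quitting"."""
    failed_at = None
    attempts_after_failure = 0
    quit_at = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # e.g. the empty "slapd[20353]:" descriptor lines
        stamp, msg = m.groups()
        if failed_at is None:
            if f"rid {rid} Can't contact LDAP server" in msg:
                failed_at = stamp
        elif msg == f"=>do_syncrepl rid {rid}":
            attempts_after_failure += 1
        if msg == f"do_syncrepl: rid {rid} quitting":
            quit_at = stamp
    return {"failed_at": failed_at,
            "attempts_after_failure": attempts_after_failure,
            "quit_at": quit_at}
```

Fed the tail of the log above, this reports zero attempts after the failure and a "quitting" entry in the same second, which matches the "not a single retry" symptom.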
I'm running openldap 2.3.43-12.el5_5.1 from a standard CentOS 5.4
installation.
Am I getting something wrong, and the slave is not supposed to reconnect
after the master service restarts, or is this some kind of problem that was
fixed in later versions?
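For reference, slapd.conf(5) describes retry as a list of <retry interval> <# of retries> pairs (e.g. "60 10 300 3" means every 60 seconds for 10 tries, then every 300 seconds for 3 more), where "+" as the count means retrying indefinitely until success. By that reading, retry="1 +" should make the consumer reconnect every second without giving up. A minimal Python sketch of that interpretation follows; retry_schedule is a hypothetical helper, not an OpenLDAP API, and it does not model how slapd 2.3 actually schedules its retry task.

```python
def retry_schedule(spec, limit=10):
    """Expand a syncrepl retry string such as "60 10 300 3" or "1 +" into
    at most `limit` wait intervals (seconds) before each reconnect attempt.

    Interpretation follows slapd.conf(5): <interval> <count> pairs, where
    a count of "+" means retry indefinitely."""
    tokens = spec.split()
    if not tokens or len(tokens) % 2:
        raise ValueError("retry must be <interval> <count> pairs")
    waits = []
    for i in range(0, len(tokens), 2):
        interval, count = int(tokens[i]), tokens[i + 1]
        if count == "+":
            # Indefinite retries: pad with this interval up to the limit.
            waits.extend([interval] * (limit - len(waits)))
            break
        waits.extend([interval] * int(count))
    return waits[:limit]
```

With this reading, retry_schedule("1 +", limit=5) yields five one-second waits, while a finite spec such as "60 10 300 3" ends after 13 attempts, which is when a consumer would be expected to log that it is quitting.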
Thank you,
Alex

delta-sync replication slave quitting problem - not a single retry