refresh and persistent not always persisting :-) - openldap-technical

11 Oct 2023


Hi,
I can of course add many more details, such as full configs and logs; just
ask.
We have a 4-host MMR setup in which each host replicates to the other three.
Relevant snippets from the config:
* moduleload      syncprov.la
* different (and consistent) serverIDs configured on all hosts
On each host, the other three LDAP servers are then defined as in the sample
below, taken from ldapm01.company.com:
syncrepl rid=212 provider=ldaps://ldapm02.company.com:636 bindmethod=simple
  binddn="cn=ldap_replicate,ou=Directory Access,o=company,c=com"
  credentials=xyz searchbase="o=company,c=com" filter="(objectClass=*)"
  scope=sub schemachecking=on type=refreshAndPersist retry="1 2 3 4 5 +"
  attrs="*,+" tls_reqcert=demand
syncrepl rid=232 provider=ldaps://ldaps01.company.com:636 bindmethod=simple
  binddn="cn=ldap_replicate,ou=Directory Access,o=company,c=com"
  credentials=xyz searchbase="o=company,c=com" filter="(objectClass=*)"
  scope=sub schemachecking=on type=refreshAndPersist retry="1 2 3 4 5 +"
  attrs="*,+" tls_reqcert=demand
syncrepl rid=242 provider=ldaps://ldaps02.company.com:636 bindmethod=simple
  binddn="cn=ldap_replicate,ou=Directory Access,o=company,c=com"
  credentials=xyz searchbase="o=company,c=com" filter="(objectClass=*)"
  scope=sub schemachecking=on type=refreshAndPersist retry="1 2 3 4 5 +"
  attrs="*,+" tls_reqcert=demand
MirrorMode True
We observe that replication is usually near-instant. But "sometimes"
(yes, sometimes...) a replication line (rid) becomes 'stuck' and then
suddenly, after an hour or so, syncs up again. Restarting OpenLDAP makes it
sync immediately. We notice this because we compare contextCSNs across our
nodes to monitor replication.
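
A minimal sketch of this kind of contextCSN comparison, for anyone who wants to reproduce the check. It assumes the contextCSN values have already been read from the suffix entry of each node and have the usual timestamp#count#sid#mod form. The function names, the host keys and the sample serverIDs are made up for illustration; this is not our actual monitoring script.

```python
from datetime import datetime, timezone


def parse_csn(csn):
    """Split one contextCSN value into (serverID, UTC timestamp)."""
    stamp, _count, sid, _mod = csn.split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")
    return int(sid, 16), when.replace(tzinfo=timezone.utc)


def replication_lag(csns_by_host):
    """Report which hosts are behind, per originating serverID.

    csns_by_host maps a host name to the list of contextCSN strings
    read from that host. The result maps (host, serverID) to the number
    of seconds that host is behind the newest CSN seen anywhere for
    that serverID, or to None if the host has no CSN for it at all.
    Hosts that are up to date do not appear in the result.
    """
    per_host = {
        host: dict(parse_csn(value) for value in values)
        for host, values in csns_by_host.items()
    }

    # Newest timestamp seen on any host, per serverID.
    newest = {}
    for sids in per_host.values():
        for sid, when in sids.items():
            if sid not in newest or when > newest[sid]:
                newest[sid] = when

    lag = {}
    for host, sids in per_host.items():
        for sid, latest in newest.items():
            seen = sids.get(sid)
            if seen is None:
                lag[(host, sid)] = None
            elif seen < latest:
                lag[(host, sid)] = (latest - seen).total_seconds()
    return lag
```

An empty result means all nodes agree; a (host, serverID) entry that stays non-empty over several polls would correspond to the 'stuck' rid described above.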
There is no significant load. The hosts are in two different subnets
(subnet A: ldapm01 / ldapm02, subnet B: ldaps01 / ldaps02), and ports
389/636 are open on the firewall between the subnets.
We wondered what could cause this behaviour, and started to suspect the
long-lived TCP connections that refreshAndPersist perhaps relies on (much
like IMAP IDLE).
Is anything special needed to make refreshAndPersist work reliably through
firewalls and across subnets? Does refreshAndPersist use (some kind of)
long-lived network connections? Is there a kind of "keepalive" setting
that we could try?
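
To make clear what is meant by "keepalive" here, the sketch below shows TCP keepalive at the socket level in Python. It is only an illustration of the mechanism, not anything slapd does internally as far as we know; the function name and the idle/probes/interval values are arbitrary assumptions. slapd.conf(5) appears to document a syncrepl parameter of the form keepalive=<idle>:<probes>:<interval> that takes the same three values, but whether it helps with the stalls described above is untested.

```python
import socket


def enable_keepalive(sock, idle=240, probes=10, interval=30):
    """Turn on TCP keepalive for an otherwise idle, long-lived connection.

    idle:     seconds of silence before the first probe is sent
    probes:   unanswered probes before the connection is declared dead
    interval: seconds between probes
    The defaults are illustrative only, not recommended values.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The fine-grained options are platform specific (present on Linux),
    # so only set the ones this Python build exposes.
    tuning = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPCNT", probes),
        ("TCP_KEEPINTVL", interval),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

The point of such probes is that a device between the two subnets keeps seeing traffic on the connection, and that a dead peer is noticed after roughly idle + probes * interval seconds instead of much later.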
Any input would be appreciated, including input like: you are looking in the
wrong direction, better look into this or that. :-)
Thanks!