Hello everyone,
I have a working OpenLDAP setup (2.3.43, CentOS 5.8 RPM) with a master LDAP server and consumers in datacenters worldwide. I also monitor whether the consumers' directories are in sync with the master. The consumers sometimes fail to communicate with the master and stop replicating, and the syncrepl retry interval does not seem to take effect when that happens. If I restart the LDAP service on the consumer, everything works as expected and changes replicate successfully. My consumer interval is 1 minute, but when the directories are out of sync I don't see any syncrepl messages in my logs. My logs:
Jun 7 02:22:12 xxx slapd[26433]: do_syncrep2: rid 007 LDAP_RES_SEARCH_RESULT (Normal ...)
Jun 7 02:23:12 xxx slapd[26433]: do_syncrep2: rid 007 LDAP_RES_SEARCH_RESULT (Normal ...)
From this point on, syncrepl no longer appears in the logs; I only get this line:
Jun 7 02:27:22 xxx slapd[26433]: do_syncrep1: rid 007 ldap_sasl_bind_s failed (-1)
I restart the LDAP service and everything works as expected:
Jun 7 17:06:40 xxx slapd[4397]: syncrepl_entry: rid 007 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
Jun 7 17:06:40 xxx slapd[4397]: syncrepl_entry: rid 007 be_search (0)
Jun 7 17:06:42 xxx slapd[4397]: do_syncrep2: rid 007 LDAP_RES_SEARCH_RESULT
Jun 7 17:07:44 xxx slapd[4397]: do_syncrep2: rid 007 LDAP_RES_SEARCH_RESULT
Jun 7 17:08:45 xxx slapd[4397]: do_syncrep2: rid 007 LDAP_RES_SEARCH_RESULT
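To make the silent periods easier to spot, here is a minimal Python sketch that scans log lines in the format shown above and reports gaps between successful syncrepl messages for one rid. The function name `find_sync_gaps`, the two-minute threshold, and the year (syslog lines carry none) are assumptions for illustration only, not part of my setup.

```python
import re
from datetime import datetime, timedelta

# Matches syslog-style lines such as:
#   Jun 7 02:22:12 xxx slapd[26433]: do_syncrep2: rid 007 LDAP_RES_SEARCH_RESULT
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ slapd\[\d+\]: "
    r"(?P<func>do_syncrep\d|syncrepl_\w+): rid (?P<rid>\d+)"
)


def find_sync_gaps(lines, rid="007", max_gap=timedelta(minutes=2), year=2012):
    """Return (previous, current) timestamp pairs spaced further apart than max_gap.

    Lines containing "failed" (e.g. the ldap_sasl_bind_s error) are not
    counted as replication activity. The year is an assumption, because
    syslog timestamps do not include one.
    """
    stamps = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("rid") == rid and "failed" not in line:
            stamp = datetime.strptime(
                f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S"
            )
            stamps.append(stamp)
    # Compare each timestamp with the one that follows it.
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap]
```

Fed the log excerpts above, this would flag the stretch between the last message at 02:23:12 and the first message after the restart at 17:06:40.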
My consumer config:
syncrepl rid=007
    provider=ldaps://master:636
    bindmethod=simple
    binddn="cn=xxx,dc=xxx,dc=xxx"
    credentials=xxxx
    searchbase="dc=xxx,dc=xxx"
    scope=sub
    schemachecking=on
    #type=refreshAndPersist
    #retry="30 20 60 +"
    type=refreshOnly
    interval=00:00:01:00
updateref ldaps://master
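As a sanity check on the timing settings in that config, here is a minimal Python sketch of how the two values read according to slapd.conf(5): `interval` has the form dd:hh:mm:ss, and `retry` is a list of `<retry interval> <number of retries>` pairs in which `+` means retry indefinitely. The helper names `interval_to_seconds` and `parse_retry` are made up for illustration. The retry line is commented out in the config above, so the second helper only shows how that string would be read if it were enabled.

```python
def interval_to_seconds(interval):
    """Convert a syncrepl interval of the form dd:hh:mm:ss to seconds."""
    days, hours, minutes, seconds = (int(part) for part in interval.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_retry(spec):
    """Parse a retry string into (interval_seconds, count) pairs.

    A count of None stands for '+', meaning retry indefinitely.
    """
    fields = spec.split()
    if len(fields) % 2 != 0:
        raise ValueError("retry expects pairs of <interval> <count>")
    pairs = []
    for wait, count in zip(fields[0::2], fields[1::2]):
        pairs.append((int(wait), None if count == "+" else int(count)))
    return pairs


print(interval_to_seconds("00:00:01:00"))  # 60
print(parse_retry("30 20 60 +"))           # [(30, 20), (60, None)]
```

So the configured interval is indeed 60 seconds, and the commented-out retry value would mean 20 attempts 30 seconds apart, followed by one attempt every 60 seconds without limit.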
And my master config:
database bdb
suffix "dc=xxx,dc=xxx"
rootdn "cn=xxx,dc=xxx,dc=xxx"
rootpw {SSHA}xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
directory /var/lib/ldap

index objectClass eq,pres
index ou,cn,mail,surname,givenname,gecos,description eq,pres,sub
index uidNumber,gidNumber,uniqueMember,homeDirectory,loginShell eq,pres
index uid,memberUid eq,pres,sub
index nisMapName,nisMapEntry eq,pres,sub

loglevel 256 16384
logfile /var/log/ldap.log

overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
While the remote directories are out of sync, the consumer in the same datacenter as the master is always in sync. The problem goes away when I restart the LDAP service on the affected consumers. Replication then works for some hours after the restart, and the periods during which the consumers do not replicate are fairly random. Could this be a network issue, since traffic has to pass through a firewall to reach the consumers in other datacenters? I have also tried changing the syncrepl type to refreshAndPersist, and the issue still exists.
This issue seems similar to this one ( http://www.openldap.org/lists/openldap-technical/200905/msg00024.html ), but I thought pull-based syncrepl with refreshOnly should not be affected by state being dropped.
<< This sort of problem with long-lived connections is often due to state being dropped from IP-level devices. >>
Thanks a lot, everyone.