I am having a problem with what appears (to me) to be ‘stale’
TCP connections for syncrepl between the master and a pair of slaves.
After restarting everything, I see changes on the master replicated to both
slaves. But if I wait about 30 minutes or more and then make a change, the
replication fails (most of the time). netstat on the LDAP port shows the
connections still established, but with queued packets at the master server.
After about 15 minutes, the master server drops the connection. An
overnight tcpdump on the master showed LDAP occasionally sending a keep-alive,
with 2 hours between the keep-alive messages (these keep-alives are
inconsistent, though; some nights I see none).
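For reference, a 2-hour spacing matches the Linux default for net.ipv4.tcp_keepalive_time (7200 seconds), which applies only to sockets that have SO_KEEPALIVE enabled. Below is a minimal sketch, assuming a Linux host with /proc mounted, for checking the kernel keep-alive settings on each server. The function name and the proc_dir parameter are illustrative, not part of any existing tool.

```python
import os

# Kernel TCP keep-alive tunables (Linux). They only affect sockets
# that have SO_KEEPALIVE enabled.
KEEPALIVE_KEYS = (
    "tcp_keepalive_time",    # idle seconds before the first probe
    "tcp_keepalive_intvl",   # seconds between unanswered probes
    "tcp_keepalive_probes",  # unanswered probes before the connection is dropped
)


def read_keepalive_settings(proc_dir="/proc/sys/net/ipv4"):
    """Return a dict of the keep-alive tunables found under proc_dir.

    proc_dir is a parameter only so the function can be pointed at another
    directory; on a normal Linux host the default is the right place.
    Files that are missing or unreadable are skipped.
    """
    settings = {}
    for key in KEEPALIVE_KEYS:
        path = os.path.join(proc_dir, key)
        try:
            with open(path) as handle:
                settings[key] = int(handle.read().strip())
        except (OSError, ValueError):
            continue
    return settings


if __name__ == "__main__":
    for name, value in read_keepalive_settings().items():
        print(f"{name} = {value}")
```

Running this on the master and on both slaves would show whether all three hosts use the same idle time before the first probe.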
I am running Red Hat EL5 and OpenLDAP 2.3.43 on all servers with
no TLS or SASL (in our integration/test facility).
I don’t see anything in the documentation pertaining
to keep-alives, other than ITS#4708 for 2.3.38.
Here’s the syncrepl for one slave:
syncrepl rid=004
         type=refreshAndPersist
         provider=ldap://172.24.1.191
         retry="30 10 300 3"
         searchbase="o=partner_x,dc=ourcompany-int,dc=net"
         filter="(objectClass=*)"
         scope=sub
         schemachecking=off
         bindmethod=simple
         binddn="cn=syncRepl,o=partner_x,dc=ourcompany-int,dc=net "
         credentials="secret"
updateref ldap://172.24.1.191
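As documented in slapd.conf(5), the retry value is a list of "interval count" pairs, where a count of + means retry indefinitely. Under that reading, retry="30 10 300 3" means 10 retries at 30-second intervals, then 3 retries at 300-second intervals, after which the consumer stops retrying. A minimal sketch that works out the schedule (the helper names are illustrative and are not part of OpenLDAP):

```python
def parse_retry(spec):
    """Parse a syncrepl retry string into (interval_seconds, count) pairs.

    A count of None stands for '+', i.e. retry indefinitely.
    """
    tokens = spec.split()
    if not tokens or len(tokens) % 2 != 0:
        raise ValueError("retry must be a non-empty list of 'interval count' pairs")
    pairs = []
    for i in range(0, len(tokens), 2):
        interval = int(tokens[i])
        count = None if tokens[i + 1] == "+" else int(tokens[i + 1])
        pairs.append((interval, count))
    return pairs


def total_retry_seconds(pairs):
    """Return the time covered by all retries, or None if it is unbounded."""
    total = 0
    for interval, count in pairs:
        if count is None:
            return None
        total += interval * count
    return total


if __name__ == "__main__":
    schedule = parse_retry("30 10 300 3")
    print(schedule)
    print(total_retry_seconds(schedule))
```

For the value above, the retries span 10 * 30 + 3 * 300 = 1200 seconds, or 20 minutes in total.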
The other slave’s slapd.conf is identical except for
rid=002.
On the master I have:
overlay syncprov
syncprov-checkpoint 100 30
syncprov-sessionlog 100
Note: The 2 slaves are running on blades in an IBM
chassis, and the master is on a 1U Linux server, just ‘one-hop’
away. Prior to this, when I had a master/slave pair running on the
blades, syncRepl was working fine for several months. It was not until I
moved the master to another server that the failures started.
Thanks in advance for any help or info.
John Kane