Hello list.
I'm trying to set up multi-master replication, starting from a working
single-master setup. I took the master node's configuration, added the
following directives, and deployed the identical file on both nodes:
# global
serverID 1 ldap://10.202.11.8:389/
serverID 2 ldap://10.202.11.9:389/
# db
...
syncrepl rid=1
  provider=ldap://10.202.11.8:389/
  starttls=yes
  tls_reqcert=never
  type=refreshAndPersist
  retry="60 +"
  logbase="cn=log"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  syncdata=accesslog
  searchbase="dc=msr-inria,dc=inria,dc=fr"
  scope=sub
  schemachecking=off
  bindmethod=simple
  binddn="cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr"
  credentials=XYZ
syncrepl rid=2
  provider=ldap://10.202.11.9:389/
  starttls=yes
  tls_reqcert=never
  type=refreshAndPersist
  retry="60 +"
  logbase="cn=log"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  syncdata=accesslog
  searchbase="dc=msr-inria,dc=inria,dc=fr"
  scope=sub
  schemachecking=off
  bindmethod=simple
  binddn="cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr"
  credentials=XYZ
mirrormode on
The 'tls_reqcert=never' is needed because these two servers are reached
through a virtual interface behind a load balancer, and the certificate
name matches the name of that virtual interface, not the actual
interfaces of the servers (I wonder whether OpenLDAP supports
subjectAltName in X.509 certificates, but that's another issue).
Then I imported my database into the first server and started both of
them. When node1 (full) tries to access node2 (empty), it fails, because
it can't authenticate with a DN that is not yet present in the other
node's database, which is quite understandable.
However, node2 connects successfully and syncs the OU object in the DIT,
but then fails to sync the first user object, with this error message in
its logs:
Jan 13 11:29:20 avron2 slapd[20939]: null_callback : error code 0x13
Jan 13 11:29:20 avron2 slapd[20939]: syncrepl_entry: rid=001 be_add
uid=ingleber,ou=users,dc=msr-inria,dc=inria,dc=fr (19)
Jan 13 11:29:20 avron2 slapd[20939]: syncrepl_entry: rid=001 be_add
uid=ingleber,ou=users,dc=msr-inria,dc=inria,dc=fr failed (19)
Jan 13 11:29:20 avron2 slapd[20939]: do_syncrepl: rid=001 rc 19 retrying
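For reference, the 0x13 reported by null_callback and the (19) reported
by syncrepl_entry are the same LDAP result code, constraintViolation in
RFC 4511. A minimal Python sketch of that decoding follows; the table is
only a subset of the RFC 4511 codes, and describe_result is a
hypothetical helper name, not part of any LDAP library:

```python
# Subset of the LDAP result codes defined in RFC 4511 (not the full table).
LDAP_RESULT_CODES = {
    0: "success",
    16: "noSuchAttribute",
    17: "undefinedAttributeType",
    19: "constraintViolation",
    20: "attributeOrValueExists",
    21: "invalidAttributeSyntax",
    32: "noSuchObject",
    49: "invalidCredentials",
    50: "insufficientAccessRights",
    65: "objectClassViolation",
    68: "entryAlreadyExists",
}


def describe_result(code):
    """Format an LDAP result code (int, or a string such as '0x13' or '19')."""
    if isinstance(code, str):
        code = int(code, 0)  # base 0 accepts both hex and decimal notation
    name = LDAP_RESULT_CODES.get(code, "unknown")
    return f"{code} (0x{code:02x}) {name}"


print(describe_result("0x13"))  # as logged by null_callback
print(describe_result(19))      # as logged by syncrepl_entry
```

Both calls print the same line, so the consumer is reporting a single
constraint violation on the be_add, not two different errors.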
In node1's logs:
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=1 BIND
dn="cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr" method=128
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=1 BIND
dn="cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr" mech=SIMPLE ssf=0
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=1 RESULT tag=97 err=0
text=
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=2 SRCH
base="dc=msr-inria,dc=inria,dc=fr" scope=2 deref=0 filter="(objectClass=*)"
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=2 SRCH attr=* +
Jan 13 10:28:31 avron1 slapd[15713]: send_search_entry: conn 1000 ber
write failed.
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 fd=21 closed (connection
lost on write)
It's hard to tell whether the failure occurs on the provider side (the
"ber write failed" message) or on the consumer side (null_callback :
error code 0x13).
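One thing that complicates the comparison: the two excerpts are stamped
about an hour apart (10:28:31 on avron1, 11:29:20 on avron2), so they
may come from different retry attempts, or the clocks or time zones of
the two hosts may differ. A rough Python sketch of that check, assuming
both lines are from the same day; syslog lines carry no year, so the
year below is an arbitrary placeholder, and parse_syslog_time is a
hypothetical helper:

```python
from datetime import datetime


def parse_syslog_time(line, year=2000):
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog line.

    The year is a placeholder, since syslog timestamps do not include one.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


# One line from each excerpt above.
consumer = "Jan 13 11:29:20 avron2 slapd[20939]: do_syncrepl: rid=001 rc 19 retrying"
provider = "Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=2 SRCH attr=* +"

gap = parse_syslog_time(consumer) - parse_syslog_time(provider)
print(gap)  # prints 1:00:49
```

If the gap comes from clock or time zone differences rather than from
separate attempts, the two excerpts may well describe the same exchange.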
Any hint welcome.