Hello list.
I'm trying to set up a multi-master configuration, starting from a working single-master one. I took the master node's configuration, added the following directives, and deployed it identically on both nodes:
# global
serverID 1 ldap://10.202.11.8:389/
serverID 2 ldap://10.202.11.9:389/

# db
...
syncrepl rid=1
    provider=ldap://10.202.11.8:389/
    starttls=yes
    tls_reqcert=never
    type=refreshAndPersist
    retry="60 +"
    logbase="cn=log"
    logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
    syncdata=accesslog
    searchbase="dc=msr-inria,dc=inria,dc=fr"
    scope=sub
    schemachecking=off
    bindmethod=simple
    binddn="cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr"
    credentials=XYZ
syncrepl rid=2
    provider=ldap://10.202.11.9:389/
    starttls=yes
    tls_reqcert=never
    type=refreshAndPersist
    retry="60 +"
    logbase="cn=log"
    logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
    syncdata=accesslog
    searchbase="dc=msr-inria,dc=inria,dc=fr"
    scope=sub
    schemachecking=off
    bindmethod=simple
    binddn="cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr"
    credentials=XYZ
mirrormode on
The 'tls_reqcert=never' is needed because those two servers are accessed through a virtual interface behind a load-balancing server, and the certificate name matches the name of this virtual interface, not the actual interface of the servers (I wonder whether OpenLDAP supports subjectAltName in X.509 certificates, but that's another issue).
Then I imported my database into the first server, and started both of them.
When node1 (full) tries to access node2 (empty), it fails because it can't authenticate with a DN that is not yet present in the other node's database, which is quite understandable.
However, node2 connects successfully, syncs the OU object in the DIT, then fails to sync the first user object, with this error message in its logs:
Jan 13 11:29:20 avron2 slapd[20939]: null_callback : error code 0x13
Jan 13 11:29:20 avron2 slapd[20939]: syncrepl_entry: rid=001 be_add uid=ingleber,ou=users,dc=msr-inria,dc=inria,dc=fr (19)
Jan 13 11:29:20 avron2 slapd[20939]: syncrepl_entry: rid=001 be_add uid=ingleber,ou=users,dc=msr-inria,dc=inria,dc=fr failed (19)
Jan 13 11:29:20 avron2 slapd[20939]: do_syncrepl: rid=001 rc 19 retrying
In node1 logs:
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=1 BIND dn="cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr" method=128
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=1 BIND dn="cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr" mech=SIMPLE ssf=0
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=1 RESULT tag=97 err=0 text=
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=2 SRCH base="dc=msr-inria,dc=inria,dc=fr" scope=2 deref=0 filter="(objectClass=*)"
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=2 SRCH attr=* +
Jan 13 10:28:31 avron1 slapd[15713]: send_search_entry: conn 1000 ber write failed.
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 fd=21 closed (connection lost on write)
It's hard to tell if the failure occurs on the provider (ber write failed message) or consumer side (null_callback : error code 0x13).
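For what it's worth, the hexadecimal 0x13 from null_callback and the decimal (19) from syncrepl_entry are the same value. Here is a minimal Python sketch for decoding such codes, assuming they are plain LDAP result codes as listed in RFC 4511. The table is a partial subset I typed in for illustration, and the helper name describe_result is my own, not part of any OpenLDAP tool:

```python
# Partial subset of the LDAP result codes from RFC 4511 (illustrative, not exhaustive).
LDAP_RESULT_CODES = {
    0: "success",
    17: "undefinedAttributeType",
    19: "constraintViolation",
    21: "invalidAttributeSyntax",
    32: "noSuchObject",
    49: "invalidCredentials",
    50: "insufficientAccessRights",
    65: "objectClassViolation",
    68: "entryAlreadyExists",
}


def describe_result(code_text):
    """Hypothetical helper: turn a code as printed in the logs into (number, name)."""
    code = int(code_text, 0)  # base 0 accepts both "0x13" and "19"
    return code, LDAP_RESULT_CODES.get(code, "unknown")


if __name__ == "__main__":
    for raw in ("0x13", "19"):
        print(raw, "->", describe_result(raw))
```

If that assumption holds, both consumer-side messages report a constraint violation on the add of the user entry, which would point at node2 rather than at the provider.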
Any hint welcome.