Hi,
We have three OpenLDAP servers: 10.0.0.2, 10.0.0.3 and 10.0.0.4 and one web server 10.0.0.5.
For a long time we've had two baseDN databases running on the OpenLDAP servers with ACLs configured so that anyone on the subnet can read anything in the system and our replication has been working well.
Recently, we've added the third database with more restrictive ACLs that look like this:
============================================================
access to *
    by dn="uid=replicator,ou=People,dc=bar" read
    by * break

access to attrs=userPassword,sambaNTPassword
    by group/groupOfNames/Member="cn=ldap-admins,ou=Group,dc=bar" write
    by self write
    by anonymous auth
    by * none

access to attrs=entry,uid,cn,sn,givenName,title,departmentNumber,mail,telephoneNumber,roomNumber
    by group/groupOfNames/Member="cn=ldap-admins,ou=Group,dc=bar" write
    by users read
    by * none

access to *
    by group/groupOfNames/Member="cn=ldap-admins,ou=Group,dc=bar" write
    by * none
============================================================
With these ACLs, replication works when anyone in the ldap-admins group posts an update to the server. However, if a user updates their own password, replication does not take place. If we change the last access clause to this:
============================================================
access to *
    by group/groupOfNames/Member="cn=ldap-admins,ou=Group,dc=bar" write
    by peername.regex="10.0.0.5" read
    by * none
============================================================
then the user self-updates are replicated properly. Note that 10.0.0.5 is the *web server*. The way I read this ACL, it means we're granting read access to the web server, and in so doing, the other two LDAP servers (10.0.0.3 and 10.0.0.4) are magically able to replicate data again.
When replication is *not* working in this set-up, re-starting slapd on 10.0.0.3 and 10.0.0.4 (without changing any ACLs anywhere) causes them to suck down all the updates they missed before.
Am I misunderstanding the way these ACLs work? Is there any way that giving READ access to the web server (which it already has by virtue of the user having bound themselves to the LDAP server) should cause replication for 10.0.0.3 and 10.0.0.4 to work again? Or is this perhaps a bug in the version of slapd (2.3.43; yes I know it's old; it's a vendor package and that's how we roll around here at the moment) that we're running?
I'm not really asking anyone to fix the problem or to offer a solution to the problem...I just want to know if this sort of replication issue was a known problem in the past?
Tim Gustafson
Baskin School of Engineering
UC Santa Cruz
tjg@soe.ucsc.edu
831-459-5354
On Thu, Apr 15, 2010 at 01:18:13PM -0700, Tim Gustafson wrote:
> access to * by dn="uid=replicator,ou=People,dc=bar" read by * break
I assume you are using syncrepl here.
It would be worth checking that the replication process really does bind as that DN. If it does, then all later access clauses are irrelevant.
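As a minimal sketch of what to compare, assuming syncrepl with a simple bind: the binddn in the consumer's syncrepl stanza has to match the DN in the first access clause exactly. The original post does not include the replication configuration, so the rid, the provider address (10.0.0.2 is only assumed to be the provider), the refresh type, the search base, the retry schedule and the credentials placeholder below are all illustrative. Raising the provider's log level to stats is one way to see which DN each consumer connection really binds as.

============================================================
# Provider, global section (before any database definitions):
# log connections and operations so each BIND shows the DN that was used.
loglevel stats

# Provider, database section: the first ACL from the original post.
# If the consumer binds as this DN, the first "by" matches, read access is
# granted and evaluation stops here. Any other identity matches
# "by * break" and falls through to the later "access to" clauses.
access to *
    by dn="uid=replicator,ou=People,dc=bar" read
    by * break

# Consumer (10.0.0.3 / 10.0.0.4), database section.
# ASSUMPTION: every value except binddn is a placeholder, not taken
# from the original post.
syncrepl rid=003
    provider=ldap://10.0.0.2
    type=refreshAndPersist
    searchbase="dc=bar"
    bindmethod=simple
    binddn="uid=replicator,ou=People,dc=bar"
    credentials=CHANGEME
    retry="60 +"
============================================================

If the consumer turns out to bind anonymously or as a different DN, it is subject to the later clauses, and the final "by * none" would then hide changes from it.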
> When replication is *not* working in this set-up, re-starting slapd on 10.0.0.3 and 10.0.0.4 (without changing any ACLs anywhere) causes them to suck down all the updates they missed before.
> Am I misunderstanding the way these ACLs work? Is there any way that giving READ access to the web server (which it already has by virtue of the user having bound themselves to the LDAP server) should cause replication for 10.0.0.3 and 10.0.0.4 to work again? Or is this perhaps a bug in the version of slapd (2.3.43; yes I know it's old; it's a vendor package and that's how we roll around here at the moment) that we're running?
This does not sound like an ACL problem to me. I would suggest setting up a test environment with the latest 2.4.x release to see what happens.
Andrew
On Thursday, 15 April 2010 21:18:13 Tim Gustafson wrote:
> Hi,
> We have three OpenLDAP servers: 10.0.0.2, 10.0.0.3 and 10.0.0.4 and one web server 10.0.0.5.
How is replication configured? A single master with multiple replicas synchronising off it (and if so, which server is the master)? Or MMR or cascading replication (please provide a diagram or description)?
> If we change
I assume "change" implies restarting slapd ...
> the last access clause to this:
[...]
> then the user self-updates are replicated properly. Note that 10.0.0.5 is the *web server*. The way I read this ACL, it means we're granting read access to the web server, and in so doing, the other two LDAP servers (10.0.0.3 and 10.0.0.4) are magically able to replicate data again.
So, adding an irrelevant ACL, *and* restarting slapd causes the replication to resume? So, when this occurs, have you tested just restarting slapd?
> When replication is *not* working in this set-up, re-starting slapd on 10.0.0.3 and 10.0.0.4 (without changing any ACLs anywhere) causes them to suck down all the updates they missed before.
So, if the ACLs are not modified, but a restart of slapd forces replication, why would this be an ACL issue (or did I misunderstand this sentence)?
Maybe instead you should post your replication configuration?
Note, there are still some reliability issues (IMHO) with syncrepl, in that it doesn't always handle connection failures without manual intervention. Even if these are fixed in 2.4.x (I haven't been able to test that myself), they are probably still present in 2.3.x. For example, with refreshAndPersist, a firewall between provider and consumer that drops idle connections, combined with infrequent changes and incorrect keepalive settings on the hosts, would result in replication failures.
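As a rough sketch of one way to rule out the idle-connection case, assuming the consumers currently use refreshAndPersist (which the thread does not confirm): a consumer can poll with refreshOnly instead, so that it never depends on a long-lived idle connection surviving the firewall. The rid, provider address, search base, polling interval, retry schedule and credentials placeholder are illustrative assumptions, not values from the original post.

============================================================
# Consumer (10.0.0.3 / 10.0.0.4), database section.
# ASSUMPTION: all values are placeholders for illustration.
# type=refreshOnly opens a new connection for each poll;
# interval is dd:hh:mm:ss, so this polls every five minutes.
# retry: after a failure, try every 60 seconds 10 times,
# then every 300 seconds indefinitely.
syncrepl rid=003
    provider=ldap://10.0.0.2
    type=refreshOnly
    interval=00:00:05:00
    searchbase="dc=bar"
    bindmethod=simple
    binddn="uid=replicator,ou=People,dc=bar"
    credentials=CHANGEME
    retry="60 10 300 +"
============================================================

The trade-off is latency: changes reach the consumers only at the next poll rather than as they happen. If replication becomes reliable with polling, that would point towards connection handling rather than the ACLs.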
> I'm not really asking anyone to fix the problem or to offer a solution to the problem...I just want to know if this sort of replication issue was a known problem in the past?
You didn't post anything about your replication configuration when asking about a replication failure, so it is difficult to provide any more information.
Regards, Buchan
openldap-software@openldap.org