Hello
Let me start by introducing myself briefly:
* I'm a sysadmin at the Norwegian NREN (UNINETT)
* As sysadmin here, I end up doing a heck of a lot of things at once, and our LDAP servers are really just tiny (but important) pieces in the huge puzzle that is our network infrastructure. Point being, I don't have the capacity to "know it all", or RTFM (and comprehend) everything I'm involved with; things just tend to drop into my lap.
* I'm not particularly interested in or fond of LDAP as such :)
* I also have no particular interest in diving into LDAP and becoming some sort of LDAP wizard in the future
I just felt like mentioning the above before anyone throws an RTFM at me ;)
Anyways...
I have at last upgraded a system from using slurpd (debian etch, slapd 2.3.30) to using replsync, at least that was the intention.
Let me start with the scenario:
* one master LDAP-server with a web frontend (old modified GOsa)
* 30ish slave LDAP-servers spread around on various institutions
* on the master LDAP-server, each institution has its own branch, like dc=foo,dc=no ; dc=bar,dc=no etc.
* each slave is supposed to replicate only its own branch, for example server.foo.no only has dc=foo,dc=no replicated from the master
* for each dc=foo,dc=no there is an admin user, e.g. cn=admin,dc=foo,dc=no, that has all rights granted to the corresponding subtree dc=foo,dc=no
That, in all its simplicity, is the scenario.
With slurpd I had entries like this in the master's slapd.conf:
replica host="server.foo.no"
        suffix="dc=foo,dc=no"
        binddn="cn=admin,dc=foo,dc=no"
        credentials="f00bAr123"
        bindmethod="simple"
        tls="critical"
and on the slaves, (running 2.4.23, they were upgraded some time ago):
updatedn "cn=admin,dc=foo,dc=no"
updateref ldap://masterserver.uninett.no/
This worked fine, apart from occasionally getting out of sync.
Now, with the upgraded master, I have yet to get any replication working.
Which sync method is most likely the best for my scenario?
On master I have added:
===================
...
moduleload syncprov.la
moduleload back_ldap.la
...

# and under database, it looks like this:

database bdb
suffix "dc=no"
directory "/var/lib/ldap"
rootdn "cn=root,dc=no"
rootpw {SMD5}XXXXXXXXXXXXXXXXXXXXXXXXXXX=
index objectClass eq
index uid,gidNumber,uidNumber,memberUid pres,eq
index mail,gosaMailAlternateAddress pres,eq,sub
index gosaUser,gosaObject pres,eq,sub
index zoneName,relativeDomaiNname pres,eq
lastmod on
sizelimit 4000
overlay syncprov
syncprov-checkpoint 1000 60
syncprov-sessionlog 100

# and some access lists

access to dn.regex="dc=([^,]+),dc=no$" attrs=userPassword,sambaLMPassword,sambaNTPassword,goImapPassword
        by dn.regex="^cn=admin,dc=$1,dc=no$" write
        by anonymous auth
        by self write
        by * none

access to dn.base=""
        by * read

access to dn.regex="dc=([^,]+),dc=no$"
        by dn.regex="^cn=admin,dc=$1,dc=no$" write
        by * read
===================
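To make the `$1` substitution in the access lists above easier to follow, here is a minimal Python sketch of the idea. It is only an illustration: slapd evaluates POSIX extended regular expressions on normalized DNs, not Python's `re` module, and the function names `admin_for` and `admin_may_write` are hypothetical, not part of OpenLDAP. The sketch also compares the resulting DN by simple equality rather than by a second regex match.

```python
import re

# Same pattern as in: access to dn.regex="dc=([^,]+),dc=no$"
TARGET = re.compile(r"dc=([^,]+),dc=no$")


def admin_for(entry_dn: str):
    """Return the admin DN that the ACL would grant write access to, or None."""
    match = TARGET.search(entry_dn)
    if match is None:
        return None
    # "$1" in the "by dn.regex=..." clause stands for the first capture group.
    return f"cn=admin,dc={match.group(1)},dc=no"


def admin_may_write(bind_dn: str, entry_dn: str) -> bool:
    """Simplified check: does bind_dn equal the admin DN for entry_dn's branch?"""
    expected = admin_for(entry_dn)
    return expected is not None and bind_dn == expected
```

With these definitions, `admin_may_write("cn=admin,dc=foo,dc=no", "uid=kolla,ou=people,dc=foo,dc=no")` is true, while the same entry checked against `cn=admin,dc=bar,dc=no` is not.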
And on slave:
===================
...
...

database bdb
suffix "dc=foo,dc=no"
directory "/var/lib/ldap"
rootdn "cn=admin,dc=foo,dc=no"
rootpw {SMD5}XXXXXXXXXXXXXXXXXXXXXXXXXXX=
index objectclass,entryCSN,entryUUID eq
index uid,gidNumber,uidNumber,memberUid pres,eq
index mail,gosaMailAlternateAddress pres,eq,sub
lastmod on
sizelimit 4000
# updatedn "cn=admin,dc=foo,dc=no"
# updateref ldap://masterserver.uninett.no/

syncrepl rid=123
        provider=ldaps://masterserver.uninett.no:636/
        type=refreshOnly
        interval=00:00:00:10
        retry="60 +"
        searchbase="dc=foo,dc=no"
        scope=sub
        schemachecking=off
        bindmethod=simple
        binddn="cn=admin,dc=foo,dc=no"
        credentials="f00bAr123"

access to attrs=userPassword,sambaLMPassword,sambaNTPassword,goImapPassword
        by anonymous auth
        by self write
        by * none
access to dn.base="" by * read
access to * by * read
===================
I have (in good tradition) done a slapcat of the subtree dc=foo,dc=no on the master and copied over the ldif to the slave, and there used slapadd to create the entire database from scratch, so the content is identical.
When I start slapd on the slave I get on the slave:
===================
18:37:50 server.foo.no slapd[7971]: @(#) $OpenLDAP: slapd 2.4.23 (Jul 5 2010 18:35:50) $ ^Iroot@localhost:/home/kolla/openldap/openldap-2.4.23/debian/build/servers/slapd
18:37:50 server.foo.no slapd[7972]: slapd starting
18:37:50 server.foo.no slapd[7972]: syncrepl_message_to_entry: rid=123 mods check (objectClass: value #0 provided more than once)
18:37:50 server.foo.no slapd[7972]: do_syncrepl: rid=123 rc 20 retrying
18:38:50 server.foo.no slapd[7972]: syncrepl_message_to_entry: rid=123 mods check (objectClass: value #0 provided more than once)
18:38:50 server.foo.no slapd[7972]: do_syncrepl: rid=123 rc 20 retrying
===================
And on the master:
===================
18:37:50 ratatosk slapd[9162]: conn=1069 fd=16 ACCEPT from IP=NNN.NN.NN.NN:43227 (IP=0.0.0.0:636)
18:37:50 ratatosk slapd[9162]: conn=1069 fd=16 TLS established tls_ssf=256 ssf=256
18:37:50 ratatosk slapd[9162]: conn=1069 op=0 BIND dn="cn=admin,dc=foo,dc=no" method=128
18:37:50 ratatosk slapd[9162]: conn=1069 op=0 BIND dn="cn=admin,dc=foo,dc=no" mech=SIMPLE ssf=0
18:37:50 ratatosk slapd[9162]: conn=1069 op=0 RESULT tag=97 err=0 text=
18:37:50 ratatosk slapd[9162]: conn=1069 op=1 SRCH base="dc=foo,dc=no" scope=2 deref=0 filter="(objectClass=*)"
18:37:50 ratatosk slapd[9162]: conn=1069 op=1 SRCH attr=* +
18:37:50 ratatosk slapd[9162]: conn=1069 op=2 UNBIND
18:37:50 ratatosk slapd[9162]: conn=1069 fd=16 closed
18:38:50 ratatosk slapd[9162]: conn=1070 fd=16 ACCEPT from IP=NNN.NN.NN.NN:43239 (IP=0.0.0.0:636)
18:38:50 ratatosk slapd[9162]: conn=1070 fd=16 TLS established tls_ssf=256 ssf=256
18:38:50 ratatosk slapd[9162]: conn=1070 op=0 BIND dn="cn=admin,dc=foo,dc=no" method=128
18:38:50 ratatosk slapd[9162]: conn=1070 op=0 BIND dn="cn=admin,dc=foo,dc=no" mech=SIMPLE ssf=0
18:38:50 ratatosk slapd[9162]: conn=1070 op=0 RESULT tag=97 err=0 text=
18:38:50 ratatosk slapd[9162]: conn=1070 op=1 SRCH base="dc=foo,dc=no" scope=2 deref=0 filter="(objectClass=*)"
18:38:50 ratatosk slapd[9162]: conn=1070 op=1 SRCH attr=* +
18:38:50 ratatosk slapd[9162]: conn=1070 op=2 UNBIND
18:38:50 ratatosk slapd[9162]: conn=1070 fd=16 closed
===================
and so it goes, but no sync is done whatsoever.
What am I doing wrong here?
And what could cause this message to appear on the slave: "syncrepl_message_to_entry: rid=123 mods check (objectClass: value #0 provided more than once)"
Any help is very welcome, especially example configs that I can use as a "template" for my scenario.
Thanks a bunch! :)
--On Thursday, July 08, 2010 7:04 PM +0200 Kolbjørn Barmen kolbjorn.barmen@uninett.no wrote:
I have at last upgraded a system from using slurpd (debian etch, slapd 2.3.30) to using replsync, at least that was the intention.
I believe you mean SyncRepl (Sync Replication).
What version of OpenLDAP is on the master? 2.3.30?
===================
And on slave:
# updateref ldap://masterserver.uninett.no/
I'd still set updateref, so clients know where they should send updates.
syncrepl rid=123
        provider=ldaps://masterserver.uninett.no:636/
        type=refreshOnly
        interval=00:00:00:10
        retry="60 +"
        searchbase="dc=foo,dc=no"
        scope=sub
        schemachecking=off
        bindmethod=simple
        binddn="cn=admin,dc=foo,dc=no"
        credentials="f00bAr123"
I highly advise using refreshAndPersist rather than refreshOnly.
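Since the setup involves some 30 slaves that differ only in their branch, one way to keep the syncrepl stanzas consistent is to generate them. The following is a minimal Python sketch under stated assumptions: the function name `syncrepl_stanza` is hypothetical, the parameter values simply mirror the configuration quoted above with `type=refreshAndPersist` swapped in, and `interval` is left out because it only applies to refreshOnly. Whether `schemachecking=off` and a simple bind are appropriate depends on the site.

```python
def syncrepl_stanza(rid: int, branch: str, provider: str, credentials: str) -> str:
    """Render a slapd.conf syncrepl directive for one institution's branch.

    branch is the replicated subtree, e.g. "dc=foo,dc=no"; the bind DN is
    assumed to be cn=admin under that branch, as in the thread's scenario.
    """
    if not 0 <= rid <= 999:
        # slapd.conf(5) limits rid to a non-negative number of at most three digits.
        raise ValueError("rid must be between 0 and 999")
    parts = [
        f"syncrepl rid={rid}",
        f"provider={provider}",
        "type=refreshAndPersist",
        'retry="60 +"',
        f'searchbase="{branch}"',
        "scope=sub",
        "schemachecking=off",
        "bindmethod=simple",
        f'binddn="cn=admin,{branch}"',
        f'credentials="{credentials}"',
    ]
    # Continuation lines in slapd.conf must begin with whitespace.
    return "\n        ".join(parts) + "\n"


if __name__ == "__main__":
    print(syncrepl_stanza(123, "dc=foo,dc=no",
                          "ldaps://masterserver.uninett.no:636/", "f00bAr123"))
```

The output is meant to be pasted under the matching `database` section on the slave; it does not touch the provider side, which still needs the syncprov overlay.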
When I start slapd on the slave I get on the slave:
18:37:50 server.foo.no slapd[7971]: @(#) $OpenLDAP: slapd 2.4.23 (Jul 5 2010 18:35:50) $ ^Iroot@localhost:/home/kolla/openldap/openldap-2.4.23/debian/build/servers/slapd
18:37:50 server.foo.no slapd[7972]: slapd starting
18:37:50 server.foo.no slapd[7972]: syncrepl_message_to_entry: rid=123 mods check (objectClass: value #0 provided more than once)
18:37:50 server.foo.no slapd[7972]: do_syncrepl: rid=123 rc 20 retrying
18:38:50 server.foo.no slapd[7972]: syncrepl_message_to_entry: rid=123 mods check (objectClass: value #0 provided more than once)
18:38:50 server.foo.no slapd[7972]: do_syncrepl: rid=123 rc 20 retrying
===================
I would advise you start the slave with the "-d -1" options passed to the slapd binary, so you can see exactly what the master is sending to the replica. It sounds like it is sending invalid data. This could be a bug in the version that the master is running. You might want to try running a separate master for testing, that uses OpenLDAP 2.4.23 as well. There have been a multitude of fixes to the syncrepl code in OpenLDAP since the 2.3 series.
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
On Fri, 9 Jul 2010, Quanah Gibson-Mount wrote:
--On Thursday, July 08, 2010 7:04 PM +0200 Kolbjørn Barmen kolbjorn.barmen@uninett.no wrote:
I have at last upgraded a system from using slurpd (debian etch, slapd 2.3.30) to using replsync, at least that was the intention.
I believe you mean SyncRepl (Sync Replication).
Yes - of course :)
What version of OpenLDAP is on the master? 2.3.30?
It is running 2.4.23.
syncrepl rid=123
        provider=ldaps://masterserver.uninett.no:636/
        type=refreshOnly
        interval=00:00:00:10
        retry="60 +"
        searchbase="dc=foo,dc=no"
        scope=sub
        schemachecking=off
        bindmethod=simple
        binddn="cn=admin,dc=foo,dc=no"
        credentials="f00bAr123"
I highly advise using refreshAndPersist rather than refreshOnly.
Right! :)
When I start slapd on the slave I get on the slave:
18:37:50 server.foo.no slapd[7971]: @(#) $OpenLDAP: slapd 2.4.23 (Jul 5 2010 18:35:50) $ ^Iroot@localhost:/home/kolla/openldap/openldap-2.4.23/debian/build/servers/slapd
18:37:50 server.foo.no slapd[7972]: slapd starting
18:37:50 server.foo.no slapd[7972]: syncrepl_message_to_entry: rid=123 mods check (objectClass: value #0 provided more than once)
18:37:50 server.foo.no slapd[7972]: do_syncrepl: rid=123 rc 20 retrying
18:38:50 server.foo.no slapd[7972]: syncrepl_message_to_entry: rid=123 mods check (objectClass: value #0 provided more than once)
18:38:50 server.foo.no slapd[7972]: do_syncrepl: rid=123 rc 20 retrying
===================
I would advise you start the slave with the "-d -1" options passed to the slapd binary, so you can see exactly what the master is sending to the replica. It sounds like it is sending invalid data. This could be a bug in the version that the master is running. You might want to try running a separate master for testing, that uses OpenLDAP 2.4.23 as well. There have been a multitude of fixes to the syncrepl code in OpenLDAP since the 2.3 series.
Both slave and master are running 2.4.23.
After some debugging I found the culprit; it turned out the error message "(objectClass: value #0 provided more than once)" was a nice hint (although I don't quite see what "value #0" is supposed to tell me).
Just by coincidence I tried to change the object "cn=admin,dc=foo,dc=no" on the master with an LDAP editor (gq), and got the same error message in return.
It turned out that the object cn=admin,dc=foo,dc=no had multiple occurrences of "objectClass: organizationalRole" (!), and this also prevented syncrepl from working. I suspect it was a result of "manual" editing of LDIF files followed by an import using slapadd. I get no warnings from slapadd when I import objects with multiple occurrences of the same objectClass.
Perhaps slapadd/slapd should be able to deal with such duplicate entries better, to make it more obvious what's wrong? I'm just saying :)
Thanks!
--On Tuesday, July 20, 2010 5:40 PM +0200 Kolbjørn Barmen kolbjorn.barmen@uninett.no wrote:
Perhaps slapadd/slapd should be able to deal with such duplicate entries better, to make it more obvious what's wrong? I'm just saying :)
What options did you use with slapadd? If you used -q, this is probably expected. If you did not, I suggest filing an ITS, although slapadd is never really as strict as ldapadd will be. It is meant for loading LDIF created by an export, which should already be sane.
--Quanah
On Tue, 20 Jul 2010, Quanah Gibson-Mount wrote:
--On Tuesday, July 20, 2010 5:40 PM +0200 Kolbjørn Barmen kolbjorn.barmen@uninett.no wrote:
Perhaps slapadd/slapd should be able to deal with such duplicate entries better, to make it more obvious what's wrong? I'm just saying :)
What options did you use with slapadd? If you used -q, this is probably expected. If you did not, I suggest filing an ITS, although slapadd is never really as strict as ldapadd will be. It is meant for loading LDIF created by an export, which should already be sane.
I did not use -q.
ITS submitted: http://www.openldap.org/its/index.cgi/Incoming?id=6592
Thanks! :)
It turned out that the object cn=admin,dc=foo,dc=no had multiple occurrences of "objectClass: organizationalRole" (!), and this also prevented syncrepl from working. I suspect it was a result of "manual" editing of LDIF files followed by an import using slapadd. I get no warnings from slapadd when I import objects with multiple occurrences of the same objectClass.
Perhaps slapadd/slapd should be able to deal with such duplicate entries better, to make it more obvious what's wrong? I'm just saying :)
slapd(8) can handle those occurrences. slapadd(8) is intended to load LDIF files generated by slapcat(8), thus presumably consistent. In general, it deals with the most obvious errors. I don't think asking slapadd to perform these checks is a good idea, as it would slow it down without real benefit: if an error is caught, you would need to restart, wasting all the actual write effort. A sanity check tool for unreliable LDIF would probably be more appropriate. I guess at this point most users would pretend their LDIF is always reliable, and avoid running the sanity checker...
p.
On Tue, 20 Jul 2010, masarati@aero.polimi.it wrote:
It turned out that the object cn=admin,dc=foo,dc=no had multiple occurrences of "objectClass: organizationalRole" (!), and this also prevented syncrepl from working. I suspect it was a result of "manual" editing of LDIF files followed by an import using slapadd. I get no warnings from slapadd when I import objects with multiple occurrences of the same objectClass.
Perhaps slapadd/slapd should be able to deal with such duplicate entries better, to make it more obvious what's wrong? I'm just saying :)
slapd(8) can handle those occurrences.
But does it handle it well enough, when it prevents syncrepl from working?
slapadd(8) is intended to load LDIF files generated by slapcat(8), thus presumably consistent.
And the file was indeed an LDIF file generated by slapcat. Since slapd allows it, slapcat will also spit it out - when slapcat, slapadd and slapd all "handle it" without giving any warnings back to anyone, it's not so easy to detect errors.
In general, it deals with the most obvious errors. I don't think asking slapadd to perform these checks is a good idea, as it would slow it down without real benefit: if an error is caught, you would need to restart, wasting all the actual write effort.
I don't quite agree - as I understand it, slapadd already does some sanity checking, so how much overhead would a check for duplicate objectClass values imply? And I don't see why you would need to restart: on a duplicate, either spit out a warning or, even better, spit out a warning and discard the duplicate.
A sanity check tool for unreliable LDIF would probably be more appropriate. I guess at this point most users would pretend their LDIF is always reliable, and avoid running the sanity checker...
Really? Yes, I would love a sanity checker, and I would most likely _always_ run LDIF through a sanity checker before using slapadd to write to back-end.
But again - slapadd already does some sanity checking, and there's even a flag for "dry-run" mode (-u), which IMO says that it is supposed to be used as a sanity checking tool. I'm perfectly OK with letting _all_ sanity checks occur only when using -u.
I would love to dump all my ldap data to an LDIF and run it through a sanity checker, I suspect there's more "old noise" stuck in there.
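A standalone checker of the kind wished for here could start out very small. The following Python sketch scans slapcat-style LDIF for attribute values that appear more than once within one entry. It is an illustration under assumptions, not an OpenLDAP tool: the function names are hypothetical, values are compared after simple case folding instead of slapd's schema-aware normalization, and only `objectClass` is checked by default.

```python
import base64
from collections import Counter


def unfold(text: str):
    """Yield logical LDIF lines, joining continuation lines (leading space)."""
    current = None
    for raw in text.splitlines():
        if raw.startswith(" ") and current is not None:
            current += raw[1:]
            continue
        if current is not None:
            yield current
        current = raw
    if current is not None:
        yield current


def parse_value(line: str):
    """Split 'attr: value' or 'attr:: base64value' into (attr, value)."""
    attr, _, rest = line.partition(":")
    if rest.startswith(":"):
        value = base64.b64decode(rest[1:].strip()).decode("utf-8", "replace")
    else:
        value = rest.strip()
    return attr.strip().lower(), value


def duplicate_values(ldif_text: str, attrs=("objectclass",)):
    """Return (dn, attr, value) for each value listed more than once in an entry."""
    problems = []
    dn, seen = None, Counter()

    def flush():
        for (attr, value), count in seen.items():
            if count > 1:
                problems.append((dn, attr, value))

    for line in unfold(ldif_text):
        if not line.strip():          # a blank line ends the current entry
            flush()
            dn, seen = None, Counter()
        elif line.startswith("#"):    # comment line
            continue
        else:
            attr, value = parse_value(line)
            if attr == "dn":
                dn = value
            elif attr in attrs:
                seen[(attr, value.casefold())] += 1
    flush()
    return problems
```

Fed an entry for cn=admin,dc=foo,dc=no that lists `objectClass: organizationalRole` twice, `duplicate_values` reports that DN together with the case-folded value. Attribute names passed in `attrs` must be lowercase.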
Cheers! :)
On Tue, 20 Jul 2010, masarati@aero.polimi.it wrote:
It turned out that the object cn=admin,dc=foo,dc=no had multiple occurrences of "objectClass: organizationalRole" (!), and this also prevented syncrepl from working. I suspect it was a result of "manual" editing of LDIF files followed by an import using slapadd. I get no warnings from slapadd when I import objects with multiple occurrences of the same objectClass.
Perhaps slapadd/slapd should be able to deal with such duplicate entries better, to make it more obvious what's wrong? I'm just saying :)
slapd(8) can handle those occurrences.
But does it handle it well enough, when it prevents syncrepl from working?
This is a side effect: the replica receives bogus data via the protocol, and rejects it.
slapadd(8) is intended to load LDIF files generated by slapcat(8), thus presumably consistent.
And the file was indeed an LDIF file generated by slapcat.
I mean: from slapcat of a sane database.
Since slapd allows it, slapcat will also spit it out - when slapcat, slapadd and slapd all "handle it" without giving any warnings back to anyone, it's not so easy to detect errors.
No, you miss one link: slapd did not handle it (I mean: through protocol). When slapd starts up and opens a database, it does not validate its content, of course. And when it returns an entry, it does not validate its contents. Only when a write is performed, the contents are validated (usually, only the bit that's being written, if it's a modify).
In general, it deals with the most obvious errors. I don't think asking slapadd to perform these checks is a good idea, as it would slow it down without real benefit: if an error is caught, you would need to restart, wasting all the actual write effort.
I don't quite agree - as I understand it, slapadd already does some sanity checking, so how much overhead would a check for duplicate objectClass values imply?
Why don't you code and test it yourself? Checking for duplicates requires normalizing the data and comparing each value with every other. A wise implementation has quadratic cost (n*(n-1)/2 comparisons). You were bitten by a duplicate objectClass issue this time. If next time it happens to a group with 10,000 members, you'll be whining that your groups are perfectly sane, so why does it take so long to load your LDIF?
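To put a number on the cost argument, here is a small Python sketch of the all-against-all comparison described above. The function names are hypothetical and `casefold()` is only a stand-in for real per-syntax normalization; this is not how slapd or slapadd is implemented.

```python
def pairwise_comparisons(n: int) -> int:
    """Worst-case number of comparisons when each of n values is compared
    with every other value exactly once: n*(n-1)/2."""
    return n * (n - 1) // 2


def has_duplicate_pairwise(values) -> bool:
    """Compare each value with every later one, stopping at the first match."""
    normalized = [v.casefold() for v in values]  # stand-in for normalization
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            if normalized[i] == normalized[j]:
                return True
    return False


if __name__ == "__main__":
    # A handful of objectClass values is cheap; a 10,000-member group is not.
    for n in (3, 10_000):
        print(n, pairwise_comparisons(n))
```

For an entry with three objectClass values that is 3 comparisons; for a group with 10,000 members it is 49,995,000, all of which are performed when the values turn out to be distinct.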
And I don't see why you would need to restart: on a duplicate, either spit out a warning or, even better, spit out a warning and discard the duplicate.
Those are implementation details; in many cases, the database needs to be complete, with no holes; so if slapadd rejects an entry, it may not be able to add its children.
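The "no holes" concern can be illustrated with a short Python sketch that walks a list of DNs in load order and reports every entry whose parent was not loaded before it. This models the argument only; it does not describe slapadd's actual behaviour. The function names are hypothetical, and DN handling is deliberately naive (split on the first unescaped comma, case folding, no whitespace normalization).

```python
def parent_dn(dn: str):
    """Return the parent of a DN, or None if it has a single RDN.

    Simplification: splits on the first comma not preceded by a backslash.
    """
    i = 0
    while i < len(dn):
        if dn[i] == "\\":
            i += 2          # skip the escaped character
            continue
        if dn[i] == ",":
            return dn[i + 1:].strip()
        i += 1
    return None


def find_orphans(dns, suffix: str):
    """Return DNs whose parent has not appeared earlier in the load order."""
    loaded = set()
    orphans = []
    for dn in dns:
        key = dn.casefold()
        parent = parent_dn(dn)
        is_suffix = key == suffix.casefold()
        if not is_suffix and (parent is None or parent.casefold() not in loaded):
            orphans.append(dn)
            continue        # treated as not loaded, so its children are orphans too
        loaded.add(key)
    return orphans
```

If the entry ou=people,dc=foo,dc=no were dropped from a load, every entry beneath it would show up in the result of `find_orphans`, which is the cascade the paragraph above warns about.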
A sanity check tool for unreliable LDIF would probably be more appropriate. I guess at this point most users would pretend their LDIF is always reliable, and avoid running the sanity checker...
Really? Yes, I would love a sanity checker, and I would most likely _always_ run LDIF through a sanity checker before using slapadd to write to back-end.
But again - slapadd already does some sanity checking,
Usually, as much as it's strictly required to properly perform its own task - regenerate a presumably sane database.
and there's even a flag for "dry-run" mode (-u), which IMO says that it is supposed to be used as a sanity checking tool. I'm perfectly OK with letting _all_ sanity checks occur only when using -u.
Embedding the sanity checker in slapadd is an option, indeed. Not the default, IMHO.
I would love to dump all my ldap data to an LDIF and run it through a sanity checker, I suspect there's more "old noise" stuck in there.
Task separation is at the roots of clean programming - and system administration.
p.
openldap-technical@openldap.org