marian.eichholz@freenet.ag wrote:
We use OpenLDAP as a mail service directory with some 8 million objects on several replicas. For OpenLDAP 2.4.x we have to migrate from slurpd to syncrepl. We have a working syncrepl provider set up as a slurpd consumer (slapadd -q, 36 hrs).
So I am trying to bring up a blank DB by syncrepl only (yes, it does not perform well at all, but it is informative).
The process more or less breaks after a couple of minutes and some 44,000 objects (8,000,000 expected). Tracing it on the consumer side (-d 16384), I see something like this after an entry:
syncrepl_message_to_entry: rid=001 mods check (forwardto: value #4 provided more than once)
Indeed, the entry to come has three "forwardto:" attributes with the same value (and other forwardto attributes, too). This makes no great sense at the application level, but until now it has been perfectly OK for the directory, and the LDAP API did not complain about the attribute modification, nor did slurpd.
It is a violation of RFC 4512, section 2.2, which OpenLDAP 2.4 conforms to.
This leads to some questions and suggestions:
- the provider does not log anything with -d 16384, no error, no nothing. Could
it do some useful logging about successful and failing replication sessions?
What's -16384? Since OpenLDAP 2.3 you can use strings to identify each log subsystem (16384 == 0x4000 == "sync").
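As an illustration of the mapping between numeric and named log levels, here is a minimal Python sketch that decodes a numeric level into keywords. The table is taken from the loglevel description in slapd.conf(5) as I remember it, and `decode_loglevel` is a hypothetical helper name, not part of OpenLDAP; check the man page of your version before relying on it.

```python
# Numeric slapd log levels and their keyword names, as described under
# "loglevel" in slapd.conf(5). Verify against your installed version.
LOG_LEVELS = {
    0x0001: "trace",
    0x0002: "packets",
    0x0004: "args",
    0x0008: "conns",
    0x0010: "BER",
    0x0020: "filter",
    0x0040: "config",
    0x0080: "ACL",
    0x0100: "stats",
    0x0200: "stats2",
    0x0400: "shell",
    0x0800: "parse",
    0x4000: "sync",
    0x8000: "none",
}


def decode_loglevel(value):
    """Return the keyword names for every bit set in a numeric log level."""
    names = [name for bit, name in sorted(LOG_LEVELS.items()) if value & bit]
    # Report any bits that are not in the table as a single hex number.
    unknown = value & ~sum(LOG_LEVELS)
    if unknown:
        names.append(hex(unknown))
    return names


print(decode_loglevel(16384))        # ['sync']
print(decode_loglevel(256 + 16384))  # ['stats', 'sync']
```

So "-d 16384" and "-d sync" should select the same subsystem.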
The error occurs when the consumer tries to manipulate the data it receives. The producer has nothing to do with it, since it assumes that the data it holds already passed sanity checks when it was stored. How incorrect data got stored in the producer is a totally different matter, and the producer-side replication process should not muck with it.
- the consumer does not log anything that could explain why the remaining objects
are not read, either. A bit of warning/logging would probably help the hopeful admin.
A sync error occurred, which prevented synchronization from continuing. This error is logged by the "sync" subsystem. As far as I understand from reading the code, the error (at least, a replication error) should also be logged by the "any" subsystem, which means that as soon as any logging is enabled, you get a message logged.
- why is one problematic object lethal for all the remaining objects, while
future modifications continue to be incorporated? Is this lack of robustness more a bug or a feature?
If inconsistent data is received, synchronization is supposed to stop. In fact, continuing may result in an inconsistent state. The fact that the stop is caused by a real error, and that fixing the error allows synchronization to recover, doesn't sound like lack of robustness to me. It sounds more like wisdom.
- are identical attributes really forbidden with LDAP?
Yes. See RFC 4512, Section 2.2: no two values of an attribute may be equivalent.
- what could one do to prevent unskillful "editors" of the master node from killing
the replication processes for the whole replication cluster? Besides adding a checking/filtering API layer, of course.
Slapd has sanity checks for this. Slapadd doesn't, since it is supposed to be operated only with consistent data, as resulting from slapcat. You might have slapadd'ed inconsistent data to the producer.
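Since slapadd does not run these checks, one option is to scan the LDIF for repeated values before loading it. The following is only a rough Python sketch under stated assumptions: `find_duplicates` and `unfold` are hypothetical helper names, the sample entries are made up, and values are compared byte for byte, whereas the server compares them with the attribute's equality matching rule (so, for example, duplicates that differ only in case are missed here). It also does not resolve "attr:< url" references.

```python
import base64
from collections import Counter


def unfold(lines):
    """Join LDIF continuation lines (those starting with a single space)."""
    out = []
    for line in lines:
        if line.startswith(" ") and out:
            out[-1] += line[1:]
        else:
            out.append(line)
    return out


def find_duplicates(ldif_text):
    """Return (dn, attribute, value) for each value given more than once."""
    results = []
    dn, seen = None, Counter()
    # The trailing "" makes sure the last entry is flushed as well.
    for line in unfold(ldif_text.splitlines()) + [""]:
        if not line:  # a blank line ends the current entry
            results.extend((dn, a, v) for (a, v), n in seen.items() if n > 1)
            dn, seen = None, Counter()
            continue
        if line.startswith("#") or ":" not in line:
            continue  # skip comments and anything that is not "attr: value"
        attr, _, rest = line.partition(":")
        if rest.startswith(":"):  # "attr:: <base64>"
            value = base64.b64decode(rest[1:].strip())
        else:
            value = rest.strip().encode("utf-8")
        if attr.lower() == "dn":
            dn = value.decode("utf-8", "replace")
        else:
            seen[(attr.lower(), value)] += 1
    return results


# Made-up sample data for illustration only.
sample = """dn: uid=alice,dc=example,dc=com
forwardto: a@example.com
forwardto: b@example.com
forwardto: a@example.com

dn: uid=bob,dc=example,dc=com
forwardto: c@example.com
"""

for dn, attr, value in find_duplicates(sample):
    print(dn, attr, value)
```

Running something like this over slapcat output from the producer would show which entries need fixing before a consumer can load them.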
In the end, I don't see how this ITS involves a bug in the synchronization software. The fact that your producer got corrupted by inconsistent data might have been caused by a bug in the software; however, your analysis does not give a clear indication of how it happened. If it happened via slapadd, then it's a known (and desired, and documented) limitation of the software. Unless you can reproduce it, I'd consider this ITS closed.
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
Office: +39 02 23998309
Mobile: +39 333 4963172
Email: pierangelo.masarati@sys-net.it