Re: (ITS#5443) Multiple identical attributes break syncrepl process fatally? - openldap-bugs

28 Mar 2008

      marian.eichholz@freenet.ag wrote:
...
We use OpenLDAP as a mail service directory with some 8 million objects on
several replicas.
For OpenLDAP 2.4.x we have to migrate from slurpd to syncrepl.
We got a working syncrepl provider as a slurpd consumer (slapadd -q, 36 hrs).
So I am trying to bring up a blank DB by syncrepl only (yes, it does not perform
well at all, but it is informative).
The process more or less breaks after a couple of minutes and some 44,000
objects (8,000,000 expected). Tracing it on the consumer side (-d 16384), I see
something like this after an entry:
syncrepl_message_to_entry: rid=001 mods check (forwardto: value #4 provided more than once)
Indeed, the entry to come has three "forwardto:" attributes with the same value
(and other forwardto attributes, too). This makes no great sense at the
application level, but until now it has been perfectly OK for the directory;
the LDAP API did not complain about the attribute modification, and neither did
slurpd.
It is a violation of RFC 4512, section 2.2, which OpenLDAP 2.4 conforms to.
...
This leads to some questions and suggestions:

the provider does not log anything with -d 16384, no error, no nothing. Could
it do some useful logging about successful and failing replication sessions?
What's -16384?  Since OpenLDAP 2.3 you can use strings to identify each 
log subsystem (16384 == 0x4000 == "sync").
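To make the numeric/string correspondence concrete, here is a minimal Python sketch that decodes a numeric log level into subsystem names. The table is a partial one written down from the slapd.conf(5) "loglevel" description as an assumption; check it against the man page of your release. The function name `decode_loglevel` is invented for this illustration and is not part of OpenLDAP.

```python
# Partial slapd log-level table (bit value -> keyword).
# Assumption: values as listed under "loglevel" in slapd.conf(5);
# verify against your own OpenLDAP version.
LEVELS = {
    0x1: "trace",
    0x2: "packets",
    0x4: "args",
    0x8: "conns",
    0x10: "BER",
    0x20: "filter",
    0x40: "config",
    0x80: "ACL",
    0x100: "stats",
    0x200: "stats2",
    0x400: "shell",
    0x800: "parse",
    0x4000: "sync",
    0x8000: "none",
}


def decode_loglevel(mask):
    """Return the keywords whose bits are set in the numeric level `mask`."""
    return [name for bit, name in sorted(LEVELS.items()) if mask & bit]
```

With this table, `decode_loglevel(16384)` yields `["sync"]`, and a combined level such as 256 + 16384 yields `["stats", "sync"]`. On the command line the keyword form would presumably be `-d sync` instead of `-d 16384`.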
The error occurs when the consumer tries to manipulate the data it
receives.  The producer has nothing to do with it, since it assumes that
the data it contains already passed sanity checks when it was stored.
How incorrect data got stored into the producer is a totally different
business, and the producer-side replication process should not muck with it.
...

the consumer does not log anything that can explain why the remaining objects
are not read, either. A bit of warning/logging could help the hopeful admin,
probably.
A sync error occurred, which prevented syncing from continuing.  This
error is logged by the "sync" subsystem.  As far as I understand from
reading the code, the error (at least, a replication error) should
also be logged by the "any" subsystem, which means that as soon as any
logging is enabled, you get a message logged.
...

why is one problematic object lethal for the whole rest of the objects, while
future modifications continue to be incorporated? Is this lack of robustness
more a bug or a feature?
If inconsistent data is received, synchronization is supposed to stop.
In fact, continuing may result in an inconsistent state.  The fact that
the stop is caused by a real error, and the fact that fixing the error
allows synchronization to recover, doesn't sound like lack of robustness
to me.  It sounds more like wisdom.
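The stop-then-recover behaviour described above can be pictured with a toy Python model. This is only an illustration under loud assumptions: it is not syncrepl's code, entries are plain dictionaries, values are compared as exact strings, and the names `InvalidEntry`, `check_entry` and `consume` are all hypothetical.

```python
class InvalidEntry(Exception):
    """Raised when an entry fails the consumer-side sanity check."""


def check_entry(entry):
    """Reject an entry in which any attribute carries the same value twice."""
    for attr, values in entry["attrs"].items():
        if len(set(values)) != len(values):
            raise InvalidEntry(
                f"{entry['dn']}: {attr}: value provided more than once"
            )


def consume(stream, replica, start=0):
    """Apply entries in order, stopping at the first invalid one.

    Returns (next_index, error). On error, next_index points at the
    offending entry, so a later call can resume from there once the
    data has been fixed on the producer side.
    """
    for i in range(start, len(stream)):
        try:
            check_entry(stream[i])
        except InvalidEntry as exc:
            return i, str(exc)
        replica[stream[i]["dn"]] = stream[i]["attrs"]
    return len(stream), None
```

In this model, a stream whose second entry is bad stops at index 1 with only the first entry applied; after the bad entry is corrected, calling `consume` again with `start=1` applies the rest.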
...

are identical attributes really forbidden with LDAP?

RFC 4512, Section 2.2
...

what could one do to prevent unskillful "editors" of the master node from
killing the replication processes for the whole replication cluster? Besides
adding a checking/filtering API layer, of course.
Slapd has sanity checks for this.  Slapadd doesn't, since it is supposed 
to be operated only with consistent data, as resulting from slapcat. 
You might have slapadd'ed inconsistent data to the producer.
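Since slapadd skips these checks, one option is to scan the LDIF for repeated values before loading it. The following Python sketch is a rough pre-check, not an OpenLDAP tool: it assumes values are equivalent when they match case-insensitively (the real rule depends on each attribute's matching rule), it ignores `attr:< url` values, and the function names are invented for the example.

```python
import base64
from collections import Counter


def unfold(lines):
    """Join LDIF continuation lines (lines starting with one space)."""
    out = []
    for line in lines:
        if line.startswith(" ") and out:
            out[-1] += line[1:]
        else:
            out.append(line)
    return out


def parse_line(line):
    """Split 'attr: value' or 'attr:: base64' into (attr, decoded value)."""
    attr, _, rest = line.partition(":")
    if rest.startswith(":"):
        value = base64.b64decode(rest[1:].strip()).decode("utf-8", "replace")
    else:
        value = rest.strip()
    return attr.strip().lower(), value


def find_duplicates(ldif_text):
    """Return (dn, attribute, value) for every value given more than once."""
    problems = []
    dn, seen = None, Counter()
    # The extra "" flushes the last entry if the text lacks a final blank line.
    for line in unfold(ldif_text.splitlines()) + [""]:
        if line.startswith("#"):
            continue
        if line == "":
            problems += [(dn, a, v) for (a, v), n in seen.items() if n > 1]
            dn, seen = None, Counter()
            continue
        attr, value = parse_line(line)
        if attr == "dn":
            dn = value
        else:
            # Assumption: case-insensitive comparison stands in for the
            # attribute's real equality matching rule.
            seen[(attr, value.lower())] += 1
    return problems
```

Run over slapcat output, for instance `find_duplicates(open("dump.ldif").read())` with a hypothetical file name, this lists the entries that would trip the consumer's "provided more than once" check.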
In the end, I don't see how this ITS involves a bug in the synchronization
software.  The fact that your producer got corrupted by inconsistent data
might have been caused by a bug in the software; however, your analysis
does not give a clear indication of how it happened.  If it happened by
slapadd, then it's a known (and desired, and documented) limitation of
the software.  Unless you can reproduce it, I'd consider this ITS closed.
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
---------------------------------------
Office:  +39 02 23998309
Mobile:  +39 333 4963172
Email:   pierangelo.masarati@sys-net.it
---------------------------------------