Hi,
As an interim measure while deploying 2.4.16, I am canarying 2.3.43 on a replication provider. As a result, the current replication path is:

    master (2.3.39) -> provider (2.3.43) -> replica (2.4.16)

The master will be upgraded in short order once the 2.3.43 canary is successful.
I've been seeing occasional corrupt DNs in some be_add log lines on the 2.4.16 replica:

    May 5 09:35:46 host slapd[31817]: syncrepl_message_to_op: rid=100 be_add <90>Y1 ntry,ou=subtree,dc=example,dc=com (0)

I've modified the DN in this log line. The missing text is "cn=l" in this example. The original DN was 65 characters long.
I ran the following search against each host. The results show that the entry replicated fine but that the capitalisation of the DN differs (which may be a red herring, since I was already aware that DN capitalisation differed across servers):

    $ ldapsearch -x -b cn=lntry,ou=subtree,dc=example,dc=com -s base dn

    master:   dn is cn=lntry,ou=Subtree,dc=example,dc=com
    provider: dn is cn=lntry,ou=subtree,dc=example,dc=com
    replica:  dn is cn=lntry,ou=Subtree,dc=example,dc=com
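To illustrate why the capitalisation difference alone is probably harmless: the attribute types in this DN (cn, ou, dc) normally use case-insensitive matching, so the three results name the same entry. The sketch below is only a byte-wise, ASCII case-insensitive string comparison. It is not real LDAP DN normalization, and dn_equal_ignore_case is a hypothetical helper name, not an OpenLDAP function.

```c
#include <ctype.h>

/* Hypothetical helper: compare two DN strings byte by byte, ignoring
 * ASCII case. Returns 1 if they match, 0 otherwise. This does not
 * handle escaping, whitespace, or per-attribute matching rules. */
static int dn_equal_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

Called with the master and provider DNs above it returns 1. A DN that differs in anything other than case returns 0.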
slapcat shows no problems with the entry on the 2.4.16 host.
Since the database looks fine, I wonder if this is just a logging issue. Should this Debug statement in syncrepl.c actually use op->ora_e->e_name.bv_val or some other attribute?

    rc = op->o_bd->be_add( op, &rs );
    Debug( LDAP_DEBUG_SYNC,
        "syncrepl_message_to_op: %s be_add %s (%d)\n",
        si->si_ridtxt, op->o_req_dn.bv_val, rc );
With the exception of si->si_rid becoming si->si_ridtxt (and %d->%s) this Debug statement has not changed since 2.3.
Sean Burford wrote:
> slapcat shows no problems with the entry on the 2.4.16 host.
> Since the database looks fine I wonder if this is just a logging issue. Should this Debug statement in syncrepl.c actually use op->ora_e->e_name.bv_val or some other attribute?
Looks like a side-effect of ITS#5326. And no, you can't use op->ora_e because the backend may free it before returning (back-bdb/hdb definitely do).
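The thread does not spell out the mechanism, but if the garbled DN comes from the log statement reading storage that the add call has already released or reused (an assumption), the general remedy is to take a private copy of the DN before the call and log the copy. The sketch below shows that pattern in isolation. struct dn_buf, demo_backend_add and add_and_log are hypothetical stand-ins, not OpenLDAP code, and this is not the change that went into HEAD.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-counted string, loosely modelled on a berval. */
struct dn_buf {
    size_t len;
    char  *val;
};

/* Hypothetical backend add: releases the DN storage before returning,
 * as a backend that frees the entry might. Always reports success. */
static int demo_backend_add(struct dn_buf *req_dn)
{
    free(req_dn->val);
    req_dn->val = NULL;
    req_dn->len = 0;
    return 0;
}

/* Copy the DN first, call the backend, then format the log line from
 * the copy so the message never depends on memory the backend owns.
 * Returns the backend result code, or -1 if the copy cannot be made. */
static int add_and_log(struct dn_buf *req_dn, const char *rid,
                       char *out, size_t outlen)
{
    char *saved = malloc(req_dn->len + 1);
    int rc;

    if (saved == NULL)
        return -1;
    memcpy(saved, req_dn->val, req_dn->len);
    saved[req_dn->len] = '\0';

    rc = demo_backend_add(req_dn);
    snprintf(out, outlen, "syncrepl_message_to_op: %s be_add %s (%d)",
             rid, saved, rc);
    free(saved);
    return rc;
}
```

The cost is one extra allocation per add. That is usually acceptable on a debug logging path, but a real fix would more likely use the operation's own memory context.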
On Tue, May 5, 2009 at 12:52 PM, Howard Chu <hyc@symas.com> wrote:
> Sean Burford wrote:
>> slapcat shows no problems with the entry on the 2.4.16 host.
>> Since the database looks fine I wonder if this is just a logging issue. Should this Debug statement in syncrepl.c actually use op->ora_e->e_name.bv_val or some other attribute?
> Looks like a side-effect of ITS#5326. And no, you can't use op->ora_e because the backend may free it before returning (back-bdb/hdb definitely do).
Do you agree that this looks like a logging issue rather than a database issue?
Sean Burford wrote:
> On Tue, May 5, 2009 at 12:52 PM, Howard Chu <hyc@symas.com> wrote:
>> Looks like a side-effect of ITS#5326. And no, you can't use op->ora_e because the backend may free it before returning (back-bdb/hdb definitely do).
> Do you agree that this looks like a logging issue rather than a database issue?
Yes, and it is fixed now in HEAD.
Hi,
Thanks for the patch, however it looks like the problem is still there.
On Tue, May 5, 2009 at 1:01 PM, Howard Chu <hyc@symas.com> wrote:
> Sean Burford wrote:
>> Do you agree that this looks like a logging issue rather than a database issue?
> Yes, and it is fixed now in HEAD.
I've got three 2.4.16 servers running with the patched add.c from HEAD. They are replicating from the 2.3.43 server. One of them logged this today (the other two logged the correct DN), which indicates that this is still an issue:

    May 6 12:10:27 ldapserver slapd[29037]: syncrepl_message_to_op: rid=100 be_add µ-ESCxxxxxxxxxx,ou=people,dc=example,dc=com (0)

The first four characters of the DN (uid=) were replaced by the µ-ESC bytes. xxxxxxxxxx masks the uid.
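Since only some log lines are affected, a quick way to find them is to check whether the logged DN still starts with a plausible attribute type followed by "=". The sketch below is a rough filter under that assumption. dn_prefix_looks_valid is a hypothetical helper, it accepts only descriptive attribute names (ASCII letter, then letters, digits or hyphens), and it deliberately ignores numeric OID attribute types.

```c
/* ASCII-only character classes, so the result does not depend on locale. */
static int is_ascii_alpha(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static int is_ascii_digit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/* Hypothetical check: returns 1 if the string begins with a descriptive
 * attribute type (letter, then letters/digits/hyphens) followed by '=',
 * and 0 otherwise. Corruption later in the DN is not detected. */
static int dn_prefix_looks_valid(const char *dn)
{
    const unsigned char *p = (const unsigned char *)dn;

    if (!is_ascii_alpha(*p))
        return 0;
    for (p++; *p != '\0' && *p != '='; p++) {
        if (!is_ascii_alpha(*p) && !is_ascii_digit(*p) && *p != '-')
            return 0;
    }
    return *p == '=';
}
```

Both corrupt DNs quoted in this thread fail the check because their first byte is not an ASCII letter, while the intact form (uid=...,ou=people,dc=example,dc=com) passes.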
For reference I've attached the patch I used.
--On Wednesday, May 06, 2009 12:43 PM -0700 Sean Burford <unix.gurus@gmail.com> wrote:
> Hi,
> Thanks for the patch, however it looks like the problem is still there.
It was an additional patch for a problem that actually hadn't been integrated from HEAD to RE24 yet, so I'm not surprised. :P
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
openldap-technical@openldap.org