Full_Name: Julien COMBES
Version: 2.4.21
OS: Debian 5.0.4
URL: ftp://ftp.openldap.org/incoming/its-syncrepl-loop-moddn.tar.bz2
Submission from: (NULL) (212.23.175.185)
Hello,
I think I have found a loop problem in syncrepl replication with OpenLDAP 2.4.21, BDB 4.7.25 with all patches, and an hdb database. The problem sometimes appears when an entry is moved with "modrdn -s" into a node which has just been created. I have reproduced it by creating a node and doing a moddn while the consumer was stopped, then restarting the consumer.
The problem follows these steps :
- When it starts, the consumer does a request objectClass=* on the provider :
Feb 12 09:09:19 ldapma24-ida01 slapd[30445]: conn=1007 op=1 SRCH base="dc=my,dc=domain" scope=2 deref=0 filter="(objectClass=*)"
- The consumer finds the modrdn and tries to do this :
Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: ==>hdb_modrdn(cn=user1,ou=A,dc=my,dc=domain,cn=user1,ou=X,dc=my,dc=domain)
- The consumer fails with these errors :
Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: => hdb_dn2id("ou=x,dc=my,dc=domain")
Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: <= hdb_dn2id: get failed: DB_NOTFOUND: No matching key/data pair found (-30988)
Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: hdb_modrdn: newSup(ndn=ou=x,dc=my,dc=domain) not here!
Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: send_ldap_result: conn=-1 op=0 p=0
Feb 12 09:09:19 ldapra24-ida01 slapd[12156]: send_ldap_result: err=32 matched="" text="new superior not found"
- The consumer retries the request objectClass=* on the provider and loops on the problem. The replication doesn't work anymore.
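The failure described in these steps can be modelled with a toy in-memory directory. This is only a sketch under stated assumptions: `ToyDirectory`, `NoSuchObject` and `replay` are hypothetical names and not slapd code, the starting entries are guessed from the DNs in the logs, and the model assumes the consumer sees the rename before the add of the new superior, which is what the logs suggest.

```python
class NoSuchObject(Exception):
    """Stands in for LDAP result code 32 (noSuchObject)."""


class ToyDirectory:
    """Hypothetical stand-in for the consumer database: a set of normalized DNs."""

    def __init__(self, dns):
        self.entries = {dn.lower() for dn in dns}

    def add(self, dn):
        self.entries.add(dn.lower())

    def modrdn(self, dn, new_rdn, new_superior):
        # Like the consumer in the logs, refuse the move when the new superior is unknown.
        if new_superior.lower() not in self.entries:
            raise NoSuchObject("new superior not found")
        self.entries.discard(dn.lower())
        self.entries.add(f"{new_rdn},{new_superior}".lower())


def replay(directory, ops, max_retries=3):
    """Replay ops from the start on each attempt; return the number of failed attempts."""
    failures = 0
    for _ in range(max_retries):
        try:
            for name, args in ops:
                getattr(directory, name)(*args)
            return failures
        except NoSuchObject:
            failures += 1
    return failures


# Assumed starting state of the consumer before it was stopped.
START = ["dc=my,dc=domain", "ou=A,dc=my,dc=domain", "cn=user1,ou=A,dc=my,dc=domain"]
MOVE = ("modrdn", ("cn=user1,ou=A,dc=my,dc=domain", "cn=user1", "ou=X,dc=my,dc=domain"))
ADD = ("add", ("ou=X,dc=my,dc=domain",))

stuck = ToyDirectory(START)
stuck_failures = replay(stuck, [MOVE, ADD])   # rename seen first: every retry fails

fine = ToyDirectory(START)
fine_failures = replay(fine, [ADD, MOVE])     # add seen first: succeeds at once
```

In this model, retrying cannot help: each attempt starts from the same state and hits the same missing superior, which matches the loop seen in the consumer log.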
To reproduce the problem, I have used these steps :
- start an empty provider
- ldapadd the entries in mydomain.ldif
ldapadd -x -h 127.0.0.1 -D "dc=my,dc=domain" -W -f mydomain.ldif
- start the consumer.
- stop the consumer when replication is finished
- ldapadd the new node
ldapadd -x -h 127.0.0.1 -D "dc=my,dc=domain" -W -f add.ldif
- modrdn -s
ldapmodrdn -x -h 127.0.0.1 -D "dc=my,dc=domain" -W -r -s "ou=X,dc=my,dc=domain" "cn=user1,ou=A,dc=my,dc=domain" "cn=user1"
- start the consumer
I have attached in its-syncrepl-loop-moddn.tar.bz2 :
- slapd.conf of provider and consumer
- log files of provider and consumer
- mydomain.ldif and add.ldif
Thanks for the detailed report. The bug is confirmed, and it's not related to back-hdb, but seems to be syncrepl-related in general.
It's not clear to me where the issue is. What is the "right" sequence in which the add of the new superior and the modrdn should be transmitted? Should the provider operate differently, or should the consumer check all syncrepl messages and try to rebuild the final state, instead of giving up when the internal lookup for the new superior fails? Probably, a workaround could be to perform the modrdn by creating the new superior as a glue object, which will eventually be replaced by the actual add.
p.
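The glue-object workaround suggested above can be sketched with a toy model. This is not how syncrepl is implemented: `GlueDirectory` and its methods are hypothetical names, the starting entries are guessed from the DNs in the report, and a "glue" entry is reduced here to a boolean flag on a placeholder DN.

```python
class GlueDirectory:
    """Hypothetical consumer model: maps normalized DN -> True if the entry is glue."""

    def __init__(self, dns):
        self.entries = {dn.lower(): False for dn in dns}

    def add(self, dn):
        # A real add replaces any glue placeholder that was created earlier.
        self.entries[dn.lower()] = False

    def modrdn(self, dn, new_rdn, new_superior):
        superior = new_superior.lower()
        if superior not in self.entries:
            # Instead of failing with "new superior not found", create a placeholder.
            self.entries[superior] = True
        # Move the entry under the (possibly glue) superior.
        self.entries[f"{new_rdn},{superior}".lower()] = self.entries.pop(dn.lower())


consumer = GlueDirectory([
    "dc=my,dc=domain",
    "ou=A,dc=my,dc=domain",
    "cn=user1,ou=A,dc=my,dc=domain",
])

# The rename arrives before the add of its new superior.
consumer.modrdn("cn=user1,ou=A,dc=my,dc=domain", "cn=user1", "ou=X,dc=my,dc=domain")
glue_after_move = consumer.entries["ou=x,dc=my,dc=domain"]

# The add arrives later and replaces the placeholder.
consumer.add("ou=X,dc=my,dc=domain")
glue_after_add = consumer.entries["ou=x,dc=my,dc=domain"]
```

With this approach the order of the two operations no longer matters for the final state, at the cost of a temporary placeholder that exists only until the real add is applied.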