(ITS#6476) Failure to delete entry with multi-master replication - openldap-bugs

17 Feb 2010


Full_Name: Craig Worgan
Version: 2.4.21
OS: RedHat Enterprise Linux 5
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (47.11.182.86)
As per Kyle Blaney's post to the bugs mailing list (Kyle is on vacation, so I am
submitting this ITS on his behalf):
I have encountered a situation with multi-master replication in OpenLDAP
2.4.21 where an entry deleted on one server is not deleted from its
peer.  I'm using Red Hat Enterprise Linux 5.
Here's what I did:
1. Configure Network Time Protocol with server A as the NTP master and
server B as the NTP slave.
2. Configure multi-master replication between server A (server ID=1) and
server B (server ID=2).
3. Start OpenLDAP service on servers A and B.
4. Add an entry to server A and ensure it's replicated to server B.
5. Add an entry to server B and ensure it's replicated to server A.
6. Stop OpenLDAP service on server A.
7. Delete an entry on server B.
8. Start OpenLDAP service on server A with sync debugging enabled (-d
sync).
At this point, I expected that the entry deleted from server B would be
deleted from server A.  Instead, the entry remained on server A and
slapd displayed the following (with the entry's DN X'ed out):
slapd starting
do_syncrep2: rid=001 LDAP_RES_INTERMEDIATE - REFRESH_DELETE
Entry XXXXXX CSN 20100209193028.621799Z#000000#001#000000 older or equal to ctx 20100209193028.621799Z#000000#001#000000
syncprov_search_response: cookie=rid=001,sid=001,csn=20100202210831.101462Z#000000#000#000000;20100209193028.621799Z#000000#001#000000;20100209193118.342038Z#000000#002#000000
Why wasn't the entry deleted on server B also deleted from server A?
Is the failure to delete the entry related to the "entry CSN older or
equal to context CSN" message?
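To make the "older or equal" message easier to read, here is a minimal Python
sketch that splits the logged cookie into its per-server context CSNs and
repeats the comparison for the logged entry CSN. It is not slapd's code. The
helper names (parse_csn, parse_cookie, older_or_equal) are hypothetical, and it
assumes the CSN layout is timestamp#count#sid#mod with the last three fields in
hexadecimal, and that comparing those fields in order is a fair stand-in for
the check slapd performs.

```python
from typing import Dict, NamedTuple


class CSN(NamedTuple):
    timestamp: str  # fixed-width, e.g. "20100209193028.621799Z"
    count: int
    sid: int        # server ID that generated the change
    mod: int


def parse_csn(text: str) -> CSN:
    """Split one CSN string into its four '#'-separated fields."""
    timestamp, count, sid, mod = text.split("#")
    # Assumption: count, sid and mod are hexadecimal.
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def parse_cookie(cookie: str) -> dict:
    """Split a sync cookie into rid, sid and a map of SID -> context CSN."""
    fields = dict(part.split("=", 1) for part in cookie.split(","))
    csns = [parse_csn(c) for c in fields.get("csn", "").split(";") if c]
    return {
        "rid": fields.get("rid"),
        "sid": fields.get("sid"),
        "ctx": {c.sid: c for c in csns},
    }


def older_or_equal(entry: CSN, ctx: Dict[int, CSN]) -> bool:
    """True if the entry CSN is not newer than the context CSN for its SID."""
    ctx_csn = ctx.get(entry.sid)
    # Timestamps are fixed-width, so plain tuple comparison orders them.
    return ctx_csn is not None and entry <= ctx_csn


# Values copied from the log output above.
cookie = parse_cookie(
    "rid=001,sid=001,csn="
    "20100202210831.101462Z#000000#000#000000;"
    "20100209193028.621799Z#000000#001#000000;"
    "20100209193118.342038Z#000000#002#000000"
)
entry_csn = parse_csn("20100209193028.621799Z#000000#001#000000")

for sid, csn in sorted(cookie["ctx"].items()):
    print(f"sid={sid:03x} ctx={csn.timestamp}")
print("older or equal:", older_or_equal(entry_csn, cookie["ctx"]))
```

With the logged values, the entry CSN is identical to the context CSN for SID
001, and the cookie also carries a later context CSN for SID 002
(20100209193118.342038Z).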
Unfortunately, I have been unable to reproduce the failure since I first
saw it.  All subsequent tests have shown that the entry deleted from
server B is deleted from server A when the OpenLDAP service on server A
is restarted.
Kyle Blaney
-------------------------------------------------------------------------
There is also this follow up post:
I can now reliably reproduce this problem in 2.4.21 and 2.4.20 on a
multi-master setup that only has five entries:
1. Stop service on server A.
2. Delete one entry on server B.
3. Start service on server A.
After step 3, the entry is never deleted from server A.
I have changed many aspects of configuration (replication with and
without TLS, syncprov-sessionlog enabled and disabled,
syncprov-checkpoint enabled and disabled, syncprov-nopresent TRUE and
FALSE, syncprov-reloadhint TRUE and FALSE) and the problem still occurs.
Is this problem the same one outlined in "test 058 failure" at
http://www.openldap.org/lists/openldap-software/201002/msg00031.html?
Kyle
 ----------------------------------------------------------------------
Thanks,
Craig Worgan