Full_Name: Craig Worgan
Version: 2.4.21
OS: RedHat Enterprise Linux 5
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (47.11.182.86)

As per Kyle Blaney's post to the bugs mailing list (Kyle is on vacation, so I am submitting this ITS on his behalf):
I have encountered a situation with multi-master replication in OpenLDAP 2.4.21 where an entry deleted on one server is not deleted from its peer. I'm using Red Hat Enterprise Linux 5.
Here's what I did:
1. Configure Network Time Protocol with server A as the NTP master and server B as the NTP slave.
2. Configure multi-master replication between server A (server ID=1) and server B (server ID=2).
3. Start OpenLDAP service on servers A and B.
4. Add an entry to server A and ensure it's replicated to server B.
5. Add an entry to server B and ensure it's replicated to server A.
6. Stop OpenLDAP service on server A.
7. Delete an entry on server B.
8. Start OpenLDAP service on server A with sync debugging enabled (-d sync).
At this point, I expected that the entry deleted from server B would be deleted from server A. Instead, the entry remained on server A and slapd displayed the following (with the entry's DN X'ed out):
slapd starting
do_syncrep2: rid=001 LDAP_RES_INTERMEDIATE - REFRESH_DELETE
Entry XXXXXX CSN 20100209193028.621799Z#000000#001#000000 older or equal to ctx 20100209193028.621799Z#000000#001#000000
syncprov_search_response: cookie=rid=001,sid=001,csn=20100202210831.101462Z#000000#000#000000;20100209193028.621799Z#000000#001#000000;20100209193118.342038Z#000000#002#000000
Why wouldn't the entry deleted on server B also be deleted from server A? Is the failure to delete the entry related to the "entry CSN older or equal to context CSN" message?
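For anyone reading the log above, here is a minimal Python sketch of the comparison that the "older or equal to ctx" message appears to describe. It is not slapd's code. It assumes the usual CSN layout (timestamp#count#sid#mod, fixed-width fields) and that an entry CSN is checked against the context CSN carrying the same server ID. The names parse_csn, context_csns and older_or_equal are made up for illustration.

```python
from typing import Dict, NamedTuple


class CSN(NamedTuple):
    timestamp: str  # e.g. "20100209193028.621799Z"
    count: str      # change counter within the same timestamp
    sid: str        # server ID, three hex digits
    mod: str        # modification counter within one operation


def parse_csn(text: str) -> CSN:
    """Split a CSN string into its four '#'-separated fields."""
    parts = text.strip().split("#")
    if len(parts) != 4:
        raise ValueError(f"not a CSN: {text!r}")
    return CSN(*parts)


def context_csns(cookie: str) -> Dict[str, CSN]:
    """Map each SID found in a sync cookie's csn= field to its CSN."""
    for field in cookie.split(","):
        if field.startswith("csn="):
            csns = [parse_csn(c) for c in field[len("csn="):].split(";")]
            return {c.sid: c for c in csns}
    return {}


def older_or_equal(entry: CSN, ctx: Dict[str, CSN]) -> bool:
    """True if the context already holds a CSN for the entry's SID that is
    at least as new (assumed rule; fields are fixed-width, so plain string
    ordering matches chronological ordering)."""
    known = ctx.get(entry.sid)
    return known is not None and entry <= known
```

Fed the cookie from the log (the text after "cookie="), context_csns yields one CSN each for SIDs 000, 001 and 002, and older_or_equal returns True for the entry CSN shown, because it is identical to the context CSN for SID 001. The cookie also carries a SID 002 CSN stamped 20100209193118; whether that one corresponds to the delete in step 7 is a guess, not something the log confirms.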
Unfortunately, I have been unable to reproduce the failure since I first saw it. All subsequent tests have shown that the entry deleted from server B is deleted from server A when the OpenLDAP service on server A is restarted.
Kyle Blaney
-------------------------------------------------------------------------
There is also this follow-up post:

I can now reliably reproduce this problem in 2.4.21 and 2.4.20 on a multi-master setup that only has five entries:
1. Stop service on server A.
2. Delete one entry on server B.
3. Start service on server A.
After step 3, the entry is never deleted from server A.
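One way to confirm the outcome after step 3 is to dump both servers to LDIF and compare the DNs. The Python sketch below does only that comparison. It assumes plain, unfolded "dn: " lines (base64 "dn:: " lines and wrapped lines are ignored) and compares DNs case-insensitively as a simplification. The function names are hypothetical.

```python
from typing import Set


def dns_from_ldif(ldif: str) -> Set[str]:
    """Collect DNs from LDIF text (plain, unfolded 'dn: ' lines only)."""
    dns = set()
    for line in ldif.splitlines():
        if line.lower().startswith("dn: "):
            # Lowercase so that differences in case alone are not reported.
            dns.add(line[4:].strip().lower())
    return dns


def stale_on(replica_ldif: str, peer_ldif: str) -> Set[str]:
    """DNs still present on the replica but absent from its peer."""
    return dns_from_ldif(replica_ldif) - dns_from_ldif(peer_ldif)
```

Calling stale_on with server A's dump first and server B's dump second should return an empty set when replication has caught up. In the failure described here it would contain the DN of the entry deleted on server B.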
I have changed many aspects of configuration (replication with and without TLS, syncprov-sessionlog enabled and disabled, syncprov-checkpoint enabled and disabled, syncprov-nopresent TRUE and FALSE, syncprov-reloadhint TRUE and FALSE) and the problem still occurs.
Is this problem the same one outlined in "test 058 failure" at http://www.openldap.org/lists/openldap-software/201002/msg00031.html?
Kyle
----------------------------------------------------------------------
Thanks,
Craig Worgan