I can now reliably reproduce this problem in 2.4.21 and 2.4.20 on a multi-master setup that has only five entries:
1. Stop service on server A.
2. Delete one entry on server B.
3. Start service on server A.
After step 3, the entry is never deleted from server A.
I have changed many aspects of configuration (replication with and without TLS, syncprov-sessionlog enabled and disabled, syncprov-checkpoint enabled and disabled, syncprov-nopresent TRUE and FALSE, syncprov-reloadhint TRUE and FALSE) and the problem still occurs.
Is this problem the same one outlined in "test 058 failure" at http://www.openldap.org/lists/openldap-software/201002/msg00031.html?
Kyle
-----Original Message-----
From: openldap-software-bounces+kblaney=avaya.com@OpenLDAP.org [mailto:openldap-software-bounces+kblaney=avaya.com@OpenLDAP.org] On Behalf Of Blaney, Kyle AVAYA (BVW:9T16)
Sent: February 9, 2010 4:46 PM
To: openldap-software@openldap.org
Subject: Failure to delete entry with multi-master replication
I have encountered a situation with multi-master replication in OpenLDAP 2.4.21 where an entry deleted on one server is not deleted from its peer. I'm using Red Hat Enterprise Linux 5.
Here's what I did:
1. Configure Network Time Protocol with server A as the NTP master and server B as the NTP slave.
2. Configure multi-master replication between server A (server ID=1) and server B (server ID=2).
3. Start OpenLDAP service on servers A and B.
4. Add an entry to server A and ensure it's replicated to server B.
5. Add an entry to server B and ensure it's replicated to server A.
6. Stop OpenLDAP service on server A.
7. Delete an entry on server B.
8. Start OpenLDAP service on server A with sync debugging enabled (-d sync).
At this point, I expected that the entry deleted from server B would be deleted from server A. Instead, the entry remained on server A and slapd displayed the following (with the entry's DN X'ed out):
slapd starting
do_syncrep2: rid=001 LDAP_RES_INTERMEDIATE - REFRESH_DELETE
Entry XXXXXX CSN 20100209193028.621799Z#000000#001#000000 older or equal to ctx 20100209193028.621799Z#000000#001#000000
syncprov_search_response: cookie=rid=001,sid=001,csn=20100202210831.101462Z#000000#000#000000;20100209193028.621799Z#000000#001#000000;20100209193118.342038Z#000000#002#000000
Why wouldn't the entry deleted on server B also be deleted from server A? Is the failure to delete the entry related to the "entry CSN older or equal to context CSN" message?
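To make the CSN values in that output easier to read, here is a rough Python sketch that splits the cookie into its per-server CSNs and repeats the "older or equal" comparison by hand. It assumes each CSN has four '#'-separated fields (timestamp, change count, server ID, modification number) and that an entry's CSN is compared against the context CSN with the same server ID. The function names are made up for illustration, and this is not slapd's actual code.

```python
from typing import Dict, NamedTuple


class CSN(NamedTuple):
    timestamp: str  # fixed-width, e.g. "20100209193028.621799Z"
    count: int
    sid: int
    mod: int


def parse_csn(text: str) -> CSN:
    """Split a CSN string into its four fields (assumed layout)."""
    timestamp, count, sid, mod = text.split("#")
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def parse_cookie_csns(cookie: str) -> Dict[int, CSN]:
    """Return the CSNs carried in a sync cookie, keyed by server ID."""
    csn_field = cookie.split("csn=", 1)[1]
    return {c.sid: c for c in map(parse_csn, csn_field.split(";"))}


def is_older_or_equal(entry: CSN, ctx: CSN) -> bool:
    """Compare two CSNs from the same server ID.

    Timestamps are fixed-width, so plain string comparison orders them.
    """
    return (entry.timestamp, entry.count, entry.mod) <= (
        ctx.timestamp, ctx.count, ctx.mod)


# Values copied from the debug output above.
entry_csn = parse_csn("20100209193028.621799Z#000000#001#000000")
cookie = ("rid=001,sid=001,csn="
          "20100202210831.101462Z#000000#000#000000;"
          "20100209193028.621799Z#000000#001#000000;"
          "20100209193118.342038Z#000000#002#000000")
ctx_csns = parse_cookie_csns(cookie)

print(is_older_or_equal(entry_csn, ctx_csns[entry_csn.sid]))  # True (equal)
print(ctx_csns[2].timestamp > entry_csn.timestamp)            # True
```

Under those assumptions, the entry's CSN is identical to the context CSN for server ID 001, and the cookie also carries a later CSN for server ID 002.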
Unfortunately, I have been unable to reproduce the failure since I first saw it. All subsequent tests have shown that the entry deleted from server B is deleted from server A when the OpenLDAP service on server A is restarted.
Kyle Blaney