Thank you both, Howard and Leonid.
Yes, you're right, it happened the other way around; the modification was made on the second server and propagated back to the first one. However, I don't know why the change was in turn returned to the second server - SIDs are ok as far as I know.
However, as Leonid mentioned, both servers weren't synchronized correctly anyway. Turns out that yesterday we upgraded to 2.4.39-6 (as I stated in my first mail). Previously, we were using 2.4.39-3 and it seemed to work fine. We also noted that 2.4.39-6 produced some additional issues (like client and syncrepl sockets dying without any apparent reason), so today we downgraded back to 2.4.39-3 and everything seems to work just fine again.
We had a look at the changelog from 2.4.39-3 to 2.4.39-6, and none of the changes seems to be explicitly syncrepl-related; they look LDAPS-related instead (strange, as we use plain LDAP rather than LDAPS for syncrepl). Anyway, we'll keep version 2.4.39-3 as long as it works well.
Thanks.
Regards.
2015-04-21 23:53 GMT+01:00 Леонид Юрьев leo@yuriev.ru:
Hi Nicolás,
- If the contextCSN values differ between the servers, then they are still not synchronized (or there is a glitch). http://www.openldap.org/lists/openldap-technical/201108/threads.html#00001
- Replication takes some time. Therefore the contextCSN values can be equal only after there have been no changes for a while.
- Make sure that the time is synchronized on the servers (e.g. by using ntpdate).
- Unfortunately, all current releases (including 2.4.39 and 2.4.40) have quite a few bugs in the replication code. For example, with ITS#8081 ( http://www.openldap.org/its/index.cgi/Software%20Bugs?id=8081) you could get a segfault, and could also lose some changes during replication (as if they had been undone).
- We made a fork of the OpenLDAP project for our use case (high-load, TELCO-aware multi-master); it is called ReOpenLDAP. If you decide to build slapd from source, I recommend using our ReOpenLDAP ;)
New features are not yet documented in the English man pages, but you can translate them with Google: https://github.com/ReOpen/ReOpenLDAP/releases/tag/ReOpenLDAP-2.4.41-rc
https://github.com/ReOpen/ReOpenLDAP/commit/4fc4bc18dd4bd80909aa80700c5c19b0...
https://github.com/ReOpen/ReOpenLDAP/commit/95808b156ee36a886523b7096a75d509...
https://github.com/ReOpen/ReOpenLDAP/commit/1c94bc17ec285388e8a8299399ed5377...
Leonid.
2015-04-21 16:01 GMT+03:00 Nicolás Kovac Neumann nkovacne@ull.edu.es:
Hi,
We're currently using N-way multimaster replication on two servers for our LDAP directory, both for the config and the hdb databases. It mostly works fine, but we've run into a possible issue with the syncrepl engine that we would like to shed some light on. We're using CentOS 7 with openldap-servers version 2.4.39-6.
We made an update on one of the entries (server1, in this case), so server2 replicated that change (as shown below in the log):
==> server1/ldap.log <==
Apr 21 13:38:55 server1 slapd[1835]: do_syncrep2: rid=002 cookie=rid=002,sid=002,csn=20150421123855.643239Z#000000#002#000000
Apr 21 13:38:55 server1 slapd[1835]: syncrepl_message_to_entry: rid=002 DN: uid=user1,cn=subtree,dc=example,dc=org, UUID: 18a2436c-73ce-1030-95dd-b52dc05ced13
Apr 21 13:38:55 server1 slapd[1835]: syncrepl_entry: rid=002 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_MODIFY)
Apr 21 13:38:55 server1 slapd[1835]: syncrepl_entry: rid=002 be_search (0)
Apr 21 13:38:55 server1 slapd[1835]: syncrepl_entry: rid=002 uid=user1,cn=subtree,dc=example,dc=org
Apr 21 13:38:55 server1 slapd[1835]: slap_queue_csn: queing 0x7ff8f42789f0 20150421123855.643239Z#000000#002#000000
Apr 21 13:38:55 server1 slapd[1835]: slap_graduate_commit_csn: removing 0x7ff8f435e770 20150421123855.643239Z#000000#002#000000
Apr 21 13:38:55 server1 slapd[1835]: syncrepl_entry: rid=002 be_modify uid=user1,cn=subtree,dc=example,dc=org (0)
Apr 21 13:38:55 server1 slapd[1835]: syncprov_sendresp: cookie=rid=001,sid=001,csn=20150421123855.643239Z#000000#002#000000
Apr 21 13:38:55 server1 slapd[1835]: slap_queue_csn: queing 0x7ff8f42789f0 20150421123855.643239Z#000000#002#000000
Apr 21 13:38:55 server1 slapd[1835]: slap_graduate_commit_csn: removing 0x7ff8f41b7b90 20150421123855.643239Z#000000#002#000000
==> server2/ldap.log <==
Apr 21 13:38:55 server2 slapd[1948]: slap_queue_csn: queing 0x7f897affb220 20150421123855.643239Z#000000#002#000000
Apr 21 13:38:55 server2 slapd[1948]: syncprov_sendresp: to=001, cookie=rid=002,sid=002,csn=20150421123855.643239Z#000000#002#000000
Apr 21 13:38:55 server2 slapd[1948]: slap_graduate_commit_csn: removing 0x7f89307f42a0 20150421123855.643239Z#000000#002#000000
Nothing strange so far. However, if we query the contextCSN, it differs between the two servers.
For server1, we have:
contextCSN: 20150421123523.281736Z#000000#001#000000
contextCSN: 20150421123417.889477Z#000000#002#000000
For server2, the value for server ID 001 differs:
contextCSN: 20150421115324.003502Z#000000#001#000000
contextCSN: 20150421123417.889477Z#000000#002#000000
However, the entry seems to replicate the entryCSN correctly on both servers:
entryCSN: 20150421123417.889477Z#000000#002#000000
Is this the expected behavior? Shouldn't both contextCSN values match on both servers?
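To make the comparison concrete, here is a minimal Python sketch of how the two contextCSN sets above could be compared per server ID. It assumes the usual CSN layout of timestamp#count#SID#mod, with the three counters written in hexadecimal; the function names (parse_csn, newest_by_sid, csn_mismatches) are made up for this illustration and are not part of OpenLDAP.

```python
from datetime import datetime


def parse_csn(csn):
    """Split '<timestamp>#<count>#<sid>#<mod>' into comparable parts."""
    stamp, count, sid, mod = csn.strip().split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")  # timestamp is UTC
    # Assumption: count, SID and mod are hexadecimal fields.
    return when, int(count, 16), int(sid, 16), int(mod, 16)


def newest_by_sid(csns):
    """Map each server ID to the newest CSN string seen for it."""
    newest = {}
    for csn in csns:
        when, count, sid, mod = parse_csn(csn)
        key = (when, count, mod)
        if sid not in newest or key > newest[sid][0]:
            newest[sid] = (key, csn)
    return {sid: csn for sid, (_key, csn) in newest.items()}


def csn_mismatches(a, b):
    """Return {sid: (csn_on_a, csn_on_b)} for every SID whose values differ."""
    na, nb = newest_by_sid(a), newest_by_sid(b)
    return {
        sid: (na.get(sid), nb.get(sid))
        for sid in sorted(set(na) | set(nb))
        if na.get(sid) != nb.get(sid)
    }


# contextCSN values as queried on each server (copied from this mail).
server1 = [
    "20150421123523.281736Z#000000#001#000000",
    "20150421123417.889477Z#000000#002#000000",
]
server2 = [
    "20150421115324.003502Z#000000#001#000000",
    "20150421123417.889477Z#000000#002#000000",
]

for sid, (c1, c2) in csn_mismatches(server1, server2).items():
    print(f"SID {sid:03x}: server1={c1} server2={c2}")
```

With the values above, only the SID 001 entry is reported as differing, which matches what we see by eye; the SID 002 values are identical on both servers.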
Thanks!
Regards,
Nicolás