Greetings,
At first, I was going to create a bug report, but decided to send this to the list first. I tried this with both 2.4.23 (Debian package) and 2.4.25 (compiled from source, bdb 4.8).
After a couple of entries just disappeared on one multi-master setup I had, I decided to further investigate, and found this (there are two cases, for the same procedure):
1. Configure two LDAP servers in a multi-master setup.
2. Make sure they replicate correctly (of course).
3. Shut down one of the two LDAP servers.
4. Create a new entry (say, ou1) on the LDAP server that is left up.
5. Shut down the last LDAP server.
6. Start the *other* LDAP server, the one where you didn't create the entry.
7. Create another entry (say, ou2), so that each server has a new entry that is *not* on the other server.
8. Shut down the LDAP server (both servers are down now).
9. Start both LDAP servers.
Result (case 1): both newly created entries are missing on *one* of the servers, and only one of the entries is missing on the other server.
Result (case 2): one entry is missing on *both* servers.
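To compare what each server ends up with after step 9, something like the following could be used. It is only a minimal sketch: it assumes the DNs from each server have already been dumped into two lists (for example with a subtree search on each one), and the DN lists in the usage example are made up for illustration, not taken from my logs.

```python
def missing_entries(dns_a, dns_b):
    """Return (DNs missing on A, DNs missing on B).

    DNs are compared case-insensitively; no other normalization is done,
    so this is only good enough for a quick sanity check.
    """
    norm_a = {dn.lower() for dn in dns_a}
    norm_b = {dn.lower() for dn in dns_b}
    return sorted(norm_b - norm_a), sorted(norm_a - norm_b)


# Illustrative input only (hypothetical dump of each server's DNs).
server_a = ["dc=st-andes,dc=com", "ou=ou1,dc=st-andes,dc=com"]
server_b = ["dc=st-andes,dc=com"]

missing_on_a, missing_on_b = missing_entries(server_a, server_b)
print("missing on A:", missing_on_a)
print("missing on B:", missing_on_b)
```

An entry that was lost on *both* servers will not show up in either list, so the expected DNs (ou1 and ou2) still have to be checked by hand.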
Both servers run NTP and are in the same timezone (i.e., time is synchronized).
I'm *not* replicating cn=config (I shouldn't, because I have different SSL certificates on each server). Now, more details:
slapd with -d 16384 gives me this on the server that is missing both entries. On this server I created the entry ou=ou2,dc=st-andes,dc=com (and the server decided to delete it! And, for some reason, it didn't detect the new ou1 entry created on the other server):
http://www.st-andes.com/openldap/case1/log-server2-case1.txt
The other server (the one that kept one entry and lost the other): on this server I created the entry ou=ou1,dc=st-andes,dc=com, and it says it was changed by peer:
http://www.st-andes.com/openldap/case1/log-server1-case1.txt
Now, I'm seeing here that it is using server ID 000, but in cn=config.ldif I have:
olcServerID: 1 ldap://ldap.ildetech.com:389/
olcServerID: 2 ldap://ldap2.ildetech.com:389/
And the syncrepl:
olcSyncRepl: rid=001 provider=ldap://ldap.ildetech.com:389
  binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple credentials="secret"
  searchbase="dc=st-andes,dc=com" type=refreshAndPersist retry="3 5 5 +"
  timeout=7 starttls=critical
olcSyncRepl: rid=002 provider=ldap://ldap2.ildetech.com:389
  binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple credentials="secret"
  searchbase="dc=st-andes,dc=com" type=refreshAndPersist retry="3 5 5 +"
  timeout=7 starttls=critical
olcMirrorMode: TRUE
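To double-check which provider each rid points at, the olcSyncRepl value can be split into its key=value pairs. This is a rough sketch using only the Python standard library; parse_syncrepl is a helper name I made up for this mail, and it assumes the value is a space-separated list of key=value tokens with optional double quotes, as in the config above.

```python
import shlex


def parse_syncrepl(value):
    """Turn 'key=value key="quoted value" ...' into a dict of strings."""
    params = {}
    for token in shlex.split(value):
        # Split on the first '=' only, so DNs such as cn=admin,... stay intact.
        key, _, val = token.partition("=")
        params[key] = val
    return params


line = (
    'rid=001 provider=ldap://ldap.ildetech.com:389 '
    'binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple '
    'credentials="secret" searchbase="dc=st-andes,dc=com" '
    'type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical'
)
cfg = parse_syncrepl(line)
print(cfg["rid"], cfg["provider"], cfg["retry"])
```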
And, as you can see on the command line, I have the URL specified in the -h parameter, but it seems to be ignoring it! Or should I specify the *whole* set of URLs that I put in the -h parameter? (ldap://ldap2.ildetech.com:389 ldap://127.0.0.1:389/ ldaps:/// ldapi:///)
So, I decided to change the config:
On server 1 (kirara):
olcServerID: 1
and
olcSyncRepl: rid=002 provider=ldap://ldap2.ildetech.com:389
  binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple credentials="secret"
  searchbase="dc=st-andes,dc=com" type=refreshAndPersist retry="3 5 5 +"
  timeout=7 starttls=critical
olcMirrorMode: TRUE
On server 2 (happy):
olcServerID: 2
and
olcSyncRepl: rid=002 provider=ldap://ldap2.ildetech.com:389
  binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple credentials="secret"
  searchbase="dc=st-andes,dc=com" type=refreshAndPersist retry="3 5 5 +"
  timeout=7 starttls=critical
olcMirrorMode: TRUE
With this new setup, and following the same procedure, I get one missing entry on *both* servers. At least the servers get to a consistent state, but I still have a missing entry. The logs for this setup:
Server 2 (ID 2, where I created entry ou2 while the other server was down). This server decided, wrongly, to delete entry ou2:
http://www.st-andes.com/openldap/case2/log-server2-case2.txt
And the other server (where I created ou1):
http://www.st-andes.com/openldap/case2/log-server1-case2.txt
This one never saw the other entry, ou2.
For both cases, the syncprov overlay was using its default configuration:
dn: olcOverlay={0}syncprov
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
structuralObjectClass: olcSyncProvConfig
entryUUID: 24354488-e5bf-102f-9e6a-ad3cba95f7f1
creatorsName: cn=config
createTimestamp: 20110318152128Z
entryCSN: 20110318152128.935227Z#000000#000#000000
modifiersName: cn=config
modifyTimestamp: 20110318152128Z
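Since the server ID is what I'm worried about, here is a small sketch for pulling it out of a CSN such as the entryCSN above. It assumes the layout is time#count#sid#mod and that the three counters are hexadecimal, which is my understanding of the format; parse_csn and the CSN tuple are names I made up for this mail.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: datetime  # time of the change, UTC
    count: int           # change counter (assumed hex)
    sid: int             # server ID field (assumed hex)
    mod: int             # modification number (assumed hex)


def parse_csn(csn):
    """Split 'time#count#sid#mod' into its four fields."""
    time_part, count, sid, mod = csn.split("#")
    ts = datetime.strptime(time_part, "%Y%m%d%H%M%S.%fZ")
    ts = ts.replace(tzinfo=timezone.utc)
    return CSN(ts, int(count, 16), int(sid, 16), int(mod, 16))


csn = parse_csn("20110318152128.935227Z#000000#000#000000")
print(csn.timestamp.isoformat(), csn.sid)
```

Running the same function over the CSNs that show up in the logs should make it easy to see which server ID each change was stamped with.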
What do you think?
Thanks in advance!
Ildefonso Camargo