Hi. I use delta-sync replication on version 2.4. Sometimes, some records are not sent to the slave. Since this is delta-sync, after a new update the slave receives only the latest update. As a result, the slave and master are not consistent. Is it possible to force a resync from the accesslog, as a consistency check that all records are present on the slave? Maybe this is a bug in version 2.4 that has already been fixed in a newer version?
master 2.4.57 slave 2.4.55
skeletor skeletor@lissyara.su wrote on 19.01.2022 at 15:26 in message 17e37982-716f-795c-e810-70c483b6d05e@lissyara.su:
> Hi. I use delta-sync replication on version 2.4. Sometimes, some records are not sent to the slave. Since this is delta-sync, after a new update the slave receives only the latest update. As a result, the slave and master are not consistent. Is it possible to force a resync from the accesslog, as a consistency check that all records are present on the slave? Maybe this is a bug in version 2.4 that has already been fixed in a newer version?
> master 2.4.57 slave 2.4.55
The answer you are going to hear will most likely be this: 2.4 is obsolete and no longer supported. If you got the software from your OS vendor (e.g. SLES up to 12) and you have a maintenance contract, they may be able to help you.
Regards, Ulrich
--On Wednesday, January 19, 2022 4:26 PM +0200 skeletor skeletor@lissyara.su wrote:
> Hi. I use delta-sync replication on version 2.4. Sometimes, some records are not sent to the slave. Since this is delta-sync, after a new update the slave receives only the latest update. As a result, the slave and master are not consistent. Is it possible to force a resync from the accesslog, as a consistency check that all records are present on the slave? Maybe this is a bug in version 2.4 that has already been fixed in a newer version?
> master 2.4.57 slave 2.4.55
I'm not quite sure what you mean by sometimes some records don't send to the slave. That would most generally indicate a configuration issue. You would want to see if the record exists in the accesslog DB of the provider corresponding to the time it was added via an ldap operation (obviously, any entries added offline via mechanisms like slapadd would never be replicated). I would also confirm that you do not see REFRESHes occurring on the consumer vs the provider. I would also confirm that you haven't run the accesslog database out of space preventing it from recording operations (the MDB maxsize for the accesslog db).
The only reliable way to get them in sync would be to slapcat the provider and import it on the consumer. However, given the description of the problem from your end, I'm not convinced they wouldn't just de-sync again.
Regards, Quanah
--
Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com
Quanah Gibson-Mount quanah@symas.com wrote on 20.01.2022 at 18:02 in message <65ABF4684C2D1F77600EF736@[192.168.1.27]>:
> I'm not quite sure what you mean by sometimes some records don't send to the slave. That would most generally indicate a configuration issue. You would want to see if the record exists in the accesslog DB of the provider corresponding to the time it was added via an ldap operation (obviously, any entries added offline via mechanisms like slapadd would never be replicated). I would also confirm that you do not see REFRESHes occurring on the consumer vs the provider. I would also confirm that you haven't run the accesslog database out of space preventing it from recording operations (the MDB maxsize for the accesslog db).
> The only reliable way to get them in sync would be to slapcat the provider and import it on the consumer. However, given the description of the problem from your end, I'm not convinced they wouldn't just de-sync again.
Independent of this problem I wonder whether it is possible (proper access rights assumed) to compare the contents of all mirroring servers via LDAP efficiently. It seems the seemingly random order of entries and attributes is the major obstacle when searching just "for everything". Also an operation like "resync fully from ServerID X" would seem to be a nice idea.
Regards, Ulrich
--On Friday, January 21, 2022 8:14 AM +0100 Ulrich Windl Ulrich.Windl@rz.uni-regensburg.de wrote:
> Independent of this problem I wonder whether it is possible (proper access rights assumed) to compare the contents of all mirroring servers via LDAP efficiently.
One issue is that it would require putting all systems into read-only mode, so that no changes occur while whatever process is written to determine consistency is running. The other issue is the size of the database: trivial for small databases, not trivial for databases with millions of objects and attributes.
> It seems the seemingly random order of entries and attributes is the major obstacle when searching just "for everything". Also an operation like "resync fully from ServerID X" would seem to be a nice idea.
See the slapd(8C) man page, -c option.
--Quanah
--
Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com
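The ordering obstacle mentioned above can be sidestepped by normalizing both result sets before comparing them. Below is a minimal Python sketch, assuming the entries have already been fetched from each server (for example from an LDIF export) as (dn, attributes) pairs. The function names and sample entries are hypothetical, and lower-casing DNs and attribute names is a simplification of real LDAP matching rules. It does not address the two caveats raised in the reply: changes arriving during the comparison, and memory use on very large databases.

```python
def normalize(entries):
    """Map each DN to {attribute: sorted values}.

    `entries` is an iterable of (dn, {attr: [values]}) pairs. How they are
    fetched is out of scope here (assumption: an LDIF export or a search).
    Lower-casing DNs and attribute names is a simplification.
    """
    return {
        dn.lower(): {attr.lower(): sorted(vals) for attr, vals in attrs.items()}
        for dn, attrs in entries
    }


def diff_servers(a_entries, b_entries):
    """Return (DNs only on A, DNs only on B, DNs whose attributes differ)."""
    a, b = normalize(a_entries), normalize(b_entries)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    differing = sorted(dn for dn in a.keys() & b.keys() if a[dn] != b[dn])
    return only_a, only_b, differing


# Hypothetical sample data: same entry in a different order, plus one
# entry that exists on only one side each.
server_a = [
    ("uid=a,dc=example,dc=org", {"cn": ["A"], "mail": ["x@example.org", "a@example.org"]}),
    ("uid=b,dc=example,dc=org", {"cn": ["B"]}),
]
server_b = [
    ("uid=c,dc=example,dc=org", {"cn": ["C"]}),
    ("UID=A,DC=example,DC=org", {"Mail": ["a@example.org", "x@example.org"], "CN": ["A"]}),
]

only_a, only_b, differing = diff_servers(server_a, server_b)
print("only on A:", only_a)
print("only on B:", only_b)
print("differing:", differing)
```

Operational attributes such as entryCSN would have to be requested explicitly if they are to take part in the comparison.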
hi all,
on a consumer I spotted a zombie entry which was deleted on provider.
Replication is syncrepl:
olcSyncrepl: {0}rid=003
  provider=ldap://ldap-master.example.org
  binddn="cn=repluser,ou=agents,dc=example,dc=org"
  bindmethod=simple
  credentials="secret"
  searchbase="ou=people,dc=example,dc=org"
  type=refreshOnly
  interval=00:00:01:00
  retry="5 5 30 +"
  timeout=1
  scope=sub
  schemachecking=on
  exattrs=sambaHomeDrive
  sizelimit=100000
  timelimit=7200
  starttls=yes
  filter="....."
My naive strategy was to (re)start slapd with the -c rid=003 switch, but I must be missing something: I can see from the logs that synchronization is actually forced, yet the zombie entry is not deleted.
I am really sorry to ask you what I misunderstood,
thank you,
Francesco
On 3/2/22 11:49, Francesco Malvezzi wrote:
> on a consumer I spotted a zombie entry which was deleted on provider.
Which OpenLDAP version are you using?
> Replication is syncrepl:
> olcSyncrepl: {0}rid=003
>   provider=ldap://ldap-master.example.org
>   binddn="cn=repluser,ou=agents,dc=example,dc=org"
>   bindmethod=simple
>   credentials="secret"
>   searchbase="ou=people,dc=example,dc=org"
>   type=refreshOnly
>   interval=00:00:01:00
>   retry="5 5 30 +"
>   timeout=1
>   scope=sub
>   schemachecking=on
>   exattrs=sambaHomeDrive
>   sizelimit=100000
>   timelimit=7200
>   starttls=yes
>   filter="....."
I cannot really tell what's going on in your deployment.
But I wonder why you added sizelimit= to the syncrepl directive. Do you really have less than 100000 entries?
Ciao, Michael.
On 02/03/22 20:29, Michael Ströder wrote:
> On 3/2/22 11:49, Francesco Malvezzi wrote:
> > on a consumer I spotted a zombie entry which was deleted on provider.
> Which OpenLDAP version are you using?
consumer: openldap-2.5.6 provider: openldap-2.4.56
> I cannot really tell what's going on in your deployment.
got it: the procedure is fine but the environment is broken.
I stopped slapd, deleted the mdb files, restarted slapd and in an acceptable time the users have been all re-synced with all zombies dropped. It is not elegant at all, so I need to investigate the deployment.
> But I wonder why you added sizelimit= to the syncrepl directive. Do you really have less than 100000 entries?
yes, the example.edu userbase is really this small (67k users more or less). Anyhow I removed the sizelimit, even if I think it would hurt me in the other way (banning users from showing up, not from being removed),
> Ciao, Michael.
thank you so much for your time,
Francesco
--On Thursday, March 3, 2022 10:58 AM +0100 Francesco Malvezzi francesco.malvezzi@unimore.it wrote:
> I stopped slapd, deleted the mdb files, restarted slapd and in an acceptable time the users have been all re-synced with all zombies dropped. It is not elegant at all, so I need to investigate the deployment.
It would be much faster to export the DB on the provider (slapcat) and then import it on the consumer (slapadd -q), and it would guarantee correctness, especially with the known issues in the OpenLDAP 2.4 replication code.
> > But I wonder why you added sizelimit= to the syncrepl directive. Do you really have less than 100000 entries?
> yes, the example.edu userbase is really this small (67k users more or less). Anyhow I removed the sizelimit, even if I think it would hurt me in the other way (banning users from showing up, not from being removed),
The replication process should not be subject to size limits.
Regards, Quanah
20.01.2022 19:02, Quanah Gibson-Mount wrote:
> I'm not quite sure what you mean by sometimes some records don't send to the slave. That would most generally indicate a configuration issue. You would want to see if the record exists in the accesslog DB of the provider corresponding to the time it was added via an ldap operation (obviously, any entries added offline via mechanisms like slapadd would never be replicated). I would also confirm that you do not see REFRESHes occurring on the consumer vs the provider. I would also confirm that you haven't run the accesslog database out of space preventing it from recording operations (the MDB maxsize for the accesslog db).
I mean that the records are present on the master and in the accesslog, but not on the slave. Here is exactly what I mean:
- master
reqMod: entryCSN:+ 20220119115618.595929Z#000000#000#000000
reqMod: entryCSN:+ 20220119134148.182859Z#000000#000#000000
reqMod: entryCSN:+ 20220119135935.992674Z#000000#000#000000
reqMod: entryCSN:+ 20220119140414.357271Z#000000#000#000000
- accesslog
reqMod: entryCSN:+ 20220119115618.595929Z#000000#000#000000
reqMod: entryCSN:+ 20220119134148.182859Z#000000#000#000000
reqMod: entryCSN:+ 20220119135935.992674Z#000000#000#000000
reqMod: entryCSN:+ 20220119140414.357271Z#000000#000#000000
- slave
reqMod: entryCSN:+ 20220119115618.595929Z#000000#000#000000
reqMod: entryCSN:+ 20220119140414.357271Z#000000#000#000000
The slave is missing the two middle transactions.
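For longer lists, the same comparison can be done mechanically. The following is a small Python sketch, assuming the entryCSN values have already been extracted from each side into plain lists; the helper name missing_csns is hypothetical, and the values are the ones listed above.

```python
def missing_csns(provider_csns, consumer_csns):
    """Return the provider CSNs that the consumer lacks, in provider order."""
    seen = set(consumer_csns)
    return [csn for csn in provider_csns if csn not in seen]


# Values copied from the accesslog and slave listings in this message.
accesslog = [
    "20220119115618.595929Z#000000#000#000000",
    "20220119134148.182859Z#000000#000#000000",
    "20220119135935.992674Z#000000#000#000000",
    "20220119140414.357271Z#000000#000#000000",
]
slave = [
    "20220119115618.595929Z#000000#000#000000",
    "20220119140414.357271Z#000000#000#000000",
]

missing = missing_csns(accesslog, slave)
for csn in missing:
    print(csn)
```

With the data above this prints the two CSNs from 13:41:48 and 13:59:35, which are the ones to look for in the logs.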
It's a development environment with only a few records (about 10-15 in total), so the MDB maxsize was not reached.
In this case I simulated different problems that can occur in production: network errors between master and slave, a master restart, a slave restart. So I did the following:
- restarted the master and then added data
- restarted the slave and then added data
- closed the connections with a firewall, added data, and then reopened the connections
All of the data was added while the master was online, because I always use this command:
openldapadd -H ldap://127.0.0.1 -W -D cn=admin,ou=admin,dc=domain,dc=com -f user.ldif.
Otherwise I could not have added the data, since the client could not connect to 127.0.0.1 while the server was down. Do you agree with me?
--On Friday, January 21, 2022 9:47 AM +0200 skeletor skeletor@lissyara.su wrote:
> I mean that the records are present on the master and in the accesslog, but not on the slave. Here is exactly what I mean:
> - master
> reqMod: entryCSN:+ 20220119115618.595929Z#000000#000#000000
> reqMod: entryCSN:+ 20220119134148.182859Z#000000#000#000000
> reqMod: entryCSN:+ 20220119135935.992674Z#000000#000#000000
> reqMod: entryCSN:+ 20220119140414.357271Z#000000#000#000000
> - accesslog
> reqMod: entryCSN:+ 20220119115618.595929Z#000000#000#000000
> reqMod: entryCSN:+ 20220119134148.182859Z#000000#000#000000
> reqMod: entryCSN:+ 20220119135935.992674Z#000000#000#000000
> reqMod: entryCSN:+ 20220119140414.357271Z#000000#000#000000
> - slave
> reqMod: entryCSN:+ 20220119115618.595929Z#000000#000#000000
> reqMod: entryCSN:+ 20220119140414.357271Z#000000#000#000000
> The slave is missing the two middle transactions.
> It's a development environment with only a few records (about 10-15 in total), so the MDB maxsize was not reached.
> In this case I simulated different problems that can occur in production: network errors between master and slave, a master restart, a slave restart. So I did the following:
> - restarted the master and then added data
> - restarted the slave and then added data
> - closed the connections with a firewall, added data, and then reopened the connections
You would first need to examine the logs with the "stats" and "sync" log levels both enabled and see what happened when that particular contextCSN was replicated. Perhaps you have a configuration error or other problem. Without log information it's impossible to tell what happened in your case.
--Quanah
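When matching a CSN against log entries, it can help to split it into its fields first. This is a rough Python sketch, assuming the usual OpenLDAP CSN layout of timestamp, change count, server ID and modification count separated by "#", with the three counters written in hexadecimal. The CSN class and parse_csn function are hypothetical names, not part of any OpenLDAP tool.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: datetime  # UTC, microsecond resolution
    count: int           # change count within the same timestamp
    server_id: int       # SID of the server that originated the change
    mod: int             # modification count


def parse_csn(text):
    """Split a CSN string such as 20220119134148.182859Z#000000#000#000000."""
    stamp, count, sid, mod = text.split("#")
    ts = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    # Assumption: the three counters are hexadecimal.
    return CSN(ts, int(count, 16), int(sid, 16), int(mod, 16))


csn = parse_csn("20220119134148.182859Z#000000#000#000000")
print(csn.timestamp.isoformat(), "server id:", csn.server_id)
```

The timestamp is in UTC, so it may need converting to the local time zone if the slapd logs use local time.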
openldap-technical@openldap.org