Delta-sync replication: is it possible to force resync delta?

skeletor

19 Jan 2022

6:26 a.m.

Hi. I use delta-sync replication on version 2.4. Sometimes some records are not sent to the slave. Because this is delta-sync, after a new update the slave receives only the latest update, so the slave and master are no longer consistent. Is it possible to force a resync from the accesslog, to check that all records are present on the slave? Maybe this is a bug in version 2.4 that has already been fixed in a newer version?

master 2.4.57 slave 2.4.55

Ulrich Windl

20 Jan

1:56 a.m.

New subject: Antw: [EXT] Delta-sync replication: is it possible to force resync delta?

The answer you will most likely hear is this: 2.4 is obsolete and no longer supported. If you got the software from your OS vendor (e.g. SLES up to 12) and you have a maintenance contract, they may be able to help you.

Regards, Ulrich

Quanah Gibson-Mount

9:02 a.m.

I'm not quite sure what you mean by sometimes some records don't send to the slave. That would most generally indicate a configuration issue. You would want to see if the record exists in the accesslog DB of the provider corresponding to the time it was added via an ldap operation (obviously, any entries added offline via mechanisms like slapadd would never be replicated). I would also confirm that you do not see REFRESHes occurring on the consumer vs the provider. I would also confirm that you haven't run the accesslog database out of space preventing it from recording operations (the MDB maxsize for the accesslog db).

The only reliable way to get them in sync would be to slapcat the provider and import it on the consumer. However, given the description of the problem from your end, I'm not convinced they wouldn't just de-sync again.

Regards, Quanah

Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com
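
As a rough illustration of the first check suggested above (does the write show up in the provider's accesslog DB?), the following Python sketch only builds an ldapsearch command line; it does not run it. The accesslog suffix cn=accesslog, the example entry DN, and the helper names are assumptions, not values from this thread. The bind DN is the one from the original poster's add command and may not have read access to the accesslog on a given setup. The filter is modelled on the logfilter commonly used in delta-syncrepl configurations.

```python
import shlex


def escape_filter_value(value):
    """Escape characters that are special in an LDAP search filter (RFC 4515)."""
    out = []
    for ch in value:
        if ch in "\\*()\x00":
            out.append("\\%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def accesslog_search_argv(entry_dn,
                          uri="ldap://127.0.0.1",
                          bind_dn="cn=admin,ou=admin,dc=domain,dc=com",
                          accesslog_base="cn=accesslog"):  # assumed suffix
    """Build an ldapsearch argv listing successful writes logged for entry_dn."""
    flt = "(&(objectClass=auditWriteObject)(reqResult=0)(reqDN=%s))" % (
        escape_filter_value(entry_dn),
    )
    return ["ldapsearch", "-H", uri, "-D", bind_dn, "-W",
            "-b", accesslog_base, flt, "reqStart", "reqType", "reqMod"]


if __name__ == "__main__":
    # Hypothetical entry DN, for illustration only.
    print(shlex.join(accesslog_search_argv("uid=test,ou=people,dc=domain,dc=com")))
```

If the search returns no record for a write that was made over LDAP, the problem is on the provider side (for example the accesslog DB being full); if the record is there, the consumer side is the place to look.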

Ulrich Windl

11:14 p.m.

New subject: Antw: [EXT] Re: Delta‑sync replication: is it possible to force resync delta?

Independent of this problem I wonder whether it is possible (proper access rights assumed) to compare the contents of all mirroring servers via LDAP efficiently. It seems the seemingly random order of entries and attributes is the major obstacle when searching just "for everything". Also an operation like "resync fully from ServerID X" would seem to be a nice idea.

Regards, Ulrich

Quanah Gibson-Mount

21 Jan

8:23 a.m.

New subject: Antw: [EXT] Re: Delta‑sync replication: is it possible to force resync delta?

--On Friday, January 21, 2022 8:14 AM +0100 Ulrich Windl Ulrich.Windl@rz.uni-regensburg.de wrote:

> Independent of this problem I wonder whether it is possible (proper access rights assumed) to compare the contents of all mirroring servers via LDAP efficiently.

One issue is that this would require putting all systems into read-only mode, so that no changes occur while whatever process is written to determine consistency is running. The other issue is the size of the database: trivial for small databases, not trivial for databases with millions of objects and attributes.

> It seems the seemingly random order of entries and attributes is the major obstacle when searching just "for everything". Also an operation like "resync fully from ServerID X" would seem to be a nice idea.

See the slapd(8C) man page, -c option.

--Quanah
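
For small databases, the ordering obstacle raised in this exchange can be sidestepped by normalising both dumps before comparing them. The Python sketch below is a minimal illustration, not a tool from the thread: it assumes two LDIF dumps taken while no writes were happening (the read-only caveat above still applies), holds everything in memory, and treats DNs and attribute names as case-insensitive, which is a simplification. The function names are hypothetical.

```python
def parse_ldif(text):
    """Parse simple LDIF into {dn: {attribute: sorted list of raw values}}."""
    # Unfold continuation lines: a line starting with one space continues the previous one.
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)

    entries = {}
    dn = None
    for line in lines:
        if not line.strip():
            dn = None  # a blank line ends the current entry
            continue
        if line.startswith("#"):
            continue
        attr, _, value = line.partition(":")
        # Base64 values ("attr:: ...") keep their leading ":" and are compared as text.
        attr, value = attr.lower(), value.strip()
        if attr == "dn":
            dn = value.lower()
            entries[dn] = {}
        elif dn is not None:
            entries[dn].setdefault(attr, []).append(value)
    return {d: {a: sorted(v) for a, v in attrs.items()}
            for d, attrs in entries.items()}


def diff_entries(provider, consumer):
    """Compare two parsed dumps regardless of entry or attribute order."""
    common = set(provider) & set(consumer)
    return {
        "missing_on_consumer": sorted(set(provider) - set(consumer)),
        "extra_on_consumer": sorted(set(consumer) - set(provider)),
        "different": sorted(d for d in common if provider[d] != consumer[d]),
    }
```

Entries listed under extra_on_consumer would correspond to entries that were deleted on the provider but survived on the consumer. For databases with millions of objects this in-memory approach does not scale, which is the second issue named above.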


Francesco Malvezzi

2 Mar

2:49 a.m.

New subject: slapd(8C) man page, -c option: covers delete phase?

hi all,

on a consumer I spotted a zombie entry which was deleted on provider.

Replication is syncrepl:

olcSyncrepl: {0}rid=003
  provider=ldap://ldap-master.example.org
  binddn="cn=repluser,ou=agents,dc=example,dc=org"
  bindmethod=simple
  credentials="secret"
  searchbase="ou=people,dc=example,dc=org"
  type=refreshOnly
  interval=00:00:01:00
  retry="5 5 30 +"
  timeout=1
  scope=sub
  schemachecking=on
  exattrs=sambaHomeDrive
  sizelimit=100000
  timelimit=7200
  starttls=yes
  filter="....."

My naive strategy was to (re)start slapd with the -c rid=003 switch, but I must be missing something: I can see from the logs that synchronization is actually forced, yet the zombie entry is not deleted.

I am sorry to have to ask what I misunderstood,

thank you,

Francesco

Michael Ströder

11:29 a.m.

New subject: slapd(8C) man page, -c option: covers delete phase?

On 3/2/22 11:49, Francesco Malvezzi wrote:

> on a consumer I spotted a zombie entry which was deleted on provider.

Which OpenLDAP version are you using?

> Replication is syncrepl:
>
> olcSyncrepl: {0}rid=003 provider=ldap://ldap-master.example.org binddn="cn=repluser,ou=agents,dc=example,dc=org" bindmethod=simple credentials="secret" searchbase="ou=people,dc=example,dc=org" type=refreshOnly interval=00:00:01:00 retry="5 5 30 +" timeout=1 scope=sub schemachecking=on exattrs=sambaHomeDrive sizelimit=100000 timelimit=7200 starttls=yes filter="....."

I cannot really tell what's going on in your deployment.

But I wonder why you added sizelimit= to the syncrepl directive. Do you really have fewer than 100000 entries?

Ciao, Michael.

Francesco Malvezzi

3 Mar

1:58 a.m.

New subject: slapd(8C) man page, -c option: covers delete phase?

On 02/03/22 20:29, Michael Ströder wrote:

> On 3/2/22 11:49, Francesco Malvezzi wrote:
>
>> on a consumer I spotted a zombie entry which was deleted on provider.
>
> Which OpenLDAP version are you using?

consumer: openldap-2.5.6
provider: openldap-2.4.56

>> Replication is syncrepl:
>>
>> olcSyncrepl: {0}rid=003 provider=ldap://ldap-master.example.org binddn="cn=repluser,ou=agents,dc=example,dc=org" bindmethod=simple credentials="secret" searchbase="ou=people,dc=example,dc=org" type=refreshOnly interval=00:00:01:00 retry="5 5 30 +" timeout=1 scope=sub schemachecking=on exattrs=sambaHomeDrive sizelimit=100000 timelimit=7200 starttls=yes filter="....."
>
> I cannot really tell what's going on in your deployment.

Got it: the procedure is fine but the environment is broken.

I stopped slapd, deleted the mdb files, restarted slapd, and in an acceptable time all the users were re-synced and all zombies dropped. It is not elegant at all, so I need to investigate the deployment.

> But I wonder why you added sizelimit= to the syncrepl directive. Do you really have fewer than 100000 entries?

Yes, the example.edu userbase really is this small (67k users, more or less). Anyhow, I removed the sizelimit, even though I think it would hurt me the other way round (preventing users from showing up, not from being removed).

> Ciao, Michael.

Thank you so much for your time,

Francesco

Quanah Gibson-Mount

11:35 a.m.

New subject: slapd(8C) man page, -c option: covers delete phase?

--On Thursday, March 3, 2022 10:58 AM +0100 Francesco Malvezzi francesco.malvezzi@unimore.it wrote:

> I stopped slapd, deleted the mdb files, restarted slapd, and in an acceptable time all the users were re-synced and all zombies dropped. It is not elegant at all, so I need to investigate the deployment.

It would be much faster to export the DB on the provider (slapcat) and then import it on the consumer (slapadd -q). That also guarantees correctness, especially given the known issues in the OpenLDAP 2.4 replication code.

>> But I wonder why you added sizelimit= to the syncrepl directive. Do you really have fewer than 100000 entries?
>
> Yes, the example.edu userbase really is this small (67k users, more or less). Anyhow, I removed the sizelimit, even though I think it would hurt me the other way round (preventing users from showing up, not from being removed).

The replication process should not be subject to size limits.

Regards, Quanah
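
The export/import route recommended above might look roughly like the two commands this Python sketch prints. It only assembles the command lines and runs nothing. The suffix dc=example,dc=org is an assumption derived from the searchbase in this thread, and the LDIF path and function name are hypothetical. The sketch also assumes that slapd on the consumer is stopped and its old database files have been moved aside before the import, as was done in the manual procedure described earlier.

```python
import shlex


def reload_commands(suffix, ldif_path):
    """Return (export_cmd, import_cmd) argv lists for a full consumer reload."""
    # Run on the provider: dump the database that holds `suffix` to an LDIF file.
    export_cmd = ["slapcat", "-b", suffix, "-l", ldif_path]
    # Run on the consumer, with slapd stopped and an empty database:
    # -q is slapadd's quick mode, which skips some consistency checks.
    import_cmd = ["slapadd", "-q", "-b", suffix, "-l", ldif_path]
    return export_cmd, import_cmd


if __name__ == "__main__":
    # Assumed suffix and a hypothetical file path, for illustration only.
    for cmd in reload_commands("dc=example,dc=org", "/tmp/provider.ldif"):
        print(shlex.join(cmd))
```

The LDIF file has to be copied from the provider to the consumer between the two steps, and the imported files must end up owned by the user slapd runs as.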

skeletor

20 Jan

11:47 p.m.

On 20.01.2022 19:02, Quanah Gibson-Mount wrote:

> I'm not quite sure what you mean by sometimes some records don't send to the slave. That would most generally indicate a configuration issue. You would want to see if the record exists in the accesslog DB of the provider corresponding to the time it was added via an ldap operation (obviously, any entries added offline via mechanisms like slapadd would never be replicated). I would also confirm that you do not see REFRESHes occurring on the consumer vs the provider. I would also confirm that you haven't run the accesslog database out of space preventing it from recording operations (the MDB maxsize for the accesslog db).

I mean that the records are present on the master and in the accesslog, but not on the slave. This is exactly what I mean:

- master
reqMod: entryCSN:+ 20220119115618.595929Z#000000#000#000000
reqMod: entryCSN:+ 20220119134148.182859Z#000000#000#000000
reqMod: entryCSN:+ 20220119135935.992674Z#000000#000#000000
reqMod: entryCSN:+ 20220119140414.357271Z#000000#000#000000

- accesslog
reqMod: entryCSN:+ 20220119115618.595929Z#000000#000#000000
reqMod: entryCSN:+ 20220119134148.182859Z#000000#000#000000
reqMod: entryCSN:+ 20220119135935.992674Z#000000#000#000000
reqMod: entryCSN:+ 20220119140414.357271Z#000000#000#000000

- slave
reqMod: entryCSN:+ 20220119115618.595929Z#000000#000#000000
reqMod: entryCSN:+ 20220119140414.357271Z#000000#000#000000

The slave is missing the two middle transactions.

It's a development environment with only a few records (about 10-15 in total), so the MDB maxsize wasn't reached.

In this test I simulated different problems that can occur in production: network errors between master and slave, a master restart, a slave restart. So I did the following:

- restarted the master and then added data
- restarted the slave and then added data
- blocked the connections with a firewall, added data, and then unblocked them

All of the data was added while the master was online, because I always use the command:

openldapadd -H ldap://127.0.0.1 -W -D cn=admin,ou=admin,dc=domain,dc=com -f user.ldif

Otherwise I couldn't have added the data, because the client couldn't connect to 127.0.0.1 while the server was down. Do you agree?
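
The by-eye comparison in this message can be automated for larger data sets. The Python sketch below is a minimal illustration that assumes the reqMod lines from each side have been saved as plain text; the function names are hypothetical, and the sample data is copied from the listing above.

```python
import re

# A CSN looks like: 20220119115618.595929Z#000000#000#000000
CSN_RE = re.compile(r"\d{14}\.\d{6}Z#[0-9a-fA-F]{6}#[0-9a-fA-F]{3}#[0-9a-fA-F]{6}")


def extract_csns(text):
    """Return every distinct CSN found in text, in order of first appearance."""
    seen = []
    for csn in CSN_RE.findall(text):
        if csn not in seen:
            seen.append(csn)
    return seen


def missing_on_consumer(provider_text, consumer_text):
    """List CSNs that the provider side has but the consumer side lacks."""
    consumer = set(extract_csns(consumer_text))
    return [csn for csn in extract_csns(provider_text) if csn not in consumer]


accesslog_dump = """
reqMod: entryCSN:+ 20220119115618.595929Z#000000#000#000000
reqMod: entryCSN:+ 20220119134148.182859Z#000000#000#000000
reqMod: entryCSN:+ 20220119135935.992674Z#000000#000#000000
reqMod: entryCSN:+ 20220119140414.357271Z#000000#000#000000
"""

slave_dump = """
reqMod: entryCSN:+ 20220119115618.595929Z#000000#000#000000
reqMod: entryCSN:+ 20220119140414.357271Z#000000#000#000000
"""

if __name__ == "__main__":
    for csn in missing_on_consumer(accesslog_dump, slave_dump):
        print("missing on slave:", csn)
```

With the data from this message, the two CSNs reported as missing are 20220119134148.182859Z#000000#000#000000 and 20220119135935.992674Z#000000#000#000000.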

Quanah Gibson-Mount

31 Jan

7:45 p.m.

You would first need to examine the logs with both "stats" and "sync" logging enabled and see what happened when that particular contextCSN was replicated. Perhaps you have a configuration error or some other problem. Without log information it's impossible to tell what happened in your case.

--Quanah
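
On a deployment that uses cn=config, the two log levels mentioned above could be enabled with an LDIF change like the one this Python sketch generates. This is an assumption about the setup: the thread does not say whether the servers use cn=config or slapd.conf, where the equivalent directive would be "loglevel stats sync". The function name is hypothetical.

```python
def loglevel_ldif(levels=("stats", "sync")):
    """Return an ldapmodify-style LDIF that sets olcLogLevel on cn=config."""
    return "\n".join([
        "dn: cn=config",
        "changetype: modify",
        "replace: olcLogLevel",
        "olcLogLevel: " + " ".join(levels),
    ]) + "\n"


if __name__ == "__main__":
    # Feed the output to ldapmodify, bound as an identity allowed to write cn=config.
    print(loglevel_ldif(), end="")
```

The change would need to be applied on both the provider and the consumer so that the two logs can be lined up around the CSN in question.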

openldap-technical@openldap.org
