SyncProv checkpointing


thomaswilliampritchard＠gmail.com

3 May 2021

2:52 p.m.

Hi,

Through testing we have discovered that restoring from backup is most accurate when syncprov checkpointing is set to "1 1", that is, a checkpoint after 1 operation or 1 minute (olcSpCheckpoint: 1 1).

Are there any concerns with checkpointing this frequently?

Thanks, Thomas
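
For readers who want to see what the setting looks like as a change record, here is a minimal sketch that renders an LDIF modify operation for `olcSpCheckpoint`. The overlay DN is a hypothetical placeholder (the real DN depends on the local cn=config layout), and the helper name `checkpoint_ldif` is made up for illustration.

```python
def checkpoint_ldif(overlay_dn: str, ops: int, minutes: int) -> str:
    """Render an LDIF change record that sets olcSpCheckpoint on a syncprov overlay entry."""
    if ops < 0 or minutes < 0:
        raise ValueError("ops and minutes must be non-negative")
    return "\n".join([
        f"dn: {overlay_dn}",
        "changetype: modify",
        "replace: olcSpCheckpoint",
        # Value is "<ops> <minutes>": checkpoint after this many writes or minutes.
        f"olcSpCheckpoint: {ops} {minutes}",
        "",
    ])


# Hypothetical overlay DN; look up the real one in your own cn=config tree.
EXAMPLE_DN = "olcOverlay={0}syncprov,olcDatabase={1}mdb,cn=config"

if __name__ == "__main__":
    print(checkpoint_ldif(EXAMPLE_DN, 1, 1))
```

The rendered text could then be reviewed and fed to whatever tool is normally used to modify cn=config.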


Quanah Gibson-Mount

3 May

3:47 p.m.

--On Monday, May 3, 2021 10:52 PM +0000 thomaswilliampritchard@gmail.com wrote:

...

Hi,

Through testing we have discovered that restoring from backup is most accurate when syncprov checkpointing is set to "1 1", that is, a checkpoint after 1 operation or 1 minute (olcSpCheckpoint: 1 1).

Are there any concerns with checkpointing this frequently?

Too little information here.

What OpenLDAP release are you on? Do you use standard syncrepl or delta-syncrepl? What is your restore process?

--Quanah

Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com

thomaswilliampritchard＠gmail.com

4 May

8:27 a.m.

Thanks for the follow up Quanah,

Version: OpenLDAP: slapd 2.4.56

We use delta sync replication.

We have been provisioning 2 new providers to achieve multi master for a total of 3 providers. We have been testing the newly provisioned databases for accuracy with the ldif-diff tool provided by https://github.com/pingidentity/ldapsdk

Provision Process:
1. Take backup of database with mdb_copy on initial provider.
2. Load database schema / config on new provider.
3. Replace default DB with backup DB on new provider.
4. Add access log overlay / access log database to DB on new provider.
5. Turn on delta sync replication for new provider so it "catches up" from the original provider.

We historically did not have olcSpCheckpoint set to a value, so we theorize the backup databases were newer in state than indicated by the contextCSN in the backed-up database. We tested specifically by deleting a user group containing hundreds of users between the time the backup was taken and the time the new provider was enabled with replication (so the new provider had to catch up with the user group delete), and we noticed an incorrect final state on the newly provisioned provider. Not until we added CSN checkpointing did the restores start to be 100% accurate. Given our theory that the newly provisioned database syncs from the last checkpointed CSN and that this produces inconsistencies, we wanted to set olcSpCheckpoint: 1 1 so that there is never a discrepancy between the database state and the CSN in the backed-up database.

Initial testing shows `olcSpCheckpoint: 1 1` functions fine, but we wanted to be cautious about checkpointing this frequently in case there are any known issues with doing so.

Thanks, Tom
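
The theory in this message is that the stored contextCSN in the backup lags behind the newest change actually present in the copied database. A rough way to quantify that lag, assuming both values have already been read out of the backup (for example the contextCSN of the suffix entry and the largest entryCSN), is sketched below. The function names and sample CSN values are illustrative only; the CSN layout assumed is `timestamp#count#sid#mod` with hexadecimal counters.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: datetime
    count: int
    sid: int
    mod: int


def parse_csn(value: str) -> CSN:
    """Split a CSN such as 20210504152711.000000Z#000000#001#000000 into its fields."""
    ts, count, sid, mod = value.split("#")
    when = datetime.strptime(ts, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    # count, sid and mod are written as hexadecimal digits.
    return CSN(when, int(count, 16), int(sid, 16), int(mod, 16))


def checkpoint_lag(context_csn: str, newest_entry_csn: str) -> timedelta:
    """Return how far the stored contextCSN trails the newest entryCSN (same server ID only)."""
    ctx, newest = parse_csn(context_csn), parse_csn(newest_entry_csn)
    if ctx.sid != newest.sid:
        raise ValueError("CSNs come from different server IDs")
    return newest.timestamp - ctx.timestamp


if __name__ == "__main__":
    # Made-up sample values, not taken from any real database.
    print(checkpoint_lag(
        "20210503100000.000000Z#000000#001#000000",
        "20210504152711.000000Z#000000#001#000000",
    ))
```

A lag near zero would be consistent with a freshly written checkpoint; a lag of hours or days would match the situation described above.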

Quanah Gibson-Mount

9:37 a.m.

--On Tuesday, May 4, 2021 4:27 PM +0000 thomaswilliampritchard@gmail.com wrote:

...

Provision Process:

Take backup of database with mdb_copy on initial provider.

Is slapd stopped when you run mdb_copy, or running?

Regards, Quanah


thomaswilliampritchard＠gmail.com

11:13 a.m.

It is always running when we take backups via mdb_copy.

Quanah Gibson-Mount

12:13 p.m.

--On Tuesday, May 4, 2021 7:13 PM +0000 thomaswilliampritchard@gmail.com wrote:

...

It is always running when we take backups via mdb_copy.

Then it's expected the contextCSN won't be perfectly in sync at that point. I assume you're only doing an mdb_copy of the primary DB, and not the accesslog DB, since the accesslog DB is serverID specific.

It's not clear to me what you mean by "most accurate". If the checkpoint is behind, the system should still be able to quickly regain its state since it will simply replay the accesslog on the other provider until it's current. I.e., as long as a reasonable checkpoint is already in place, it will never get particularly behind.

I would generally advise upgrading to a current build, however, to pull in the more recent replication related fixes, particularly:

OpenLDAP 2.4.57 Release (2021/01/18)
Fixed slapo-syncprov to ignore duplicate sessionlog entries (ITS#9394)

OpenLDAP 2.4.58 Release (2021/03/16)
Fixed slapd syncrepl to check all contextCSNs (ITS#9282)

Regards, Quanah


thomaswilliampritchard＠gmail.com

3:37 p.m.

You are correct we do not copy the access log, strictly the primary db.

When we restore a backup whose checkpoint is behind, we find that some entries on the new provider have incorrect fields compared with the current state of the original provider; in other words, the databases do not match. The new provider seems to end up in an incorrect state when it syncs from a checkpoint that is behind the actual DB state.

On Provider A (missing or large olcSpCheckpoint interval, checkpoint possibly days old):
1. Add group1 with a set of 100 users.
2. Add the 100 users to a new group, group2.
3. Take a backup with mdb_copy.
4. Delete group2.

On Provider B:
1. Build / set up with the backup mdb_copy database.
2. Turn on delta sync to Provider A.

When the catch up sync is finished, compare the database contents for accuracy. We are seeing group membership become incorrect on Provider B (the new provider).

We cannot upgrade at the moment and olcSpCheckpoint: 1 1 seems to work. Is there any reason we should not use olcSpCheckpoint: 1 1?
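
The comparison step in this reproduction can be approximated with a small script. The sketch below is a simplified stand-in, not the ldif-diff tool mentioned earlier in the thread: it assumes two LDIF exports (one per provider) with no folded lines and no base64-encoded values, and it only reports entries whose `member` values differ. All names in it are hypothetical.

```python
from collections import defaultdict


def parse_ldif(text):
    """Parse simple LDIF (no line folding, no base64) into {dn: {attr: set(values)}}."""
    entries = {}
    for block in text.strip().split("\n\n"):
        dn = None
        attrs = defaultdict(set)
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue
            name, _, value = line.partition(": ")
            if name.lower() == "dn":
                dn = value
            else:
                attrs[name.lower()].add(value)
        if dn is not None:
            entries[dn.lower()] = attrs
    return entries


def membership_diff(ldif_a, ldif_b, attr="member"):
    """Return entries whose values for `attr` differ between the two exports."""
    a, b = parse_ldif(ldif_a), parse_ldif(ldif_b)
    diff = {}
    for dn in sorted(set(a) | set(b)):
        in_a = a.get(dn, {}).get(attr, set())
        in_b = b.get(dn, {}).get(attr, set())
        if in_a != in_b:
            diff[dn] = {"only_a": in_a - in_b, "only_b": in_b - in_a}
    return diff
```

An empty result would mean group membership matches on both providers; any entry in the result names the group and the members present on only one side.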

Quanah Gibson-Mount

7:50 p.m.

--On Tuesday, May 4, 2021 11:37 PM +0000 thomaswilliampritchard@gmail.com wrote:

...

You are correct we do not copy the access log, strictly the primary db.

Ok good.

...

When we restore a backup whose checkpoint is behind, we find that some entries on the new provider have incorrect fields compared with the current state of the original provider; in other words, the databases do not match. The new provider seems to end up in an incorrect state when it syncs from a checkpoint that is behind the actual DB state.

On Provider A (missing or large olcSpCheckpoint interval, checkpoint possibly days old):
1. Add group1 with a set of 100 users.
2. Add the 100 users to a new group, group2.
3. Take a backup with mdb_copy.
4. Delete group2.

On Provider B:
1. Build / set up with the backup mdb_copy database.
2. Turn on delta sync to Provider A.

When the catch up sync is finished, compare the database contents for accuracy. We are seeing group membership become incorrect on Provider B (the new provider).

We cannot upgrade at the moment and olcSpCheckpoint: 1 1 seems to work. Is there any reason we should not use olcSpCheckpoint: 1 1?

No, that's fine. The point is rather that you shouldn't be having any issues as long as the checkpoint is more frequent than the accesslog purge configuration. It would be useful to have a copy of your configuration for the two nodes (passwords redacted), if you can send it to me directly. I'd like to see if I can create a reproduction case.

Regards, Quanah
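
The rule of thumb in this reply (checkpoint more often than the accesslog is purged) can be checked mechanically. The sketch below assumes the two settings are available as strings: `olcSpCheckpoint` as `<ops> <minutes>` and the accesslog purge setting as `<age> <interval>` with each span written as `[ddd+]hh:mm[:ss]`. It compares only the time component of the checkpoint against the purge age and ignores the operation count, so treat it as a rough sanity check; the function names are hypothetical.

```python
import re
from datetime import timedelta

# Matches time spans of the form [ddd+]hh:mm[:ss]
_SPAN = re.compile(r"^(?:(\d+)\+)?(\d+):(\d+)(?::(\d+))?$")


def parse_span(text: str) -> timedelta:
    """Convert a [ddd+]hh:mm[:ss] span into a timedelta."""
    m = _SPAN.match(text)
    if not m:
        raise ValueError(f"not a [ddd+]hh:mm[:ss] time span: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def checkpoint_outpaces_purge(sp_checkpoint: str, accesslog_purge: str) -> bool:
    """True if the checkpoint interval (minutes) is shorter than the accesslog purge age."""
    _ops, minutes = sp_checkpoint.split()
    age, _interval = accesslog_purge.split()
    return timedelta(minutes=int(minutes)) < parse_span(age)
```

With a purge age of seven days, for instance, `checkpoint_outpaces_purge("1 1", "07+00:00 01+00:00")` holds, while a checkpoint interval of fourteen days would not.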


Ulrich Windl

11:37 p.m.

New subject: Antw: [EXT] Re: SyncProv checkpointing

thomaswilliampritchard@gmail.com wrote on 04.05.2021 at 17:27 in message 20210504152711.5262.64794@hypatia.openldap.org:

Thanks for the follow up Quanah,

Version: OpenLDAP: slapd 2.4.56

We use delta sync replication.

We have been provisioning 2 new providers to achieve multi master for a total of 3 providers. We have been testing the newly provisioned databases for accuracy with the ldif-diff tool provided by https://github.com/pingidentity/ldapsdk

Provision Process:
1. Take backup of database with mdb_copy on initial provider.
2. Load database schema / config on new provider.
3. Replace default DB with backup DB on new provider.
4. Add access log overlay / access log database to DB on new provider.
5. Turn on delta sync replication for new provider so it "catches up" from the original provider.

We historically did not have olcSpCheckpoint set to a value, so we theorize the backup databases were newer in state than indicated by the contextCSN in the backed-up database. We tested specifically by deleting a user group containing hundreds of users between the time the backup was taken and the time the new provider was enabled with replication (so the new provider had to catch up with the user group delete), and we noticed an incorrect final state on the newly provisioned provider. Not until we added CSN checkpointing did the restores start to be 100% accurate. Given our theory that the newly provisioned database syncs from the last checkpointed CSN and that this produces inconsistencies, we wanted to set olcSpCheckpoint: 1 1 so that there is never a discrepancy between the database state and the CSN in the backed-up database.

I just wonder: Are you talking about a slapcat-type of backup or about a file-level backup?

...

Initial testing shows `olcSpCheckpoint: 1 1` functions fine, but we wanted to be cautious about checkpointing this frequently in case there are any known issues with doing so.

Thanks, Tom

Quanah Gibson-Mount

5 May

9:09 a.m.

New subject: Antw: [EXT] Re: SyncProv checkpointing

--On Wednesday, May 5, 2021 9:37 AM +0200 Ulrich Windl Ulrich.Windl@rz.uni-regensburg.de wrote:

thomaswilliampritchard@gmail.com wrote on 04.05.2021 at 17:27 in

...

I just wonder: Are you talking about a slapcat-type of backup or about a file-level backup?

They literally stated multiple times they took the backup using the mdb_copy utility.

--Quanah


Ulrich Windl

6 May

12:01 a.m.

New subject: Antw: [EXT] Re: SyncProv checkpointing

Quanah Gibson-Mount quanah@symas.com wrote on 05.05.2021 at 18:09 in message <A5BACEECD47DBF6BD757F52D@[192.168.1.156]>:

...

--On Wednesday, May 5, 2021 9:37 AM +0200 Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:

thomaswilliampritchard@gmail.com wrote on 04.05.2021 at 17:27 in

...
I just wonder: Are you talking about a slapcat-type of backup or about a file-level backup?

They literally stated multiple times they took the backup using the mdb_copy utility.

I guess that's the equivalent of slapcat then, or are there differences? (I know that mdb_copy creates a new database file while slapcat creates LDIF entries, but both are more clever than a (stupid) file backup).


Quanah Gibson-Mount

8:11 a.m.

New subject: Antw: [EXT] Re: SyncProv checkpointing

--On Thursday, May 6, 2021 10:01 AM +0200 Ulrich Windl Ulrich.Windl@rz.uni-regensburg.de wrote:

Quanah Gibson-Mount quanah@symas.com wrote on 05.05.2021 at 18:09 in message <A5BACEECD47DBF6BD757F52D@[192.168.1.156]>:

--On Wednesday, May 5, 2021 9:37 AM +0200 Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:

thomaswilliampritchard@gmail.com wrote on 04.05.2021 at 17:27 in

...
I just wonder: Are you talking about a slapcat-type of backup or about a file-level backup?

They literally stated multiple times they took the backup using the mdb_copy utility.

I guess that's the equivalent of slapcat then, or are there differences? (I know that mdb_copy creates a new database file while slapcat creates LDIF entries, but both are more clever than a (stupid) file backup).

Using mdb_copy essentially eliminates the need to do a slapadd on the other end, with the tradeoff being that the result is likely to be a larger binary file.

--Quanah



openldap-technical@openldap.org
