List members,
I plan on updating my 3 node multi-primary instances and want to have a full resync done when each instance is rebuilt and rejoined to the cluster. Currently, I have both config and DIT fully replicated between all instances. When I rebuild each node, it will have all of the configs in place to be a part of the cluster. What I would like to have is all the data pushed to the newly built, but empty, instance.
I have a process that is somewhat brute force, where all data is exported, stripped of entryCSN and contextCSN values, then added back to the newly built instance. This would require that the other instances be stopped or otherwise not take any updates during the transition. I would like to avoid this disruption in service, if at all possible.
Is there a way to have a full resync done when a rebuilt instance that has no data rejoins a cluster?
Thanks in advance,
Brendan Kearney
On Wed, Dec 17, 2025 at 09:18:04AM -0500, Brendan Kearney wrote:
List members,
I plan on updating my 3 node multi-primary instances and want to have a full resync done when each instance is rebuilt and rejoined to the cluster. Currently, I have both config and DIT fully replicated between all instances. When I rebuild each node, it will have all of the configs in place to be a part of the cluster. What I would like to have is all the data pushed to the newly built, but empty, instance.
I have a process that is somewhat brute force, where all data is exported, stripped of entryCSN and contextCSN values, then added back to the newly built instance. This would require that the other instances be stopped or otherwise not take any updates during the transition. I would like to avoid this disruption in service, if at all possible.
Is there a way to have a full resync done when a rebuilt instance that has no data rejoins a cluster?
Hi Brendan, it's much simpler than you think:
- slapcat cn=config, slapadd -n0 it on each server
- then either:
- slapcat your DB(s) on one of the servers, then just slapadd it (no changes, no behavioural options, maybe -q)
- or just start with an empty DB and the replica will sync up from scratch, when it does, you can register it with your load balancer
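For illustration, a minimal sketch of those commands, assuming database 0 is cn=config, database 1 holds the DIT, and the file paths are placeholders for your setup:

    # on an existing provider: export config and (optionally) data
    slapcat -n 0 -l config.ldif
    slapcat -n 1 -l data.ldif

    # on the rebuilt server, while slapd is stopped: load the config,
    # then either pre-load the data or leave the DB empty for syncrepl
    slapadd -n 0 -F /etc/openldap/slapd.d -l config.ldif
    slapadd -n 1 -q -l data.ldif

Depending on how you run slapadd, you may also need to fix ownership of the resulting files so the slapd user can read them before starting the service.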
Certainly do not munge the slapcat output in any way before loading it. The slapcat is also your backup; some sites choose to set up a spare server that gets stopped, slapcat'd and then started again. That gives them a pristine backup as well, with no impact on the rest of the site.
Just to make sure:
- check your ACLs are up to scratch - give the replicating identity **full and unrestricted** read access to the relevant DB
- that each server resolves its unique serverID correctly
- monitor your replication (e.g. with syncmonitor[0] or a homegrown tool)
[0]. https://git.openldap.org/openldap/syncmonitor
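As a rough illustration of the ACL point, assuming a dedicated (hypothetical) replication identity cn=replicator,dc=example,dc=com and that the data DB is olcDatabase={1}mdb - adjust both to your setup, and note that ACLs never restrict the rootdn:

    ldapmodify -Y EXTERNAL -H ldapi:/// <<'EOF'
    dn: olcDatabase={1}mdb,cn=config
    changetype: modify
    add: olcAccess
    olcAccess: {0}to * by dn.exact="cn=replicator,dc=example,dc=com" read by * none break
    EOF

The {0} prefix puts the rule first, and the "by * none break" clause lets evaluation continue to your existing ACLs for everyone else.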
Regards,
On 12/17/25 9:46 AM, Ondřej Kuzník wrote:
On Wed, Dec 17, 2025 at 09:18:04AM -0500, Brendan Kearney wrote:
List members,
I plan on updating my 3 node multi-primary instances and want to have a full resync done when each instance is rebuilt and rejoined to the cluster. Currently, I have both config and DIT fully replicated between all instances. When I rebuild each node, it will have all of the configs in place to be a part of the cluster. What I would like to have is all the data pushed to the newly built, but empty, instance.
I have a process that is somewhat brute force, where all data is exported, stripped of entryCSN and contextCSN values, then added back to the newly built instance. This would require that the other instances be stopped or otherwise not take any updates during the transition. I would like to avoid this disruption in service, if at all possible.
Is there a way to have a full resync done when a rebuilt instance that has no data rejoins a cluster?
Hi Brendan, it's much simpler than you think:
- slapcat cn=config, slapadd -n0 it on each server
- then either:
- slapcat your DB(s) on one of the servers, then just slapadd it (no changes, no behavioural options, maybe -q)
- or just start with an empty DB and the replica will sync up from scratch, when it does, you can register it with your load balancer
Certainly do not munge the slapcat output in any way before loading it. The slapcat is also your backup; some sites choose to set up a spare server that gets stopped, slapcat'd and then started again. That gives them a pristine backup as well, with no impact on the rest of the site.
Just to make sure:
- check your ACLs are up to scratch - give the replicating identity **full and unrestricted** read access to the relevant DB
- that each server resolves its unique serverID correctly
- monitor your replication (e.g. with syncmonitor[0] or a homegrown tool)
[0]. https://git.openldap.org/openldap/syncmonitor
Regards,
Ondrej,
The "just start with an empty DB" option is what I am looking for, but in my past upgrades (years and versions ago) this did not seem to work. Only some entries wound up on the newly built server. I wound up stopping other instances and copying over the .mdb file in the DB directory and restarting.
Since I will be reusing the SID associated with each server, will attributes written by SID 3 not be copied over to the newly built server 3, that has SID 3? This seemed to be one of the nuances I saw, though I could be flat-out wrong. Maybe I was just impatient and did not wait long enough for the replication to complete.
It was one of those odd things that I saw when upgrading. Some data was replicated, but the .mdb file was vastly smaller on the newly built box, and there did not seem to be any traffic between the existing cluster members and the newly built one that would indicate replication was still updating the new instance. If it was my impatience, would you know how long it takes for replication to fully populate the blank DB? My current DB is about 2.5 GB in size.
Thank you,
Brendan Kearney
On Wed, Dec 17, 2025 at 10:16:11AM -0500, Brendan Kearney wrote:
The "just start with an empty DB" option is what I am looking for, but in my past upgrades (years and versions ago) this did not seem to work. Only some entries wound up on the newly built server. I wound up stopping other instances and copying over the .mdb file in the DB directory and restarting.
Are you sure the identity also had limits (size especially) set to unlimited? Or see below which is just as likely.
Since I will be reusing the SID associated with each server, will attributes written by SID 3 not be copied over to the newly built server 3, that has SID 3? This seemed to be one of the nuances I saw, though I could be flat-out wrong. Maybe I was just impatient and did not wait long enough for the replication to complete.
Reusing serverids is a misconfiguration, each provider **has** to have a unique non-zero serverID. The replication logic relies on it to decide where changes are coming from and where (not) to route them. This is why the serverID option has a second form of "serverID <id> <listen URL from slapd -h ...>" so that you can replicate cn=config but have every server maintain its own identity.
Everyone else apart from providers can keep their serverid at default (="0") but they can also have one assigned if you want to be able to promote them to providers easily, your choice.
It was one of those odd things that I saw when upgrading. Some data was replicated, but the .mdb file was vastly smaller on the newly built box, and there did not seem to be any traffic between the existing cluster members and the newly built one that would indicate replication was still updating the new instance. If it was my impatience, would you know how long it takes for replication to fully populate the blank DB? My current DB is about 2.5 GB in size.
Yes, see above. If a server claiming its serverid was 3 talked to another provider, that provider would make sure not to route any change tagged as coming from sid 3 back to it. It should have been the one to have originated it in the first place so doing otherwise is wasteful or worse.
Regarding the DB size and time to replicate: depends on your storage speed/latency. If you can write 1000 new entries/s, then I would expect a DB with 3.6M entries will take roughly an hour to finish syncing from scratch?
Also run with loglevel stats,sync and watch your logs for errors if you don't trust your cluster yet, and there's always cn=monitor too[0]. I would note that 2.7 should also get a little better at logging replication activity.
[0]. https://lists.openldap.org/hyperkitty/list/openldap-technical@openldap.org/t...
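For example, a quick way to watch catch-up progress without extra tooling is to compare contextCSN values across the providers (hostnames and suffix below are placeholders, and the bind you use must be allowed to read the attribute):

    for h in ldap1 ldap2 ldap3; do
        echo "== $h =="
        ldapsearch -LLL -x -H ldap://$h.example.com \
            -b "dc=example,dc=com" -s base contextCSN
    done

Once the newly built server reports the same set of contextCSN values as the others, the initial sync is done.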
Regards,
On 12/17/25 11:23 AM, Ondřej Kuzník wrote:
On Wed, Dec 17, 2025 at 10:16:11AM -0500, Brendan Kearney wrote:
The "just start with an empty DB" option is what I am looking for, but in my past upgrades (years and versions ago) this did not seem to work. Only some entries wound up on the newly built server. I wound up stopping other instances and copying over the .mdb file in the DB directory and restarting.
Are you sure the identity also had limits (size especially) set to unlimited? Or see below which is just as likely.
I don't have any explicit size limits on identities. DB size limits are "unlimited" for cn=config, 25 GB on DIT.
Since I will be reusing the SID associated with each server, will attributes written by SID 3 not be copied over to the newly built server 3, that has SID 3? This seemed to be one of the nuances I saw, though I could be flat-out wrong. Maybe I was just impatient and did not wait long enough for the replication to complete.
Reusing serverids is a misconfiguration, each provider **has** to have a unique non-zero serverID. The replication logic relies on it to decide where changes are coming from and where (not) to route them. This is why the serverID option has a second form of "serverID <id> <listen URL from slapd -h ...>" so that you can replicate cn=config but have every server maintain its own identity.
Everyone else apart from providers can keep their serverid at default (="0") but they can also have one assigned if you want to be able to promote them to providers easily, your choice.
so, the olcServerID and rid used in the replication configs should both be incremented when rolling over / upgrading a box?
It was one of those odd things that I saw when upgrading. Some data was replicated, but the .mdb file was vastly smaller on the newly built box, and there did not seem to be any traffic between the existing cluster members and the newly built one that would indicate replication was still updating the new instance. If it was my impatience, would you know how long it takes for replication to fully populate the blank DB? My current DB is about 2.5 GB in size.
Yes, see above. If a server claiming its serverid was 3 talked to another provider, that provider would make sure not to route any change tagged as coming from sid 3 back to it. It should have been the one to have originated it in the first place so doing otherwise is wasteful or worse.
Regarding the DB size and time to replicate: depends on your storage speed/latency. If you can write 1000 new entries/s, then I would expect a DB with 3.6M entries will take roughly an hour to finish syncing from scratch?
Also run with loglevel stats,sync and watch your logs for errors if you don't trust your cluster yet, and there's always cn=monitor too[0]. I would note that 2.7 should also get a little better at logging replication activity.
[0]. https://lists.openldap.org/hyperkitty/list/openldap-technical@openldap.org/t...
Regards,
On Wed, Dec 17, 2025 at 11:44:03AM -0500, Brendan Kearney wrote:
I don't have any explicit size limits on identities. DB size limits are "unlimited" for cn=config, 25 GB on DIT.
It's not about DB size (although yes, worth monitoring olmMDBPagesUsed etc.) but about search size limits which AFAIK tend to default to 500 for non-root users unless changed by olcLimits.
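If a non-rootdn identity is ever used for replication, the olcLimits change being referred to might look like this (cn=replicator,dc=example,dc=com is a hypothetical DN, and the rootdn itself is exempt from limits):

    ldapmodify -Y EXTERNAL -H ldapi:/// <<'EOF'
    dn: olcDatabase={1}mdb,cn=config
    changetype: modify
    add: olcLimits
    olcLimits: dn.exact="cn=replicator,dc=example,dc=com" size=unlimited time=unlimited
    EOF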
Reusing serverids is a misconfiguration, each provider **has** to have a unique non-zero serverID. The replication logic relies on it to decide where changes are coming from and where (not) to route them. This is why the serverID option has a second form of "serverID <id> <listen URL from slapd -h ...>" so that you can replicate cn=config but have every server maintain its own identity.
Everyone else apart from providers can keep their serverid at default (="0") but they can also have one assigned if you want to be able to promote them to providers easily, your choice.
so, the olcServerID and rid used in the replication configs should both be incremented when rolling over / upgrading a box?
Upgrading in-place is fine, because there are never two servers with the same sid. But when adding a new provider, add another olcServerID: value to cn=config with a unique serverID and its URI.
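As a sketch of that second case, adding a hypothetical fourth provider ldap4.bpk2.com while leaving the existing values untouched:

    ldapmodify -Y EXTERNAL -H ldapi:/// <<'EOF'
    dn: cn=config
    changetype: modify
    add: olcServerID
    olcServerID: 4 ldap://ldap4.bpk2.com
    EOF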
Regards,
On 12/18/25 7:14 AM, Ondřej Kuzník wrote:
On Wed, Dec 17, 2025 at 11:44:03AM -0500, Brendan Kearney wrote:
I don't have any explicit size limits on identities. DB size limits are "unlimited" for cn=config, 25 GB on DIT.
It's not about DB size (although yes, worth monitoring olmMDBPagesUsed etc.) but about search size limits which AFAIK tend to default to 500 for non-root users unless changed by olcLimits.
The root DN is currently used as the bind DN for replication, so search size limits would not affect replication. Otherwise, I don't have olcLimits set.
Reusing serverids is a misconfiguration, each provider **has** to have a unique non-zero serverID. The replication logic relies on it to decide where changes are coming from and where (not) to route them. This is why the serverID option has a second form of "serverID <id> <listen URL from slapd -h ...>" so that you can replicate cn=config but have every server maintain its own identity.
Everyone else apart from providers can keep their serverid at default (="0") but they can also have one assigned if you want to be able to promote them to providers easily, your choice.
so, the olcServerID and rid used in the replication configs should both be incremented when rolling over / upgrading a box?
Upgrading in-place is fine, because there are never two servers with the same sid. But when adding a new provider, add another olcServerID: value to cn=config with a unique serverID and its URI.
I am seeking a bit of clarification here. I am upgrading in place, and no servers have overlapping SIDs, but, as I understand it, I cannot reuse a SID. The rebuild will reuse IPs as well. The newly built server will retain just about every configuration that was set in the previously installed OS. So, should I increment the below:
olcServerID: 1 ldap://ldap1.bpk2.com
olcServerID: 2 ldap://ldap2.bpk2.com
olcServerID: 3 ldap://ldap3.bpk2.com
to be:
olcServerID: 1 ldap://ldap1.bpk2.com
olcServerID: 2 ldap://ldap2.bpk2.com
olcServerID: 4 ldap://ldap3.bpk2.com
when I rebuild the host known as ldap3?
Thanks for the insight,
Brendan Kearney
Regards,
On Thu, Dec 18, 2025 at 08:00:09AM -0500, Brendan Kearney wrote:
It's not about DB size (although yes, worth monitoring olmMDBPagesUsed etc.) but about search size limits which AFAIK tend to default to 500 for non-root users unless changed by olcLimits.
The root DN is currently used as the bind DN for replication, so search size limits would not affect replication. Otherwise, I don't have olcLimits set.
In that case you should gather some sync level logs (at least) from the new server to see what's actually happening.
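For instance, something along these lines on the new server (ldapi with EXTERNAL auth is just one way to get write access to cn=config):

    ldapmodify -Y EXTERNAL -H ldapi:/// <<'EOF'
    dn: cn=config
    changetype: modify
    replace: olcLogLevel
    olcLogLevel: stats sync
    EOF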
Reusing serverids is a misconfiguration, each provider **has** to have a unique non-zero serverID. The replication logic relies on it to decide where changes are coming from and where (not) to route them. This is why the serverID option has a second form of "serverID <id> <listen URL from slapd -h ...>" so that you can replicate cn=config but have every server maintain its own identity.
Everyone else apart from providers can keep their serverid at default (="0") but they can also have one assigned if you want to be able to promote them to providers easily, your choice.
so, the olcServerID and rid used in the replication configs should both be incremented when rolling over / upgrading a box?
Upgrading in-place is fine, because there are never two servers with the same sid. But when adding a new provider, add another olcServerID: value to cn=config with a unique serverID and its URI.
I am seeking a bit of clarification here. I am upgrading in place, and no servers have overlapping SIDs, but, as I understand it, I cannot reuse a SID. The rebuild will reuse IPs as well. The newly built server will retain just about every configuration that was set in the previously installed OS. So, should I increment the below:
olcServerID: 1 ldap://ldap1.bpk2.com
olcServerID: 2 ldap://ldap2.bpk2.com
olcServerID: 3 ldap://ldap3.bpk2.com
to be:
olcServerID: 1 ldap://ldap1.bpk2.com
olcServerID: 2 ldap://ldap2.bpk2.com
olcServerID: 4 ldap://ldap3.bpk2.com
when I rebuild the host known as ldap3?
I understood your previous comment to mean that several (all?) running servers shared serverid 3. You are fine to keep the new server as 3 if it's cleanly replacing (never running at the same time as) the old one, or to use 4; either way, that should not be the issue.
If you want a full sync, I'd export and import, then "delta sync". Reason: there will be quite a lot of change messages and the performance is not optimal. Also, the server may answer queries while it's not yet up to date. Why do you want a full sync?
Kind regards, Ulrich Windl
-----Original Message-----
From: Brendan Kearney bpk678@gmail.com
Sent: Wednesday, December 17, 2025 3:18 PM
To: openldap-technical@openldap.org
Subject: [EXT] Migrating HA Cluster, Want to Force Full Replication
List members,
I plan on updating my 3 node multi-primary instances and want to have a full resync done when each instance is rebuilt and rejoined to the cluster. Currently, I have both config and DIT fully replicated between all instances. When I rebuild each node, it will have all of the configs in place to be a part of the cluster. What I would like to have is all the data pushed to the newly built, but empty, instance.
I have a process that is somewhat brute force, where all data is exported, stripped of entryCSN and contextCSN values, then added back to the newly built instance. This would require that the other instances be stopped or otherwise not take any updates during the transition. I would like to avoid this disruption in service, if at all possible.
Is there a way to have a full resync done when a rebuilt instance that has no data rejoins a cluster?
Thanks in advance,
Brendan Kearney
On 12/19/25 2:24 AM, Windl, Ulrich wrote:
If you want a full sync, I'd export and import, then "delta sync". Reason: there will be quite a lot of change messages and the performance is not optimal. Also, the server may answer queries while it's not yet up to date. Why do you want a full sync?
Kind regards, Ulrich Windl
-----Original Message-----
From: Brendan Kearney bpk678@gmail.com
Sent: Wednesday, December 17, 2025 3:18 PM
To: openldap-technical@openldap.org
Subject: [EXT] Migrating HA Cluster, Want to Force Full Replication
List members,
I plan on updating my 3 node multi-primary instances and want to have a full resync done when each instance is rebuilt and rejoined to the cluster. Currently, I have both config and DIT fully replicated between all instances. When I rebuild each node, it will have all of the configs in place to be a part of the cluster. What I would like to have is all the data pushed to the newly built, but empty, instance.
I have a process that is somewhat brute force, where all data is exported, stripped of entryCSN and contextCSN values, then added back to the newly built instance. This would require that the other instances be stopped or otherwise not take any updates during the transition. I would like to avoid this disruption in service, if at all possible.
Is there a way to have a full resync done when a rebuilt instance that has no data rejoins a cluster?
Thanks in advance,
Brendan Kearney
Ulrich
I don't have concerns about performance or answering queries, as the cluster is load balanced. I will not have the newly built instance available in the load-balanced pool while the data is being updated on it.
It's not that I want a full sync, but that I have to have one. The dataset will be blank upon rebuild, and the newly built instance will have no data to provide when queried.
Thank you,
Brendan Kearney
--On Friday, December 19, 2025 7:57 AM -0500 Brendan Kearney bpk678@gmail.com wrote:
On 12/19/25 2:24 AM, Windl, Ulrich wrote:
Ulrich
I don't have concerns about performance or answering queries, as the cluster is load balanced. I will not have the newly built instance available in the load-balanced pool while the data is being updated on it.
It's not that I want a full sync, but that I have to have one. The dataset will be blank upon rebuild, and the newly built instance will have no data to provide when queried.
I'd suggest making it part of the spinup process to import an ldif. In my environment, I have a system specifically for backups that hourly stops slapd, exports via slapcat, and stores the ldif in a location accessible to all servers. On new server spinup, the most recent backup can be imported and then the system started, so replication has only a small window it needs to sync.
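A rough sketch of such an hourly job, assuming database 1 is the data DB and /srv/ldap-backups is a location the other servers can reach (both are placeholders):

    #!/bin/sh
    set -e
    systemctl stop slapd
    slapcat -n 1 -l /srv/ldap-backups/data-$(date +%Y%m%d%H%M).ldif
    systemctl start slapd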
--Quanah