Hello, thanks for the answer.
Quanah Gibson-Mount wrote:
--On Thursday, November 23, 2023 5:33 PM +0000 falgon.comp(a)gmail.com wrote:
b) olcLogLevel: stats sync
- We are running our tests with stats only. Meheni probably left this
configuration in place before sending the config here.
My point was more that it should be a multi-valued attribute with unique values not a single valued attribute with 2 strings in the value.
Good to know, thanks.
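For illustration, here is a minimal LDIF sketch of the distinction being made (the DN is the standard cn=config entry; the two levels are the ones from this thread), with each log level as its own attribute value rather than two strings inside one value:

```ldif
# Preferred: olcLogLevel as a multi-valued attribute, one value per level
dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: stats
olcLogLevel: sync

# Rather than a single value holding two space-separated strings:
# olcLogLevel: stats sync
```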
h) For your benchmark test, this is probably not frequent enough, as the purge will never run since you're saying only data
- We've run endurance tests that include purging. This setting is from a month ago, and we have changed it multiple times to test different setups. To include purging during tests, we currently set it to 00+01:00 00+00:03. In the final configuration we will probably set it to 03+00:00 00+00:03. We found that purging every 3 minutes reduced the impact on performance.
While it's correct that frequent purging is better, you missed my overall point, which is that when you're benchmarking you likely want to purge data on a shorter timescale.
Yes this is what we did
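For reference, a sketch of what those purge values would look like in cn=config, assuming the delta-syncrepl accesslog overlay on the provider (the overlay/database DN below is an assumption; adjust it to your layout). The format is age then check interval, each as ddd+hh:mm:

```ldif
# Purge accesslog entries older than 3 days, checking every 3 minutes
dn: olcOverlay={0}accesslog,olcDatabase={1}mdb,cn=config
changetype: modify
replace: olcAccessLogPurge
olcAccessLogPurge: 03+00:00 00+00:03
```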
I repost a previous question here too : What are the exact messages or errors messages we should find in case of a collision problem?
*If* there are collisions, you'll see the server falling back to REFRESH mode. But only if you have "sync" logging enabled.
syncrepl.c: Debug( LDAP_DEBUG_SYNC, "do_syncrep2: %s delta-sync lost sync on (%s), switching to REFRESH\n",
syncrepl.c: Debug( LDAP_DEBUG_SYNC, "do_syncrep2: %s delta-sync lost sync, switching to REFRESH\n",
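If it helps anyone checking for this, a small self-contained sketch of grepping for that fallback message (the sample log lines here are made up; point the grep at your actual slapd log instead of the temp file):

```shell
# Count occurrences of the delta-sync REFRESH-fallback message in a log.
# The temp file stands in for your real slapd log (an assumption).
log=$(mktemp)
printf '%s\n' \
  'do_syncrep2: rid=001 delta-sync lost sync on (cn=x), switching to REFRESH' \
  'do_syncrep2: rid=001 got CSN' > "$log"
count=$(grep -c 'delta-sync lost sync' "$log")
echo "$count"
rm -f "$log"
```

With "sync" logging enabled, a non-zero count would indicate the server did fall back to REFRESH mode.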
--Quanah
We've done a lot of tests to reproduce the synchronization problem with sync logging enabled, and our OpenLDAP servers do not switch to REFRESH mode. As we said, our OpenLDAP instances become very slow (as if they were freezing) during replication and can no longer replicate correctly, resulting in a constant accumulation of delay once the problem occurs. That's really strange. We've done a lot of testing, and we have the following scenarios that we don't understand:
- 200,000 users in random mode -> problem occurs
- 200,000 users in sequential mode -> problem occurs
- 1 user in random mode -> no problem
- 10 users in random mode -> no problem
We still have no explanation for these results.
Again, thanks for your time and your help.